Medical data processing method, related apparatus, and storage medium
By acquiring multimodal data, predicting an individual's future disease risk values, and formulating a personalized health screening strategy, the application addresses the lack of individual differentiation in existing screening strategies and enables more accurate screening.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- BEIJING BLUE SATELLITE COMM TECH
- Filing Date
- 2026-04-01
- Publication Date
- 2026-06-26

Figure CN122291019A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of medical data processing, and more specifically to a medical data processing method, a related apparatus, and a storage medium.
Background Art
[0002] When existing technologies are used to screen users for diseases, the same screening strategy is typically applied to different individuals, and users' medical data is not fully utilized.
Summary of the Invention
[0003] This application provides a medical data processing method, a related apparatus, and a storage medium, which can accurately formulate a health screening strategy suited to each individual based on that individual's multimodal medical data, thereby making full use of each individual's medical data.
[0004] In a first aspect, embodiments of this application provide a medical data processing method, the method comprising: acquiring electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data of a target user within a preset time window; determining temporal characteristics, static baseline characteristics, and interaction characteristics of the target user based on the target user's electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data, wherein the temporal characteristics represent the evolution of the target user's health indicators and environmental indicators within the preset time window, the static baseline characteristics represent the target user's fixed basic health attribute indicators, and the interaction characteristics represent the correlation between any two of the target user's health indicators, environmental indicators, and basic health attribute indicators; predicting, based on the temporal characteristics, interaction characteristics, and static baseline characteristics of the target user, risk values of the target user developing a target disease in multiple future time periods; determining, based on these risk values, a risk gradient value of the target user for the target disease in each future time period; and determining a health screening strategy for the target disease based on the risk values and risk gradient values for the multiple future time periods.
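The first-aspect method does not spell out how the risk gradient values or the screening strategy are computed. A minimal Python sketch follows, assuming that the risk values are probabilities for consecutive, equally spaced future periods, that the risk gradient is a finite-difference rate of change of risk per year (central differences inside the sequence and one-sided differences at the ends, the same scheme `numpy.gradient` uses), and that the strategy is a simple threshold rule. `risk_gradients`, `plan_screening`, `ScreeningPlan`, and both thresholds are hypothetical placeholders, not values or names from the application.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def risk_gradients(risks: Sequence[float], period_years: float = 1.0) -> List[float]:
    """Approximate the rate of change of risk (per year) in each future period.

    Assumption: central differences for interior periods and one-sided
    differences for the first and last periods.
    """
    n = len(risks)
    if n < 2:
        raise ValueError("need risk values for at least two future periods")
    grads = []
    for i in range(n):
        if i == 0:
            grads.append((risks[1] - risks[0]) / period_years)
        elif i == n - 1:
            grads.append((risks[-1] - risks[-2]) / period_years)
        else:
            grads.append((risks[i + 1] - risks[i - 1]) / (2 * period_years))
    return grads


@dataclass
class ScreeningPlan:
    first_period: Optional[int]  # index of the future period in which to screen; None = routine care
    reason: str


def plan_screening(risks: Sequence[float], grads: Sequence[float],
                   risk_threshold: float = 0.10,
                   gradient_threshold: float = 0.02) -> ScreeningPlan:
    """Hypothetical rule: screen in the first period whose risk or risk gradient
    reaches its threshold. Thresholds are illustrative, not clinical values."""
    for i, (risk, grad) in enumerate(zip(risks, grads)):
        if risk >= risk_threshold:
            return ScreeningPlan(i, "risk threshold reached")
        if grad >= gradient_threshold:
            return ScreeningPlan(i, "risk rising quickly")
    return ScreeningPlan(None, "no threshold reached")


risks = [0.01, 0.015, 0.04, 0.12]  # illustrative predicted risks for years 1-4
plan = plan_screening(risks, risk_gradients(risks))
print(plan)  # ScreeningPlan(first_period=2, reason='risk rising quickly')
```

Here the risk in period 2 is still below the risk threshold, but its gradient already exceeds the gradient threshold, which is the kind of case the gradient value is meant to catch. A fuller strategy would also set screening intervals; this rule only picks a starting period.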
[0005] In a second aspect, embodiments of this application provide a medical data processing apparatus having functions corresponding to the medical data processing method provided in the first aspect. These functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, and these modules can be software and/or hardware.
[0006] In one embodiment, the medical data processing apparatus includes an input/output module and a processing module. The input/output module is configured to acquire electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data of a target user within a preset time window. The processing module is configured to: determine temporal characteristics, static baseline characteristics, and interaction characteristics of the target user based on the target user's electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data, wherein the temporal characteristics represent the evolution patterns of the target user's health indicators and environmental indicators within the preset time window, the static baseline characteristics represent the target user's fixed basic health attribute indicators, and the interaction characteristics represent the correlation between any two of the target user's health indicators, environmental indicators, and basic health attribute indicators; predict, based on the temporal characteristics, interaction characteristics, and static baseline characteristics, risk values of the target user developing a target disease in multiple future time periods; determine, based on these risk values, a risk gradient value of the target user for the target disease in each future time period; and determine a health screening strategy for the target disease based on the risk values and risk gradient values for the multiple future time periods.
[0007] In a third aspect, embodiments of this application provide a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the medical data processing method described in the first aspect.
[0008] In a fourth aspect, embodiments of this application provide a computing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the medical data processing method described in the first aspect.
[0009] In a fifth aspect, embodiments of this application provide a computer program product containing instructions that, when run on a computer, cause the computer to execute the medical data processing method provided in the first aspect.
[0010] Compared with existing technologies, the embodiments of this application acquire multimodal data on the target user's physiology and environment and use the multimodal data to predict the target user's risk values for the target disease in multiple future time periods. A risk gradient value for the target disease is then determined from the risk value in each future time period. Based on the target user's risk values and risk gradient values for each target disease, a targeted health screening strategy can be formulated. Because this strategy fully considers individual differences, it better matches each individual's actual situation.
Brief Description of the Drawings
[0011] The objectives, features, and advantages of the embodiments of this application will become readily understood by reading the following detailed description with reference to the accompanying drawings, in which:
Figure 1 is a flowchart of a medical data processing method according to an embodiment of this application;
Figure 2 is a schematic structural diagram of a medical data processing apparatus according to an embodiment of this application;
Figure 3 is a schematic structural diagram of a computing device according to an embodiment of this application;
Figure 4 is a schematic structural diagram of a terminal device according to an embodiment of this application;
Figure 5 is a schematic structural diagram of a server according to an embodiment of this application.
[0012] In the accompanying drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description of the Embodiments
[0013] The terms "first," "second," and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that objects designated in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or modules is not necessarily limited to the steps or modules explicitly listed, but may include other steps or modules that are not explicitly listed or that are inherent to the process, method, product, or device. The division of modules in the embodiments of this application is merely a logical division; in actual applications, other divisions are possible. For example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. Additionally, the mutual coupling, direct coupling, or communication connections shown or discussed may be implemented through interfaces, and indirect couplings or communication connections between modules may be electrical or take other similar forms; none of these are limited in the embodiments of this application. Furthermore, modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, and may be distributed across multiple circuit modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments of this application.
[0014] The solutions provided in this application involve technologies such as artificial intelligence (AI), computer vision (CV), and machine learning (ML), and are described through the following embodiments. Artificial intelligence refers to the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology in computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine capable of responding in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, enabling them to perceive, reason, and make decisions.
[0015] AI technology is a comprehensive discipline covering a wide range of fields and involving both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
[0016] Computer vision (CV) is the science of how to make machines "see." More specifically, it refers to machine vision in which cameras and computers replace human eyes to recognize, track, and measure targets, followed by further image processing so that the resulting images are more suitable for human observation or for transmission to instruments. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies typically include adversarial perturbation generation, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), and common biometric recognition technologies such as facial recognition and fingerprint recognition.
[0017] Current technologies often apply the same screening strategy to all individuals. For example, for a given disease, all individuals undergo health screening every two years, or all individuals begin screening at a fixed age. Such an approach does not consider individual differences and therefore cannot tailor the most accurate screening strategy to each individual.
[0018] Compared with existing technologies, the embodiments of this application acquire multimodal data on the target user's physiology and environment and use the multimodal data to predict the target user's risk values for the target disease in multiple future time periods. A risk gradient value for the target disease is then determined from the risk value in each future time period. Based on the target user's risk values and risk gradient values for the target disease, a targeted health screening strategy can be formulated that fully considers individual differences and better matches each individual's actual situation.
[0019] It should be noted that the computing devices involved in the embodiments of this application may be servers and/or terminal devices.
[0020] The server involved in the embodiments of this application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
[0021] The terminal devices involved in the embodiments of this application may be devices that provide voice and/or data connectivity to users, handheld devices with wireless connectivity, or other processing devices connected to a wireless modem. For example, a terminal device may be a mobile phone (or "cellular" phone) or a computer with a mobile terminal, such as a portable, pocket-sized, handheld, computer-embedded, or vehicle-mounted mobile device that exchanges voice and/or data with a radio access network. Other examples include Personal Communication Service (PCS) phones, cordless phones, Session Initiation Protocol (SIP) phones, Wireless Local Loop (WLL) stations, and Personal Digital Assistants (PDAs).
[0022] Referring to Figure 1, Figure 1 is a flowchart of a medical data processing method according to an embodiment of this application. The method can be executed by a medical data processing apparatus and applied to disease screening scenarios, outputting a health screening strategy for a specific disease based on the multimodal data of a target user. The method includes steps S100 to S500.
S100: Acquire the electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data of the target user within a preset time window.
[0023] In this embodiment of the application, the preset time window can be set in advance, for example to cover the target user's historical data for the past 1, 2, or 3 years. The length of the preset time window can be adjusted as needed and is not limited in this embodiment.
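Restricting a user's records to the preset time window can be sketched as below. The record layout (dicts with a `time` key) and the approximation of a year as 365 days are assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def records_in_window(records: List[Dict], end: datetime, years: int = 1) -> List[Dict]:
    """Return records whose 'time' falls within [end - years * 365 days, end], oldest first."""
    start = end - timedelta(days=365 * years)  # assumption: a year is approximated as 365 days
    selected = [r for r in records if start <= r["time"] <= end]
    return sorted(selected, key=lambda r: r["time"])


records = [  # hypothetical record layout
    {"time": datetime(2025, 1, 1), "indicator": "systolic_bp", "value": 132},
    {"time": datetime(2023, 6, 1), "indicator": "systolic_bp", "value": 128},
    {"time": datetime(2024, 9, 15), "indicator": "systolic_bp", "value": 135},
]
recent = records_in_window(records, end=datetime(2025, 6, 1), years=1)
print([r["value"] for r in recent])  # [135, 132]
```

Making `years` a parameter mirrors the statement that the window length is adjustable.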
[0024] In this embodiment of the application, the target user's electronic health records can be obtained by interfacing with the hospital information system (HIS), the laboratory information system (LIS), and the physical examination center database through the HL7 FHIR (Fast Healthcare Interoperability Resources) health information exchange standard, thereby achieving data interconnection.
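As an illustration of FHIR-based retrieval, the sketch below builds a FHIR RESTful search URL for a patient's `Observation` resources within the preset time window, using the standard `patient`, `date` (with `ge`/`le` prefixes), and `_count` search parameters. The server base URL and patient ID are hypothetical, and authentication, paging through the returned search Bundle, and error handling are omitted; this is a sketch, not the integration the application uses.

```python
from urllib.parse import urlencode


def fhir_observation_search_url(base_url: str, patient_id: str,
                                start_date: str, end_date: str,
                                page_size: int = 100) -> str:
    """Build a FHIR search URL for Observations dated within [start_date, end_date]."""
    params = [
        ("patient", patient_id),
        ("date", f"ge{start_date}"),  # on or after the window start
        ("date", f"le{end_date}"),    # on or before the window end
        ("_count", str(page_size)),   # page size hint for the server
    ]
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"


url = fhir_observation_search_url(
    "https://fhir.example.org/fhir/",  # hypothetical endpoint
    "123", "2025-04-01", "2026-04-01")
print(url)
# https://fhir.example.org/fhir/Observation?patient=123&date=ge2025-04-01&date=le2026-04-01&_count=100
```

Repeating the `date` parameter expresses both bounds of the window; the server combines them with AND semantics.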
[0025] In this embodiment of the application, the target user's electronic health record includes: past medical history, which includes health indicators for history of hypertension and history of diabetes; physical examination reports; laboratory test results, which include health indicators for tumor marker results, blood lipid results, and liver and kidney function; medication records, which include health indicators for antihypertensive drug usage and hypoglycemic drug usage; surgical history; and hospitalization records.
[0026] In this embodiment of the application, the target user's history of hypertension further includes grading information, such as hypertension grade 1/2/3; complication information, such as hypertensive nephropathy; and records related to medication adherence.
[0027] In the embodiments of this application, the history of diabetes further includes diabetes type information, such as type 1 or type 2; complication information, such as diabetic retinopathy and diabetic foot; and blood glucose control information, such as historical glycated hemoglobin (HbA1c) data.
[0028] In this embodiment of the application, the routine blood test in the physical examination report includes items such as red blood cell count, white blood cell count and differential, platelet count, hemoglobin, and serum vitamin D, and indicates the reference range for each item and whether the result is abnormal. The liver function test in the physical examination report includes indicators such as ALT (alanine aminotransferase), AST (aspartate aminotransferase), GGT (gamma-glutamyl transferase), ALP (alkaline phosphatase), total bilirubin, direct bilirubin, indirect bilirubin, total protein, albumin, and globulin, together with the reference range and abnormal status of each indicator. The renal function test in the physical examination report includes indicators such as creatinine (Cr), blood urea nitrogen (BUN), uric acid (UA), and estimated glomerular filtration rate (eGFR), together with the reference range and abnormal status of each indicator.
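Because each routine blood, liver function, and renal function item carries a reference range and an abnormal flag, a per-item abnormality feature can be derived as in the sketch below. The `LabResult` structure and `abnormal_flags` helper are hypothetical, and the numbers in the usage example are illustrative placeholders rather than clinical reference ranges, which vary by laboratory, method, and population.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LabResult:
    name: str
    value: float
    ref_low: float   # lower bound of the laboratory's reference range
    ref_high: float  # upper bound of the laboratory's reference range

    def status(self) -> str:
        """Classify the value against its reference range."""
        if self.value < self.ref_low:
            return "low"
        if self.value > self.ref_high:
            return "high"
        return "normal"


def abnormal_flags(results: List[LabResult]) -> Dict[str, int]:
    """Binary feature per item: 1 if outside the reference range, else 0."""
    return {r.name: int(r.status() != "normal") for r in results}


panel = [
    LabResult("ALT", 52.0, 7.0, 40.0),           # illustrative numbers only
    LabResult("creatinine", 80.0, 57.0, 111.0),  # illustrative numbers only
]
print(abnormal_flags(panel))  # {'ALT': 1, 'creatinine': 0}
```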
[0029] In this embodiment, the tumor markers may include the measured values and reference ranges of markers such as AFP (alpha-fetoprotein), CEA (carcinoembryonic antigen), CA125, CA19-9, CA15-3, CYFRA21-1, SCC, and PSA (prostate-specific antigen, for men), as well as the dynamic monitoring trend of each marker (for example, changes across repeated tests). The blood lipid results may include the specific values of total cholesterol, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, apolipoprotein A1, and apolipoprotein B, together with their reference ranges, abnormal status, and the duration of abnormal blood lipid levels and any intervention measures. The antihypertensive drug usage record in the medication record may include the specific name of the antihypertensive drug (such as amlodipine or valsartan), the dosage (such as 5 mg/tablet), the frequency of administration (such as once daily), the duration of administration, and any records of adverse reactions.
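One way to quantify the dynamic monitoring trend of a tumor marker across repeated tests is the ordinary least-squares slope of the measured value against time. This is a swapped-in technique chosen for illustration; the application does not specify how the trend is computed, and `marker_trend_per_year` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List, Tuple


def marker_trend_per_year(tests: List[Tuple[date, float]]) -> float:
    """Least-squares slope of marker value versus time, in units per year."""
    if len(tests) < 2:
        raise ValueError("need at least two test results")
    tests = sorted(tests)
    t0 = tests[0][0]
    xs = [(d - t0).days / 365.25 for d, _ in tests]  # years since the first test
    ys = [value for _, value in tests]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("test dates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


start = date(2025, 1, 1)
cea = [(start, 2.0),
       (start + timedelta(days=100), 3.0),
       (start + timedelta(days=200), 4.0)]  # illustrative CEA values
print(round(marker_trend_per_year(cea), 4))  # 3.6525
```

A positive slope indicates a rising marker; the slope is less sensitive to a single noisy measurement than the difference between the first and last tests.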
[0030] In this embodiment of the application, the antihypertensive drugs may include: calcium channel blockers (CCBs), such as amlodipine, levamlodipine, nifedipine (sustained-release/controlled-release tablets), felodipine, and lacidipine; angiotensin-converting enzyme inhibitors (ACEIs), such as enalapril, benazepril, lisinopril, ramipril, and perindopril; angiotensin II receptor antagonists (ARBs), such as losartan, valsartan, irbesartan, telmisartan, olmesartan, and desartan; beta-blockers, such as metoprolol (extended-release/regular tablets), bisoprolol, atenolol, and labetalol; diuretics, such as hydrochlorothiazide, indapamide, spironolactone, and furosemide; and combination antihypertensive drugs, such as irbesartan/hydrochlorothiazide, valsartan/amlodipine, and losartan/hydrochlorothiazide.
[0031] In the embodiments of this application, the hypoglycemic drug usage record in the medication record may additionally include the specific name of the hypoglycemic drug (such as metformin or the insulin type), the dosage, the injection/administration frequency, the duration of medication, and records correlating blood glucose monitoring with drug efficacy.
[0032] In this embodiment of the application, the hypoglycemic drugs may include: biguanides, such as metformin (regular/extended-release/enteric-coated tablets); sulfonylureas, such as gliclazide, glimepiride, and glipizide; dipeptidyl peptidase-4 (DPP-4) inhibitors, such as sitagliptin, saxagliptin, linagliptin, and alogliptin; sodium-glucose cotransporter 2 (SGLT-2) inhibitors, such as dapagliflozin, empagliflozin, and canagliflozin; glucagon-like peptide-1 (GLP-1) receptor agonists, such as liraglutide, semaglutide, dulaglutide, and benaglutide; thiazolidinediones, such as pioglitazone and rosiglitazone; alpha-glucosidase inhibitors, such as acarbose, voglibose, and miglitol; insulins, such as insulin aspart, insulin lispro, insulin glargine, insulin detemir, and insulin degludec; and premixed insulins, such as 30R and 50R. In the embodiments of this application, the surgical history may include the specific name of the surgery (such as cholecystectomy or knee replacement), the surgical site, the type of surgery (minimally invasive or open), the reason for surgery (such as gallstones or joint degeneration), and the postoperative recovery status.
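When medication records are turned into model inputs, drug names are often collapsed into the drug classes listed above. A minimal lookup sketch follows; `DRUG_CLASSES` is a hypothetical, deliberately partial subset of those classes, and splitting combination products on whitespace is a simplifying assumption rather than part of the application.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical, partial name-to-class table drawn from the drug lists above.
DRUG_CLASSES: Dict[str, str] = {
    "amlodipine": "CCB", "nifedipine": "CCB", "felodipine": "CCB",
    "enalapril": "ACEI", "benazepril": "ACEI", "ramipril": "ACEI",
    "losartan": "ARB", "valsartan": "ARB", "irbesartan": "ARB",
    "metoprolol": "beta-blocker", "bisoprolol": "beta-blocker",
    "hydrochlorothiazide": "diuretic", "indapamide": "diuretic",
    "metformin": "biguanide", "gliclazide": "sulfonylurea",
    "sitagliptin": "DPP-4 inhibitor", "dapagliflozin": "SGLT-2 inhibitor",
    "liraglutide": "GLP-1 receptor agonist", "semaglutide": "GLP-1 receptor agonist",
    "acarbose": "alpha-glucosidase inhibitor", "insulin glargine": "insulin",
}


def medication_class_flags(drug_names: Iterable[str]) -> Tuple[Dict[str, int], List[str]]:
    """Map recorded drug names to class flags; return (flags, unrecognized names)."""
    flags: Dict[str, int] = {}
    unknown: List[str] = []
    for name in drug_names:
        key = name.lower().strip()
        # Whole-name match first; otherwise treat each word as a component
        # of a combination product such as "valsartan amlodipine".
        if key in DRUG_CLASSES:
            components = [key]
        else:
            components = [w for w in key.split() if w in DRUG_CLASSES]
        if not components:
            unknown.append(name)
        for component in components:
            flags[DRUG_CLASSES[component]] = 1
    return flags, unknown


flags, unknown = medication_class_flags(
    ["Amlodipine", "valsartan amlodipine", "metformin", "drug X"])
print(flags, unknown)  # {'CCB': 1, 'ARB': 1, 'biguanide': 1} ['drug X']
```

Returning unrecognized names separately lets them be routed to the terminology mapping step instead of being silently dropped.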
[0033] In this embodiment of the application, the hospitalization record may include the reason for hospitalization (such as acute myocardial infarction, pneumonia), the hospitalization department (cardiology, respiratory medicine, etc.), discharge diagnosis, key treatment measures during hospitalization (such as surgery, special medications), etc.
[0034] In this embodiment of the application, after authorization by the target user, the target user's genomic data can be obtained by connecting to the database of a genetic testing institution and retrieving the raw data of the genetic testing report.
[0035] In this embodiment of the application, the genomic data may include the following. Oncology-related genes: mutation detection of genes such as BRCA1/2 (breast cancer), EGFR (lung cancer), KRAS (colorectal cancer), and TP53 (multiple cancers), used for early cancer screening and guidance of targeted therapy.
[0036] Cardiovascular disease-related genes: LDLR (familial hypercholesterolemia), ACE (hypertension), APOB (hyperlipidemia), MYBPC3 (hypertrophic cardiomyopathy), and the like, used to assess the risk of coronary heart disease, arrhythmia, and other conditions.
[0037] Neurodegenerative disease-related genes: APOE ε4 (Alzheimer's disease), PSEN1 (early-onset Alzheimer's disease), SNCA (Parkinson's disease), and C9orf72 (amyotrophic lateral sclerosis, ALS).
[0038] Rare genetic disease-related genes: CFTR (cystic fibrosis), SMN1 (spinal muscular atrophy), and the like.
[0039] In this embodiment of the application, the lifestyle and behavioral data of the target user may include: dietary habit data, which includes the health indicators fat intake, carbohydrate intake, and vegetable and fruit intake; smoking data, which includes the health indicators years of smoking and daily number of cigarettes smoked; drinking data, which includes the health indicators years of drinking and daily alcohol consumption; exercise data, which includes the health indicators exercise frequency, exercise intensity, and exercise duration; sleep data, which includes the health indicators daily sleep duration, daily sleep quality, and daily bedtime; and physiological data, which includes the health indicators resting heart rate, heart rate variability (HRV), blood pressure, blood glucose, and weight.
[0040] In this embodiment of the application, the lifestyle and behavioral data of the target user can be obtained in the following ways. Method 1, periodic intelligent questionnaires: a structured online questionnaire is pushed to the target user at fixed intervals and filled out online. Method 2, wearable devices: the target user records data in real time through wearable devices such as smartwatches, wristbands, blood glucose meters, and body fat scales.
[0041] In this embodiment of the application, exercise intensity may be classified as high, moderate, or low, as detailed below. Low-intensity exercise has a metabolic equivalent (MET) of 1.5 to 2.9 METs, where 1 MET is the energy expenditure during quiet seated rest, corresponding to approximately 3.5 mL of oxygen consumed per kilogram of body weight per minute. During low-intensity exercise, the target user's heart rate is 50% to 60% of their maximum heart rate, where the maximum heart rate is estimated as 220 minus the target user's age. Subjectively, the target user experiences a slight increase in breathing and heart rate, feels at ease, and can converse normally or even sing. For example, slow walking (speed < 4 km/h), housework (such as washing dishes or making the bed), and light standing activities can be classified as low-intensity exercise.
[0042] Moderate-intensity exercise has a metabolic equivalent of 3.0 to 5.9 METs. During moderate-intensity exercise, the target user's heart rate is 60% to 85% of their maximum heart rate. Subjectively, the target user breathes somewhat faster, sweats slightly, and feels mild exertion; they can speak normally but cannot sing. Examples include brisk walking (4-6 km/h), jogging, cycling at moderate speed, Tai Chi, and doubles tennis.
[0043] High-intensity exercise has a metabolic equivalent of 6.0 METs or more. During high-intensity exercise, the target user's heart rate is 85% or more of their maximum heart rate. Subjectively, the target user breathes markedly faster, sweats profusely, and has difficulty speaking due to shortness of breath. Examples include fast running (speed > 8 km/h), high-intensity interval training (HIIT), fast cycling, singles tennis, and lifting heavy objects.
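The MET and heart-rate criteria above can be expressed as two small classifiers. The sketch assumes that boundary values go to the higher category (3.0 METs and 60% of maximum are moderate; 6.0 METs and 85% are high), which closes the gaps between the stated ranges, and it returns `None` below the low-intensity range, which the application does not classify. The function names are hypothetical.

```python
from typing import Optional


def intensity_from_met(met: float) -> Optional[str]:
    """Classify exercise intensity from metabolic equivalents (METs)."""
    if met >= 6.0:
        return "high"
    if met >= 3.0:
        return "moderate"
    if met >= 1.5:
        return "low"
    return None  # below the low-intensity range


def max_heart_rate(age: int) -> int:
    """Age-predicted maximum heart rate: 220 minus age."""
    return 220 - age


def intensity_from_heart_rate(heart_rate: float, age: int) -> Optional[str]:
    """Classify intensity from heart rate as a fraction of the maximum heart rate."""
    fraction = heart_rate / max_heart_rate(age)
    if fraction >= 0.85:
        return "high"
    if fraction >= 0.60:
        return "moderate"
    if fraction >= 0.50:
        return "low"
    return None


# A 40-year-old has an estimated maximum heart rate of 180 bpm,
# so 117 bpm is 65% of maximum.
print(intensity_from_heart_rate(117, 40))  # moderate
print(intensity_from_met(4.5))             # moderate
```

When a wearable reports both measures, the two classifiers can disagree; how to reconcile them is left open here.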
[0044] In this embodiment of the application, the environmental exposure data includes the following environmental indicators: PM2.5 concentration, solar radiation intensity, temperature, humidity, and regional infectious disease epidemic information.
[0045] In this embodiment of the application, the GPS location information of the target user's mobile phone can be obtained with the target user's authorization. By connecting to a third-party environmental exposure database, such as that of the National Environmental Monitoring Center or the meteorological department, the real-time and historical environmental indicators for the target user's location can then be obtained based on this location information.
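Matching the authorized GPS location to environmental data usually requires choosing a monitoring source near the user. The sketch assumes, hypothetically, that the third-party database exposes monitoring stations with latitude and longitude, and it picks the nearest station by great-circle (haversine) distance with a mean Earth radius of 6371 km. Station IDs and coordinates are illustrative.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_station(user_location: Tuple[float, float],
                    stations: Dict[str, Tuple[float, float]]) -> str:
    """Return the ID of the station closest to the user's location."""
    return min(stations, key=lambda sid: haversine_km(user_location, stations[sid]))


stations = {  # hypothetical station IDs with approximate coordinates
    "station_beijing_center": (39.91, 116.40),
    "station_shanghai_center": (31.23, 121.47),
}
print(nearest_station((39.9042, 116.4074), stations))  # station_beijing_center
```

A real deployment might instead interpolate between several nearby stations or query gridded data directly; nearest-station lookup is simply the most basic option.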
[0046] S200: Based on the target user's electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data, determine the target user's temporal characteristics, interaction characteristics, and static baseline characteristics, wherein the temporal characteristics represent the evolution patterns of the target user's health indicators within the preset time window, the static baseline characteristics represent the target user's fixed basic health attributes, and the interaction characteristics represent the correlation between the target user's health indicators and the environmental indicators of the target user's environment.
[0047] In this embodiment of the application, after the target user's electronic health records, genomic data, lifestyle and behavioral data, and environmental exposure data are obtained, each type of data can be cleaned and standardized.
[0048] In this embodiment of the application, an ETL (Extract-Transform-Load) tool can be used to clean the data, removing missing values and outliers while taking into account the format differences among data sources.
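The application leaves the concrete cleaning rules to the ETL tool. As one common choice, shown here as an assumption rather than the application's method, the sketch drops missing values (`None` or NaN) and then removes outliers outside Tukey's fences, that is, values more than 1.5 interquartile ranges below the first quartile or above the third quartile. The quartiles come from `statistics.quantiles`, whose default method is `'exclusive'`.

```python
import math
from statistics import quantiles
from typing import List, Optional


def clean_series(values: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Drop missing values, then drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if len(present) < 4:
        return present  # too few points to estimate quartiles reliably
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if low <= v <= high]


systolic = [120, 122, 125, None, 118, 121, float("nan"), 400]  # 400 is a likely entry error
print(clean_series(systolic))  # [120, 122, 125, 118, 121]
```

For clinical indicators, physiologically plausible ranges are often applied in addition to, or instead of, a purely statistical rule, since a genuinely extreme value may be clinically meaningful.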
[0049] In this embodiment of the application, numerical data can be standardized with algorithms such as Z-score standardization and Min-Max normalization, which transform data with different units and scales onto a common scale so that the data are comparable and can be fused. Non-numerical data (such as air quality, hospitalization records, and past medical history) can be standardized with methods for categorical data and text data, as follows.
For categorical data: categorical non-numerical data can be converted into numerical form, such as binary or ordinal values. For example, one-hot encoding is suitable for unordered categorical variables, such as air quality levels, hospital departments, and disease types; it transforms each category into an independent binary feature taking the value 0 or 1. For example, air quality levels include "Excellent," "Good," "Lightly Polluted," "Moderately Polluted," and "Heavily Polluted," which are converted into five-element binary vectors: "Excellent" = (1, 0, 0, 0, 0), "Good" = (0, 1, 0, 0, 0), and so on.
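The three transformations named in this paragraph (Z-score, Min-Max, and one-hot encoding) can be sketched in a few lines. The sketch uses the population standard deviation for the Z-score (an assumption; the sample standard deviation is also common) and maps a constant column to zeros to avoid division by zero. Function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence

AIR_QUALITY_LEVELS = ["Excellent", "Good", "Lightly Polluted",
                      "Moderately Polluted", "Heavily Polluted"]


def z_score(values: Sequence[float]) -> List[float]:
    """(x - mean) / standard deviation, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """One binary feature per category; exactly one is set to 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


print(min_max([10, 20, 30]))                # [0.0, 0.5, 1.0]
print(one_hot("Good", AIR_QUALITY_LEVELS))  # [0, 1, 0, 0, 0]
```

In practice the mean, standard deviation, minimum, and maximum would be computed on reference (training) data and reused for new users, so that features stay comparable across individuals.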
[0050] Label encoding is suitable for ordinal categorical variables, such as disease severity and hospital stay duration levels. It maps each category to an ordered numerical value (e.g., 1, 2, 3...). For example, hypertension is classified as "Grade 1," "Grade 2," and "Grade 3," which can be mapped to 1, 2, and 3 respectively (the higher the grade, the larger the value). Similarly, hospital stay duration can be divided into "<3 days," "3-7 days," and ">7 days," which can be mapped to 1, 2, and 3 respectively.
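Label encoding for the two ordinal examples can be sketched as below. Treating the "3-7 days" band as inclusive at both ends is an assumption, since the application does not say how boundary days are assigned; the helper names are hypothetical.

```python
from typing import Dict

HYPERTENSION_GRADE: Dict[str, int] = {"Grade 1": 1, "Grade 2": 2, "Grade 3": 3}


def encode_hypertension_grade(grade: str) -> int:
    """Higher grade maps to a larger code."""
    return HYPERTENSION_GRADE[grade]


def encode_hospital_stay(days: float) -> int:
    """'<3 days' -> 1, '3-7 days' -> 2, '>7 days' -> 3."""
    if days < 3:
        return 1
    if days <= 7:
        return 2
    return 3


print(encode_hypertension_grade("Grade 2"))              # 2
print([encode_hospital_stay(d) for d in (2, 3, 7, 10)])  # [1, 2, 2, 3]
```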
[0051] Binary encoding is suitable for situations with a large number of categories, such as the various subcategories of regional infectious diseases. Each subcategory can be encoded as an integer first, and then converted into binary bits.
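Binary encoding can be sketched as follows: each subcategory's integer index is written as a fixed-width bit vector, so k categories need only about log2(k) features instead of k one-hot columns. The subcategory list is hypothetical, and starting the integer codes at 0 is an assumption (some implementations start at 1 so that no category maps to all zeros).

```python
from typing import List, Sequence

# Hypothetical subcategories of regional infectious disease information.
INFECTIOUS_SUBCATEGORIES = ["influenza A", "influenza B", "norovirus",
                            "hand, foot and mouth disease", "dengue"]


def binary_encode(category: str, categories: Sequence[str]) -> List[int]:
    """Integer-encode the category, then expand the integer into bits (most significant first)."""
    index = list(categories).index(category)  # raises ValueError for unknown categories
    width = max(1, (len(categories) - 1).bit_length())
    return [int(bit) for bit in format(index, f"0{width}b")]


print(binary_encode("norovirus", INFECTIOUS_SUBCATEGORIES))  # [0, 1, 0]
print(binary_encode("dengue", INFECTIOUS_SUBCATEGORIES))     # [1, 0, 0]
```

With five subcategories, three bits suffice, compared with five one-hot columns; the saving grows as the number of subcategories increases.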
[0052] For text data: in this embodiment of the application, text data can be standardized through keyword extraction and classification and through standardized terminology mapping, thereby converting it into an analyzable format.
[0053] Keyword extraction and classification: for example, from a hospitalization record such as "admitted due to acute myocardial infarction, with a history of hypertension for 5 years, no diabetes...", the following keywords are extracted: reason for hospitalization, "acute myocardial infarction"; medical history, "hypertension"; and duration of medical history, "5 years". The content corresponding to each keyword is then encoded into the corresponding category or value.
[0054] Standardized terminology mapping: colloquial or varied descriptions can be standardized into medical terminology based on terminology systems such as ICD-10 and SNOMED CT. For example, a medical history entry such as "has hypertension and takes antihypertensive drugs" can be mapped to the standardized terms "history of hypertension (present)" and "use of antihypertensive drugs (yes)", each of which is then coded as a binary value (1 for yes, 0 for no).
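A rule-based sketch of the two text steps is shown below, applied to the example sentences from the two preceding paragraphs. The regular expressions and the small synonym table `TERM_MAP` are hypothetical stand-ins for a real terminology service built on ICD-10 or SNOMED CT, and the negation handling (a "no X" phrase up to the next comma or period) is deliberately simplistic.

```python
import re
from typing import Dict, Union

# Hypothetical synonym table: colloquial phrase -> standardized term.
TERM_MAP = {
    "hypertension": "history of hypertension",
    "high blood pressure": "history of hypertension",
    "diabetes": "history of diabetes",
    "antihypertensive drugs": "use of antihypertensive drugs",
}

REASON_RE = re.compile(r"admitted due to ([a-z ]+?)(?:,|\.|$)")
DURATION_RE = re.compile(r"history of ([a-z ]+?) for (\d+) years?")
NEGATION_RE = re.compile(r"\bno ([a-z ]+?)(?:,|\.|$)")


def standardize_record(text: str) -> Dict[str, Union[str, int]]:
    """Extract keywords from free text and map them to standardized, coded terms."""
    text = text.lower()
    features: Dict[str, Union[str, int]] = {}
    reason = REASON_RE.search(text)
    if reason:
        features["reason for hospitalization"] = reason.group(1).strip()
    for term in NEGATION_RE.findall(text):  # explicit negations -> 0
        features[TERM_MAP.get(term.strip(), term.strip())] = 0
    for term, years in DURATION_RE.findall(text):  # history with duration -> 1 plus years
        standard = TERM_MAP.get(term.strip(), term.strip())
        features[standard] = 1
        features[standard + " (years)"] = int(years)
    for phrase, standard in TERM_MAP.items():  # remaining mentions -> 1 unless already set
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            features.setdefault(standard, 1)
    return features


print(standardize_record(
    "Admitted due to acute myocardial infarction, "
    "with a history of hypertension for 5 years, no diabetes."))
print(standardize_record("has hypertension and takes antihypertensive drugs"))
```

Production systems typically replace such hand-written rules with clinical NLP models and a licensed terminology server; the sketch only shows the shape of the extracted output.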
[0055] In this embodiment of the application, after the target user's multi-source data has been cleaned and standardized, a precise timestamp, accurate to the minute or hour, can be attached to each data item, with the target user ID as the unique identifier, forming three-dimensional time-series data of the form "user ID, time, data". The resulting three-dimensional time-series database contains the target user's multi-source data within the preset time window.
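The (user ID, time, data) layout can be sketched as an in-memory store keyed by user ID. The class and method names are hypothetical, timestamps are truncated to the minute (the finer of the two granularities mentioned), and a production system would more likely use a dedicated time-series database.

```python
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Optional, Tuple

Row = Tuple[datetime, str, float]  # (timestamp, indicator name, standardized value)


class TimeSeriesStore:
    """In-memory (user ID, time, data) records for the preset time window."""

    def __init__(self) -> None:
        self._rows: DefaultDict[str, List[Row]] = defaultdict(list)

    def add(self, user_id: str, time: datetime, indicator: str, value: float) -> None:
        stamp = time.replace(second=0, microsecond=0)  # minute-level timestamp
        self._rows[user_id].append((stamp, indicator, value))

    def series(self, user_id: str, indicator: str,
               start: Optional[datetime] = None,
               end: Optional[datetime] = None) -> List[Tuple[datetime, float]]:
        """Time-ordered (timestamp, value) pairs for one indicator of one user."""
        rows = [(t, v) for t, name, v in self._rows.get(user_id, [])
                if name == indicator
                and (start is None or t >= start)
                and (end is None or t <= end)]
        return sorted(rows, key=lambda row: row[0])


store = TimeSeriesStore()
store.add("user-001", datetime(2025, 3, 2, 8, 15, 42), "systolic_bp", 128.0)
store.add("user-001", datetime(2025, 3, 1, 8, 5, 10), "systolic_bp", 131.0)
print(store.series("user-001", "systolic_bp"))
```

Per-indicator series returned this way are the natural input for the temporal characteristics extracted in step S200.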
[0056] In this embodiment of the application, once the three-dimensional time-series database is obtained, the temporal characteristics and static baseline characteristics of the target user within the preset time window can be extracted from it in step S200.
[0057] In this embodiment of the application, the temporal characteristics represent the evolution patterns of the target user's health indicators and of the environmental indicators of the target user's environment within the preset time window.
[0058] In the embodiments of this application, the temporal features corresponding to each health indicator and each environmental indicator are as follows:
- Blood pressure: time-series blood pressure values;
- Blood glucose: time-series blood glucose values;
- Resting heart rate: time-series resting heart rate;
- Heart rate variability: time-series heart rate variability;
- Complete blood count indicators: time-series red blood cell count, white blood cell count, and hemoglobin level;
- Liver and kidney function indicators: time-series transaminase levels and creatinine levels;
- Tumor markers: time-series tumor marker levels (such as AFP and CEA);
- Blood lipids: time-series blood lipid values;
- Exercise duration: the duration of each exercise session over time;
- Exercise frequency: the number of exercise sessions over time;
- Exercise intensity: the energy expenditure of each exercise session over time;
- Sleep duration: daily sleep duration;
- Sleep onset time: daily sleep onset time;
- Sleep quality: time-series daily sleep quality score;
- PM2.5 concentration: time-series PM2.5 concentration;
- Solar radiation intensity: time-series solar radiation intensity values;
- Temperature: time-series temperature values;
- Humidity: time-series humidity values;
- Smoking years and daily smoking amount: time-series smoking years and time-series daily smoking amount, respectively;
- Drinking duration and daily drinking volume: time-series drinking duration and time-series daily drinking volume, respectively.
[0059] In this embodiment of the application, the temporal features of the target user's health indicators and environmental indicators can be extracted directly from the corresponding health indicator data and environmental indicator data.
[0060] In this embodiment of the application, the static baseline characteristics of the target user represent the fixed basic health attributes of the target user, and the static baseline characteristics do not change over time.
[0061] In this embodiment of the application, the static baseline features of the target user include: pathogenic gene mutation information, family medical history, blood type, and gender.
[0062] In this embodiment of the application, the pathogenic mutated genes may include genes such as BRCA1/2 (breast cancer/ovarian cancer), MLH1/MSH2 (Lynch syndrome/colorectal cancer), TP53, and KRAS. The pathogenic gene mutation information indicates whether the target user carries a pathogenic mutation. This can be determined through the following steps. First, raw gene sequencing data are obtained from the target user's genomic data and parsed, for example in VCF (Variant Call Format). Variant detection and alignment are then performed on the parsed data. For example, bioinformatics tools such as BWA and GATK align the target user's sequences to a human reference genome (such as GRCh38) and identify every variant site that differs from the reference, including single nucleotide variants and insertions/deletions.
[0063] Next, variant annotation is performed. For example, each identified variant site is functionally annotated to determine three things: its location in the genome (such as the xth exon of the BRCA1 gene), its variant type (such as a missense or nonsense mutation), and the gene functional region it falls in (such as a coding region or regulatory region).
[0064] Finally, pathogenicity is assessed. For example, authoritative databases such as ClinVar, HGMD, and BRCA Exchange are used to determine whether a variant is pathogenic. Suppose ClinVar classifies a variant as "pathogenic" or "likely pathogenic" and literature or clinical evidence supports its association with breast cancer risk. The user is then determined to "carry a BRCA1 pathogenic mutation". If the variant is classified as "benign" or of "uncertain significance", the user is determined to "not carry a pathogenic mutation". In the embodiments of this application, the non-genetic static baseline features of the target user, such as family medical history, blood type, and gender, can be obtained through structured encoding.
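The final pathogenicity decision can be sketched as a lookup over already-annotated variants. This sketch assumes that an earlier annotation step produced hypothetical `(gene_symbol, clinical_significance)` pairs carrying ClinVar-style labels. It does not model the additional check for literature or clinical evidence that the paragraph also requires.

```python
# ClinVar-style clinical significance labels treated as carrier evidence.
PATHOGENIC_LABELS = {"pathogenic", "likely pathogenic", "pathogenic/likely pathogenic"}


def carries_pathogenic_variant(annotated_variants, gene):
    """annotated_variants: iterable of (gene_symbol, clinical_significance) pairs.

    Benign, likely benign, and uncertain-significance variants never count.
    """
    return any(
        symbol == gene and significance.strip().lower() in PATHOGENIC_LABELS
        for symbol, significance in annotated_variants
    )


variants = [
    ("BRCA1", "Likely pathogenic"),
    ("TP53", "Uncertain significance"),
    ("KRAS", "Benign"),
]
print(carries_pathogenic_variant(variants, "BRCA1"))  # True
print(carries_pathogenic_variant(variants, "TP53"))   # False
```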
[0065] For example, in this embodiment of the application, a family medical history questionnaire completed by the target user can be collected. Among other things, it asks whether any first-degree relative has had breast cancer. The questionnaire answers are then converted into coded features: for example, yes = 1 and no = 0, or blood type O = 00, A = 01, B = 10, and AB = 11. When several diseases are involved, such as a questionnaire that asks about a family history of both breast cancer and ovarian cancer, the answers for breast cancer and for ovarian cancer are encoded as separate, independent features.
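The structured encoding can be sketched as follows. The sketch assumes one yes/no questionnaire item per disease and uses the two-bit blood type code given above. The function name and item keys are hypothetical.

```python
BLOOD_TYPE_BITS = {"O": (0, 0), "A": (0, 1), "B": (1, 0), "AB": (1, 1)}


def encode_baseline(answers: dict, blood_type: str) -> list:
    """answers maps each questionnaire item (one per disease) to 'yes' or 'no'.

    Items are sorted so every user gets the same feature order.
    """
    features = [1 if answers[item].strip().lower() == "yes" else 0
                for item in sorted(answers)]
    features.extend(BLOOD_TYPE_BITS[blood_type.upper()])
    return features


answers = {"fdr_breast_cancer": "yes", "fdr_ovarian_cancer": "no"}
print(encode_baseline(answers, "A"))  # [1, 0, 0, 1]
```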
[0066] Static baseline characteristics of the target user, such as blood type and gender, can be obtained directly from the target user's electronic health record.
[0067] After determining the temporal and static baseline characteristics of the target user, the interaction characteristics can be determined based on the temporal and static baseline characteristics.
[0068] In this embodiment of the application, the interaction features can be directly extracted based on domain knowledge and business logic. The purpose is to make the correlation between multiple basic health indicators or environmental indicators explicit and capture the synergistic effect of indicator combinations on health risks.
[0069] In this embodiment of the application, the interaction features can be constructed from the following five aspects. Aspect 1: Physiological-exercise interaction. Aspect 1 includes three interaction features: decreased exercise frequency with weight gain; insufficient exercise intensity with high blood pressure; and insufficient exercise duration with large blood glucose fluctuations.
[0070] Specifically, decreased exercise frequency with weight gain can be determined from the trends in the target user's exercise frequency indicator and weight indicator. For example, the monthly rate of change (slope) of the target user's exercise frequency and of weight can be calculated separately. A positive slope indicates an increase and a negative slope indicates a decrease. If the exercise frequency slope is negative and the weight slope is positive, the target user's interaction features include "decreased exercise frequency with weight gain".
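The slope test can be sketched with an ordinary least-squares line fit. This assumes monthly series with one value per month. `monthly_slope` and `exercise_down_weight_up` are hypothetical names.

```python
import numpy as np


def monthly_slope(values):
    """Least-squares slope of a monthly series, in units per month."""
    months = np.arange(len(values), dtype=float)
    slope, _intercept = np.polyfit(months, np.asarray(values, dtype=float), 1)
    return slope


def exercise_down_weight_up(exercise_freq, weight):
    """Flag 'decreased exercise frequency with weight gain'."""
    return bool(monthly_slope(exercise_freq) < 0 and monthly_slope(weight) > 0)


sessions_per_month = [12, 10, 9, 7, 6, 4]
weight_kg = [70.0, 70.5, 71.0, 71.8, 72.1, 72.9]
print(exercise_down_weight_up(sessions_per_month, weight_kg))  # True
```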
[0071] Insufficient exercise intensity with high blood pressure can be determined from the target user's exercise intensity and blood pressure indicators. For example, the feature "insufficient exercise intensity with high blood pressure" is marked if two conditions hold: the target user's exercise intensity stays below moderate for longer than a preset duration, and the average blood pressure exceeds a preset threshold (such as 130/85 mmHg).
[0072] Insufficient exercise duration with large blood glucose fluctuations can be determined from the target user's exercise duration and blood glucose levels. For example, the feature "insufficient exercise duration with large blood glucose fluctuations" is marked if the target user's average weekly exercise duration is below a preset threshold (e.g., 150 minutes) and the standard deviation of blood glucose exceeds a preset threshold (e.g., 2.0 mmol/L).
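The two threshold rules of Aspect 1 can be sketched as follows. Several details are assumptions rather than part of the source: daily intensity labels `"low"/"moderate"/"high"`, a 30-day limit for the preset duration, and reading "average blood pressure above 130/85" as either the systolic or the diastolic mean exceeding its limit. All function names are hypothetical.

```python
import numpy as np


def low_intensity_high_bp(daily_intensity, systolic, diastolic,
                          min_low_days=30, bp_limit=(130, 85)):
    """Days below moderate intensity exceed the limit and mean BP is high."""
    low_days = sum(level == "low" for level in daily_intensity)
    high_bp = np.mean(systolic) > bp_limit[0] or np.mean(diastolic) > bp_limit[1]
    return bool(low_days > min_low_days and high_bp)


def short_exercise_unstable_glucose(weekly_minutes, glucose_mmol_l,
                                    min_minutes=150, max_sd=2.0):
    """Mean weekly exercise below the limit and glucose SD above the limit."""
    glucose_sd = np.std(glucose_mmol_l, ddof=1)  # sample standard deviation
    return bool(np.mean(weekly_minutes) < min_minutes and glucose_sd > max_sd)
```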
[0073] Aspect 2: Exercise-diet interaction. Aspect 2 includes two interaction features: increased exercise frequency without a decrease in carbohydrate intake; and a high-protein diet with low exercise intensity.
[0074] Increased exercise frequency without a decrease in carbohydrate intake can be determined from the exercise frequency and carbohydrate intake indicators. For example, the slope of the target user's exercise frequency and the average carbohydrate intake are calculated. If the exercise frequency slope is positive and the average carbohydrate intake has not decreased, the feature "increased exercise frequency without a decrease in carbohydrate intake" is marked.
[0075] A high-protein diet with low exercise intensity can be determined from the protein intake and exercise intensity indicators. For example, the feature "high-protein diet with low exercise intensity" is marked if protein intake exceeds a preset threshold (e.g., 1.2 g/kg body weight) and low-intensity exercise persists longer than a preset duration (e.g., one week).
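The Aspect 2 rules can be sketched as follows. Two readings here are hypothetical. "Carbohydrate intake has not decreased" is taken to mean that the mean intake in the current window is at least the mean of the previous window. "Longer than one week" is counted as more than 7 low-intensity days.

```python
import numpy as np


def slope(values):
    """Least-squares slope per time step."""
    x = np.arange(len(values), dtype=float)
    return float(np.polyfit(x, np.asarray(values, dtype=float), 1)[0])


def more_exercise_same_carbs(exercise_freq, carbs_prev_window, carbs_curr_window):
    """Rising exercise frequency while mean carbohydrate intake does not fall."""
    return bool(slope(exercise_freq) > 0
                and np.mean(carbs_curr_window) >= np.mean(carbs_prev_window))


def high_protein_low_intensity(protein_g_per_day, body_weight_kg, low_intensity_days,
                               protein_limit=1.2, min_days=7):
    """Protein above the g/kg limit while low-intensity exercise persists."""
    return bool(protein_g_per_day / body_weight_kg > protein_limit
                and low_intensity_days > min_days)
```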
[0076] Aspect 3: Sleep-physiological interaction. Aspect 3 includes two interaction features: insufficient sleep duration with low heart rate variability; and delayed sleep onset with high blood glucose.
[0077] Insufficient sleep duration with low heart rate variability can be determined from the sleep duration and heart rate variability indicators. For example, the feature "insufficient sleep duration with low heart rate variability" is marked if the target user's average sleep duration is below a preset threshold (e.g., 7 hours) and the standard deviation of normal-to-normal heartbeat intervals (SDNN) is below a preset threshold (e.g., 40 ms).
[0078] Delayed sleep onset with high blood glucose can be determined from the target user's sleep onset time and blood glucose levels. For example, the feature "delayed sleep onset with high blood glucose" is marked if the target user's sleep onset time is later than a preset time (e.g., 11:00 PM) and the average blood glucose level exceeds a preset threshold (e.g., 6.1 mmol/L).
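The Aspect 3 rules can be sketched as follows, assuming that "sleep onset time" means the mean onset time over the window (a hypothetical reading). Clock times are shifted onto a noon-to-noon scale so that an onset at 00:15 counts as later than 23:00 rather than earlier.

```python
from datetime import time

import numpy as np


def minutes_after_noon(t: time) -> int:
    """Map a clock time onto a noon-to-noon scale so 00:30 sorts after 23:00."""
    return (t.hour * 60 + t.minute - 12 * 60) % (24 * 60)


def short_sleep_low_hrv(sleep_hours, nn_intervals_ms, min_sleep=7.0, min_sdnn=40.0):
    """Mean sleep below the limit and SDNN (sample SD of NN intervals) below the limit."""
    sdnn = np.std(nn_intervals_ms, ddof=1)
    return bool(np.mean(sleep_hours) < min_sleep and sdnn < min_sdnn)


def late_onset_high_glucose(onset_times, glucose_mmol_l,
                            latest=time(23, 0), max_glucose=6.1):
    """Mean onset later than `latest` and mean glucose above the limit."""
    mean_onset = np.mean([minutes_after_noon(t) for t in onset_times])
    return bool(mean_onset > minutes_after_noon(latest)
                and np.mean(glucose_mmol_l) > max_glucose)
```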
[0079] Aspect 4: Environment-health interaction. Aspect 4 includes two interaction features: increased PM2.5 concentration with decreased lung function; and insufficient sunlight exposure with vitamin D deficiency.
[0080] Elevated PM2.5 concentration and decreased lung function can be determined using PM2.5 concentration indicators and lung function indicators. For example, the slope of the PM2.5 concentration curve for the target user's environment and the slope of the target user's lung function indicator (such as FEV1) curve can be calculated. If the slope corresponding to PM2.5 concentration is positive and the slope corresponding to FEV1 is negative, then it is marked as elevated PM2.5 concentration and decreased lung function.
[0081] Insufficient sunlight exposure with vitamin D deficiency can be determined from the sunlight exposure indicator and the serum vitamin D concentration indicator. For example, the feature "insufficient sunlight exposure with vitamin D deficiency" is marked if the average daily sunlight exposure is below a preset threshold (e.g., 2 hours/day) and the serum vitamin D concentration is below a preset threshold (e.g., 20 ng/mL).
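Both Aspect 4 rules can be sketched together. The sketch assumes evenly spaced PM2.5 and FEV1 series for the slope test, and daily sunlight hours with serum vitamin D in ng/mL for the second rule. The function names are hypothetical.

```python
import numpy as np


def slope(values):
    """Least-squares slope per time step."""
    x = np.arange(len(values), dtype=float)
    return float(np.polyfit(x, np.asarray(values, dtype=float), 1)[0])


def pm25_up_fev1_down(pm25, fev1):
    """Rising PM2.5 around the user while FEV1 declines."""
    return bool(slope(pm25) > 0 and slope(fev1) < 0)


def low_sun_low_vitamin_d(daily_sun_hours, serum_vitd_ng_ml,
                          min_hours=2.0, min_vitd=20.0):
    """Average daily sunlight and serum vitamin D both below their thresholds."""
    return bool(np.mean(daily_sun_hours) < min_hours and serum_vitd_ng_ml < min_vitd)
```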
[0082] Aspect 5: Lifestyle-family medical history interaction. Aspect 5 includes two interaction features: long smoking history with a positive family history of lung cancer; and high alcohol consumption with a positive history of hypertension.
[0083] For individuals with a long history of smoking and a positive family history of lung cancer, this can be determined using both smoking duration and family history of lung cancer indicators. For example, if a target user's smoking duration exceeds a preset threshold (e.g., 20 years) and they have a first-degree relative with lung cancer, they are marked as having a long history of smoking and a positive family history of lung cancer.
[0084] For individuals with high alcohol consumption and a positive history of hypertension, the criteria can be determined based on their alcohol consumption and hypertension history. For example, if a target user's average daily alcohol consumption exceeds a preset threshold (e.g., 25 grams for men and 15 grams for women) and they have a history of hypertension, they are labeled as having high alcohol consumption and a positive history of hypertension.
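The Aspect 5 rules reduce to simple comparisons. The sketch below uses the example thresholds from the text (20 years of smoking; 25 g/day of alcohol for men and 15 g/day for women). The function names and the `sex` encoding are hypothetical.

```python
ALCOHOL_LIMIT_G = {"male": 25.0, "female": 15.0}  # example daily thresholds


def long_smoking_family_lung_cancer(smoking_years, fdr_lung_cancer, min_years=20):
    """Smoking history longer than the limit plus a first-degree relative with lung cancer."""
    return bool(smoking_years > min_years and fdr_lung_cancer)


def heavy_drinking_hypertension(mean_daily_alcohol_g, sex, has_hypertension):
    """Sex-specific alcohol threshold exceeded plus a history of hypertension."""
    return bool(mean_daily_alcohol_g > ALCOHOL_LIMIT_G[sex] and has_hypertension)
```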
[0085] As aspects 1 to 5 show, each interaction feature of the target user represents the correlation between two of the target user's indicators, drawn from the health indicators, environmental indicators, and basic health attribute indicators. However, the temporal features of the health and environmental indicators are mostly multi-step, high-frequency time series, from which the correlation between two indicators is hard to read directly. The temporal features can therefore be aggregated first. A preset aggregation window is set, such as six months. For any temporal feature, the values at all time steps within each six-month window are then aggregated into a single statistic, such as the mean, variance, or standard deviation.
[0086] For example, if blood pressure is measured once a day, the time step of the blood pressure temporal feature is one day. With a 6-month aggregation window, the average daily blood pressure over each 6-month period is calculated and used as the aggregated feature for that window. Aggregating the blood pressure temporal feature in this way yields a temporal aggregation window feature with a much coarser time step (one value per 6 months). This feature still reflects the user's blood pressure trend over time while greatly reducing the amount of data.
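Window aggregation can be sketched with a reshape. This minimal sketch assumes a fixed number of time steps per window (for instance, about 182 daily values for six months) and drops any trailing partial window. `aggregate_windows` is a hypothetical name.

```python
import numpy as np


def aggregate_windows(series, steps_per_window, stat="mean"):
    """Collapse a 1-D series into one statistic per complete window."""
    values = np.asarray(series, dtype=float)
    n_windows = len(values) // steps_per_window  # trailing partial window dropped
    blocks = values[: n_windows * steps_per_window].reshape(n_windows, steps_per_window)
    funcs = {"mean": np.mean, "var": np.var, "std": np.std}
    return funcs[stat](blocks, axis=1)  # one value per aggregation window


print(aggregate_windows(np.arange(12), 4))  # [1.5 5.5 9.5]
```

Because every temporal feature uses the same window length, the outputs for different indicators line up window by window.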
[0087] Additionally, it should be noted that the preset aggregation window length is the same for different types of time-series features. For example, all are aggregated using a 6-month aggregation window.
[0088] After aggregation, based on the aggregated time-series features, the interaction features of the target user under each preset aggregation window can be confirmed according to the interaction features in aspects one through five above. For example, within a certain six-month period, the target user's exercise frequency increased while carbohydrate intake did not decrease, or within a certain six-month period, the PM2.5 concentration in the user's environment increased while the target user's lung function decreased.
[0089] S300: Based on the temporal characteristics, interaction characteristics, and static baseline characteristics of the target user, predict the risk value of the target user's occurrence of the target disease in multiple future time periods.
[0090] In this embodiment of the application, the risk values of the target user developing the target disease in multiple future time periods can be predicted through the following steps S310-S320. S310: Concatenate the temporal features, the static baseline features, and the interaction features to obtain a fused feature matrix.
[0091] In this embodiment of the application, the fused feature matrix can be obtained through the following steps S311-S314. S311: Based on each of the temporal features, determine the temporal aggregation window feature matrix, which contains the temporal aggregation features corresponding to each temporal feature.
[0092] The method for obtaining the temporal aggregation features is described in step S200 above and is not repeated here.
[0093] In this embodiment, assume there are N types of temporal features and each is aggregated into T aggregation windows. The temporal aggregation window features of each type then form a vector of dimension T, and together the N types form a T × N temporal aggregation window feature matrix.
[0094] S312: Based on each of the static baseline features, determine the static baseline feature expansion matrix; wherein, the static baseline feature expansion matrix includes the static baseline expansion features corresponding to each of the static baseline features, and the dimension of each of the static baseline expansion features is the same as the dimension of each of the time-series window aggregation features.
[0095] In this embodiment, the static baseline features contain no temporal information. Each static baseline feature must therefore be expanded (broadcast) to the dimension of the temporal aggregation window features, so that every static baseline expanded feature has the same dimension as the temporal aggregation window features.
[0096] For example, if each temporal aggregation window feature covers T aggregation windows, i.e., has dimension T, then each static baseline feature is expanded to dimension T, and every element equals the same static baseline value.
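The expansion amounts to repeating the static row once per aggregation window, as in this minimal NumPy sketch (`expand_static` is a hypothetical name):

```python
import numpy as np


def expand_static(static_features, n_windows):
    """Repeat the M static baseline values across T windows, giving a (T, M) matrix."""
    row = np.asarray(static_features, dtype=float)
    return np.tile(row, (n_windows, 1))


print(expand_static([1, 0, 1], 3))
```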
[0097] Assuming there are M types of static baseline features, expanding them yields a T × M static baseline feature expansion matrix. S313: Determine the interaction feature matrix based on each of the interaction features.
[0098] In step S200, the interaction features are derived from the temporal aggregation window features, so each interaction feature also has dimension T. Assuming there are K types of interaction features, they form a T × K interaction feature matrix.
[0099] S314: Based on the temporal aggregation window feature matrix, the static baseline feature extension matrix, and the interactive feature matrix, a fused feature matrix is obtained.
[0100] Steps S311, S312, and S313 yield, respectively, a T × N temporal aggregation window feature matrix, a T × M static baseline feature expansion matrix, and a T × K interaction feature matrix. Because these matrices are aligned on the same time steps, they can be concatenated directly to obtain a T × (M+N+K) fused feature matrix.
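Column-wise concatenation of the three aligned matrices can be sketched as follows. The sketch assumes all three inputs are NumPy arrays with T rows and checks that alignment before concatenating. `fuse` is a hypothetical name.

```python
import numpy as np


def fuse(temporal, static_expanded, interaction):
    """Concatenate T×N, T×M, and T×K matrices column-wise into T×(N+M+K)."""
    t = temporal.shape[0]
    if not (static_expanded.shape[0] == interaction.shape[0] == t):
        raise ValueError("all matrices must share the same T aggregation windows")
    return np.concatenate([temporal, static_expanded, interaction], axis=1)


T, N, M, K = 6, 4, 3, 2
fused = fuse(np.zeros((T, N)), np.ones((T, M)), np.zeros((T, K)))
print(fused.shape)  # (6, 9)
```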
[0101] The fusion feature matrix includes not only single time-series features and static baseline features, but also interactive features. It can integrate dynamic information that changes over time and static information that remains unchanged into a unified feature space, which makes it easier for subsequent longitudinal risk trajectory prediction models to learn the correlation between dynamic information that changes over time and static information that remains unchanged.
[0102] S320: Based on the fusion feature matrix, predict the risk value of the target user for the occurrence of the target disease in multiple future time periods.
[0103] In the embodiments of this application, the risk value of the occurrence of the target disease of the target user in multiple future time periods can be predicted based on the longitudinal risk trajectory prediction model.
[0104] The longitudinal risk trajectory prediction model can include an encoder and an output layer. The encoder can adopt a Transformer architecture, or a GRU architecture or an LSTM architecture.
[0105] The Transformer architecture includes a self-attention layer, a multi-head attention mechanism layer, a feedforward neural network layer, and a normalization layer.
[0106] The self-attention mechanism can process the time-series data in the fused feature matrix in parallel. By computing the association weights between each time step and all time steps, it captures long-range temporal dependence within a single indicator, for example the association between blood pressure changes over the past year and future risk. It also captures global interactions between different health indicators, such as the synergistic effect of heart rate, exercise, and weight within the same time window.
[0107] Multi-head attention uses multiple parallel attention heads to learn feature relationships from different perspectives. For example, one attention head focuses on the internal dependencies of physiological indicators, while another focuses on the interaction between physiological characteristics and lifestyle, thus improving the comprehensiveness of feature learning.
[0108] Feed-forward networks can perform non-linear transformations on the features output by self-attention mechanisms, further enhancing feature representation capabilities. Each layer includes operations such as linear transformation, ReLU activation, and dropout regularization.
[0109] Layer normalization can normalize the input of each layer, accelerating model training convergence and improving model stability.
[0110] GRU and LSTM architectures, through gating mechanisms, can selectively learn the long-term effects of interactive features.
[0111] In step S310, the fusion feature matrix is determined. In step S320, the fusion feature matrix can be input into the encoder. The encoder can obtain a high-order feature matrix that fuses temporal dependence and multi-dimensional interaction. This high-order feature matrix is then input into the output layer, and a fully connected layer maps the high-order feature matrix into the risk values of the target disease at multiple future time points. For example, the probability of developing hypertension in 6 months, 1 year, 2 years, 3 years, and 5 years.
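The data flow from fused matrix to per-horizon risk values can be illustrated with a NumPy-only forward pass. This is a deliberately simplified sketch, not the patented model. It uses a single attention head (rather than multiple heads), layer normalization without learned scale or shift, no dropout, and random untrained weights. Mean pooling over windows and an independent sigmoid per horizon are assumptions. `TinyRiskModel` is a hypothetical name, and a trained model would typically be built in a deep learning framework instead.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class TinyRiskModel:
    """Untrained single-head encoder layer plus a fully connected output layer."""

    def __init__(self, d_in, d_model, n_horizons, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        self.W_in = rng.normal(0, 1.0 / np.sqrt(d_in), (d_in, d_model))
        self.Wq, self.Wk, self.Wv = (rng.normal(0, s, (d_model, d_model)) for _ in range(3))
        self.W1 = rng.normal(0, s, (d_model, 4 * d_model))
        self.W2 = rng.normal(0, 1.0 / np.sqrt(4 * d_model), (4 * d_model, d_model))
        self.W_out = rng.normal(0, s, (d_model, n_horizons))

    def forward(self, fused):
        """fused: (T, N+M+K) matrix -> one risk value per future horizon."""
        h = fused @ self.W_in                              # project to model width
        q, k, v = h @ self.Wq, h @ self.Wk, h @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))     # (T, T) window-to-window weights
        h = layer_norm(h + attn @ v)                       # residual + normalization
        ffn = np.maximum(0.0, h @ self.W1) @ self.W2       # feed-forward with ReLU
        h = layer_norm(h + ffn)
        pooled = h.mean(axis=0)                            # summarize all windows
        logits = pooled @ self.W_out                       # fully connected output layer
        return 1.0 / (1.0 + np.exp(-logits))               # risk values in (0, 1)


model = TinyRiskModel(d_in=9, d_model=16, n_horizons=5)  # e.g. 6 mo, 1, 2, 3, 5 years
risks = model.forward(np.random.default_rng(1).normal(size=(6, 9)))
```

Independent sigmoids do not force risk to be non-decreasing across horizons. A model intended to output cumulative risk might need an explicit monotonicity constraint.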
[0112] In this embodiment of the application, the longitudinal risk trajectory prediction model is pre-trained through the following steps S330-S334: S330: Obtain training samples from multiple training users. Each training sample includes the user's electronic health record, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data. Each training sample is labeled with information on whether the training user has the target disease.
[0113] The training samples for each training user include the user's electronic health record, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data. The acquisition methods are the same as those in step S100 above, and will not be elaborated here.
[0114] In this embodiment, each training sample's label includes a target disease and its onset time. A longitudinal risk trajectory prediction model trained on such samples can predict that target disease. Alternatively, several types of training samples can be used to train the model. Labels of samples of the same type correspond to the same target disease, while labels of samples of different types correspond to different target diseases. A model trained this way can predict each of the target diseases represented by the different sample types.
[0115] It should also be noted that the time period covered by a training sample is separated by an interval from the time period covered by its label, which records whether the training user developed the target disease. For example, the multimodal data in a training sample may cover the period from 10 years ago to 6 years ago, while the label records the training user's actual disease status over the most recent 5 years.
[0116] S331: Based on the training samples, predict the risk value of the target disease for the training user in various future time periods.
[0117] In this embodiment of the application, the multimodal data in the training sample cover the training user's data from 10 years ago to 6 years ago. Based on these data, the risk values of the target disease are predicted at 6 months and at 1, 2, 3, 4, and 5 years, counted from the point 6 years ago. The prediction process follows steps S100-S300 above and is not repeated here.
[0118] S332: Based on the risk values of the target disease occurring in the training users at various future time periods, and the labels of the training samples, determine the cross-entropy loss of the prediction results for each time period.
[0119] Step S331 yields the training user's risk values for the target disease at 6 months and at 1, 2, 3, 4, and 5 years, counted from the point 6 years ago. Comparing these with the label gives the loss between the prediction and the actual label for each time period.
[0120] In addition, if the label contains only one target disease, then each time period corresponds to one prediction result for one type of target disease, and one loss; if the label contains multiple types of diseases, then each time period has multiple prediction results for multiple target diseases, and multiple losses.
[0121] S333: Based on the cross-entropy loss of the prediction results for each time period, determine the total loss of the longitudinal risk trajectory prediction model.
[0122] In this embodiment, when the label contains only one target disease, the total loss of the model can be calculated from the losses of that disease's predictions at each time period. When there are multiple target diseases, the total loss is calculated by combining the losses of each target disease's predictions at each time period.
[0123] In this embodiment of the application, when only one target disease is included, the overall loss of the model can be calculated as:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
[0124] Here N is the number of training samples, $y_i$ is the true label of the i-th training sample, $\hat{y}_i$ is the risk value predicted for the i-th training sample, and log(·) is the natural logarithm. The closer $\hat{y}_i$ is to $y_i$, the smaller the overall loss L.
[0125] In this embodiment of the application, when multiple target diseases are included, the overall loss of the model can be calculated as:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}\left[y_{ij} \log(\hat{y}_{ij}) + (1 - y_{ij})\log(1 - \hat{y}_{ij})\right]$$
[0126] Here N is the number of training samples, M is the number of target disease types, $y_{ij}$ is the true label of the i-th training sample for the j-th target disease, $\hat{y}_{ij}$ is the predicted risk value of the i-th training sample for the j-th target disease, and log(·) is the natural logarithm. The closer each $\hat{y}_{ij}$ is to $y_{ij}$, the smaller the overall loss L.
[0127] S334: Optimize the longitudinal risk trajectory prediction model in the direction of decreasing the total loss of the longitudinal risk trajectory prediction model.
[0128] In this embodiment of the application, step S333 determines the total loss of the longitudinal risk trajectory prediction model, and step S334 optimizes the model based on that total loss.
[0129] In the embodiments of this application, the longitudinal risk trajectory prediction model can be optimized using early stopping and learning rate decay.
[0130] Early stopping: In this embodiment, the training samples can be divided into a training set and a validation set, and optimization proceeds in rounds. In each round, the total loss of the longitudinal risk trajectory prediction model is first computed on the training set, and the model parameters are updated in the direction that decreases it. The total loss is then evaluated on the validation set. If the validation loss increases for several consecutive rounds (e.g., 5 rounds), the model has begun to overfit. In that case, the model parameters from the round immediately preceding the first increase are taken as the final model parameters.
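The stopping rule as described can be sketched over a list of per-round validation losses. The function returns the index of the round whose parameters should be kept, or `None` while training should continue. `early_stop_round` is a hypothetical name. Many practical implementations instead track the best loss seen so far, which is a slightly different rule.

```python
def early_stop_round(val_losses, patience=5):
    """Stop after `patience` consecutive increases in validation loss.

    Returns the round just before the first of those increases, or None.
    """
    rising = 0
    for i in range(1, len(val_losses)):
        rising = rising + 1 if val_losses[i] > val_losses[i - 1] else 0
        if rising == patience:
            return i - patience
    return None


print(early_stop_round([1.0, 0.8, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))  # 2
```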
[0131] Learning rate decay: In the embodiments of this application, the learning rate controls the magnitude of each model parameter update. It can be decayed using step decay, exponential decay, or cosine annealing.
[0132] For example, in step decay, the learning rate can be multiplied by a decay factor (such as 0.1) every K training rounds, which is suitable for scenarios where the training data is stable and fast convergence is required (such as a single-target disease scenario).
[0133] With exponential decay, the learning rate decreases exponentially from round to round. This suits scenarios that require slow, fine-grained adjustment (such as multi-target disease scenarios) and helps avoid parameter oscillation.
[0134] During cosine annealing, the learning rate decreases periodically with each epoch according to a cosine curve. It is suitable for scenarios with large training data and the need to escape local optima (such as longitudinal risk trajectory prediction models based on whole-genome data and comprehensive health indicator data).
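The three schedules can be sketched as pure functions of the epoch number. `every` plays the role of K in the step decay description. The cosine schedule restarts every `period` epochs, which is one common reading of "decreases periodically". All default values are illustrative assumptions.

```python
import math


def step_decay(lr0, epoch, every=10, factor=0.1):
    """Multiply the learning rate by `factor` every `every` epochs."""
    return lr0 * factor ** (epoch // every)


def exponential_decay(lr0, epoch, gamma=0.95):
    """Shrink the learning rate by a constant ratio each epoch."""
    return lr0 * gamma ** epoch


def cosine_annealing(lr0, epoch, period=50, lr_min=0.0):
    """Follow a half cosine from lr0 down to lr_min, restarting every `period` epochs."""
    t = epoch % period
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * t / period))
```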
[0135] In this embodiment of the application, after the longitudinal risk trajectory prediction model has been trained, the five-fold cross-validation method can also be used to verify the trained longitudinal risk trajectory prediction model.
[0136] For example, the training data can be divided into five parts, with four parts used for training and one part used for testing in rotation. The average of the five test results is then taken. The test results include three evaluation metrics: prediction accuracy, area under the receiver operating characteristic (ROC) curve, and calibration curve.
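Five-fold splitting and one of the listed metrics (area under the ROC curve) can be sketched as follows. The AUC uses its pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. `k_fold_indices` and `roc_auc` are hypothetical names, and calibration is not shown.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def roc_auc(y_true, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Averaging a metric such as `roc_auc` over the five test folds gives the cross-validated estimate described above.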
[0137] Prediction accuracy reflects the overall correctness of the longitudinal risk trajectory prediction model. AUC-ROC reflects how well the model distinguishes people at high risk of disease from people at low risk. The calibration curve reflects how closely the predicted disease risk probabilities agree with the observed outcomes.
[0138] S400: Based on the risk value of the target user's target disease in multiple future time periods, determine the risk gradient value of the target user's target disease in each future time period.
[0139] In this embodiment of the application, after predicting the risk value of the target disease of the target user in various future time periods, a risk trajectory curve can be constructed based on the risk value of the target disease in each time period.
[0140] For example, for each target disease, such as breast cancer, colorectal cancer, and coronary heart disease, a risk evolution trajectory curve f(t) can be constructed for the next 5 years, with the horizontal axis representing time (months) and the vertical axis representing the risk value. The risk evolution trajectory curve for each target disease can visually display the trend of risk changes over time, such as stable, slowly rising, or rapidly rising.
[0141] After the risk evolution trajectory curve for each target disease is constructed, the rate of risk change can be quantified by the slope of the curve, that is, the risk gradient value. For example, a gradient value of 0.02/month indicates that the risk value increases by 0.02 (2 percentage points) per month, while a gradient value of -0.01/month indicates that the risk value decreases by 0.01 (1 percentage point) per month.
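Step S400 and [0141] can be illustrated by sampling the predicted trajectory f(t) at the prediction points and taking finite-difference slopes between consecutive points. In this hypothetical sketch, the yearly prediction points, the sample risk values, and the cut-offs in `trend` are placeholders chosen for illustration, not values from this application.

```python
from typing import List, Sequence


def risk_gradients(months: Sequence[float], risks: Sequence[float]) -> List[float]:
    """Slope of the risk trajectory f(t) on each interval between consecutive
    prediction points, in risk units per month (finite differences)."""
    if len(months) != len(risks) or len(months) < 2:
        raise ValueError("need matching months/risks with at least two points")
    return [(risks[i + 1] - risks[i]) / (months[i + 1] - months[i])
            for i in range(len(months) - 1)]


def trend(gradient: float, stable_tol: float = 0.001, rapid: float = 0.01) -> str:
    """Label one gradient; both cut-offs are illustrative placeholders."""
    if abs(gradient) <= stable_tol:
        return "stable"
    if gradient < 0:
        return "falling"
    return "rapidly rising" if gradient >= rapid else "slowly rising"


months = [0, 12, 24, 36, 48, 60]                     # next 5 years, yearly points
risks = [0.010, 0.012, 0.015, 0.020, 0.035, 0.060]   # hypothetical f(t) samples
for g in risk_gradients(months, risks):
    print(f"{g:+.4f}/month -> {trend(g)}")
```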
[0142] S500: Based on the risk values and risk gradient values of the target disease in the future multiple time periods, determine the health screening strategy for the target disease.
[0143] In this embodiment, high-risk thresholds for each target disease can be set by combining clinical guidelines and population epidemiological data. For example, the high-risk threshold for breast cancer is a risk value of ≥5% within 5 years. Then, when the risk evolution trajectory curve for breast cancer predicts a risk value of ≥5% at a certain time point, a screening warning is triggered.
[0144] In addition, in this embodiment of the application, a risk gradient threshold can be set for each target disease. When the risk gradient value corresponding to any target disease is greater than its risk gradient threshold, it indicates that the target user's current risk value for the occurrence of the target disease is rising rapidly. At this time, even if the risk value for the occurrence of the target disease has not reached its corresponding high-risk threshold, a screening warning can still be triggered.
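The two trigger conditions in [0143] and [0144] can be combined into a single scan over the predicted trajectory. In this sketch, `ScreeningRule` and `screening_trigger` are hypothetical names; the 5% risk threshold follows the breast cancer example in [0143], while the gradient threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ScreeningRule:
    risk_threshold: float      # e.g. 0.05, the breast cancer example in [0143]
    gradient_threshold: float  # risk units per month; illustrative value only


def screening_trigger(months: Sequence[float], risks: Sequence[float],
                      rule: ScreeningRule) -> Optional[Tuple[str, float]]:
    """Scan the predicted trajectory in time order and return (reason, month)
    for the first point that should raise a screening warning, else None."""
    for i, t in enumerate(months):
        if risks[i] >= rule.risk_threshold:
            return ("risk threshold reached", t)
        if i > 0:
            slope = (risks[i] - risks[i - 1]) / (t - months[i - 1])
            if slope > rule.gradient_threshold:
                # Warn on a rapid rise even though the level is still below threshold.
                return ("risk rising rapidly", t)
    return None


breast = ScreeningRule(risk_threshold=0.05, gradient_threshold=0.001)
print(screening_trigger([0, 12, 24, 36], [0.01, 0.02, 0.04, 0.06], breast))
```

Checking the level before the slope at each point means that, when both conditions hold at the same time point, the warning is reported as a threshold crossing.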
[0145] In this embodiment, multimodal data on the target user's physiology and environment is acquired and used to predict the risk value of the target disease for the target user in multiple future time periods. The risk gradient value for the target disease is determined from the risk value in each future time period, and a targeted health screening strategy is then formulated from the target user's risk value and risk gradient value for each target disease. Because this strategy fully considers individual differences, it better reflects the actual situation of each individual.
[0146] The above describes a medical data processing method according to an embodiment of this application. The following describes the medical data processing apparatus (e.g., a server) that performs the above medical data processing method.
[0147] Referring to Figure 2, Figure 2 is a schematic structural diagram of a medical data processing device, which can be applied in the field of medical data processing. The medical data processing device in this embodiment can implement the steps of the medical data processing method executed in the embodiment corresponding to Figure 1. The functions of the medical data processing device can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware. The medical data processing device may include an input/output module 601 and a processing module 602. For the functional implementation of the processing module 602 and the input/output module 601, refer to the operations performed in the embodiment corresponding to Figure 1; details are not repeated here. For example, the processing module 602 can be used to control the sending, receiving, and acquiring operations of the input/output module 601.
[0148] The input/output module 601 is configured to acquire the electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data of the target user within a preset time window. The processing module 602 is configured to: determine the target user's temporal characteristics, static baseline characteristics, and interaction characteristics based on the target user's electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data, wherein the temporal characteristics represent the evolution patterns of the target user's health indicators and environmental indicators within the preset time window, the static baseline characteristics represent the target user's fixed basic health attribute indicators, and the interaction characteristics represent the correlation between any two of the target user's health indicators, environmental indicators, and basic health attribute indicators; predict, based on the target user's temporal characteristics, interaction characteristics, and static baseline characteristics, the risk value of the target disease occurring for the target user in multiple future time periods; determine, based on these risk values, the target user's risk gradient value for the target disease in each future time period; and determine a health screening strategy for the target disease based on the risk values and risk gradient values in the multiple future time periods.
[0149] In some implementations, the electronic health records include: past medical history data (health indicators: history of hypertension and history of diabetes); medical examination reports; laboratory test result data (health indicators: tumor marker test results, blood lipid test results, and liver and kidney function); medication record data (health indicators: antihypertensive drug usage records and hypoglycemic drug usage records); surgical history data (health indicator: surgery name); and hospitalization record data (health indicator: length of hospital stay). The lifestyle and behavioral data includes: dietary habit data (health indicators: fat intake, carbohydrate intake, and vegetable and fruit intake); smoking data (health indicators: years of smoking and daily number of cigarettes smoked); drinking data (health indicators: years of drinking and daily alcohol consumption); exercise data (health indicators: exercise frequency, exercise intensity, and exercise duration); sleep data (health indicators: daily sleep duration, daily sleep quality, and daily bedtime); and physiological data (health indicators: resting heart rate, heart rate variability (HRV), blood pressure, blood glucose, and weight). The environmental exposure data includes the following environmental indicators: PM2.5 concentration, solar radiation intensity, temperature, humidity, and regional infectious disease prevalence information.
[0150] In some implementations, the temporal characteristic of the blood pressure indicator is the time-series blood pressure value; that of the blood glucose indicator is the time-series blood glucose value; that of the resting heart rate indicator is the time-series resting heart rate; that of the heart rate variability indicator is the time-series heart rate variability; the temporal characteristics of the complete blood count indicators are the time-series red blood cell count, the time-series white blood cell count, and the time-series hemoglobin concentration; those of the liver and kidney function indicators are the time-series transaminase level and the time-series creatinine level; that of the tumor marker indicator is the time-series tumor marker level; that of the blood lipid indicator is the time-series blood lipid value; that of the exercise duration indicator is the duration of each exercise session over time; that of the exercise frequency indicator is the time-series number of exercise sessions; that of the exercise intensity indicator is the energy expenditure of each exercise session over time; that of the sleep duration indicator is the daily sleep duration; that of the sleep onset time indicator is the daily sleep onset time; that of the sleep quality indicator is the time-series daily sleep quality score; that of the PM2.5 concentration indicator is the time-series PM2.5 concentration; that of the solar radiation intensity indicator is the time-series solar radiation intensity value; that of the temperature indicator is the time-series temperature value; that of the humidity indicator is the time-series humidity value; those of the years-of-smoking indicator and the daily smoking amount indicator are, respectively, the time-series years of smoking and the time-series daily smoking amount; and those of the years-of-drinking indicator and the daily drinking volume indicator are, respectively, the time-series years of drinking and the time-series daily drinking volume.
[0151] In some implementations, the static baseline features include: pathogenic gene mutation information, family medical history, blood type, and gender.
[0152] In some implementations, the interaction features include: decreased exercise frequency combined with increased weight; insufficient exercise intensity combined with high blood pressure; insufficient exercise duration combined with large fluctuations in blood glucose; increased exercise frequency without a decrease in carbohydrate intake; a high-protein diet combined with low-intensity exercise; insufficient sleep duration combined with low heart rate variability; delayed sleep onset combined with high blood glucose; elevated PM2.5 concentration combined with declining lung function; insufficient sunlight exposure combined with vitamin D deficiency; a long smoking history combined with a positive family history of lung cancer; and high alcohol consumption combined with a positive history of hypertension.
[0153] In some implementations, the processing module is further configured to: transform each time-series feature into a time-series aggregation window feature according to a preset aggregation window, wherein each time-series aggregation window feature includes multiple aggregation windows, each aggregation window includes multiple time steps, and the feature of each aggregation window is obtained by aggregating the features of the time steps within that window; and determine, based on each time-series aggregation window feature, the interaction features of the target user under each preset aggregation window.
[0154] In some implementations, the processing module is further configured to: concatenate the temporal features, the static baseline features, and the interaction features to obtain a fused feature matrix; and predict, based on the fused feature matrix, the risk value of the target disease occurring for the target user in multiple future time periods.
[0155] In some implementations, the processing module is further configured to: determine a time-series aggregation window feature matrix based on each time-series feature, wherein the time-series aggregation window feature matrix contains the time-series aggregation window feature corresponding to each time-series feature; determine a static baseline feature expansion matrix based on each static baseline feature, wherein the static baseline feature expansion matrix includes a static baseline expansion feature for each static baseline feature, and each static baseline expansion feature has the same dimension as each time-series aggregation window feature; determine an interaction feature matrix based on the interaction features; and obtain a fused feature matrix from the time-series aggregation window feature matrix, the static baseline feature expansion matrix, and the interaction feature matrix.
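One way to realize the window aggregation of [0153] and the pairwise interaction features of [0152] is sketched below. The application does not specify the aggregation function or the interaction formula; this sketch assumes the window mean and a per-window product of z-scores, and the indicator names and sample data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence


def aggregate(series: Sequence[float], window: int,
              agg: Callable[[Sequence[float]], float] = mean) -> List[float]:
    """Collapse consecutive blocks of `window` time steps into one value each.
    A trailing partial window is dropped (an assumption; padding also works)."""
    n = len(series) // window
    return [agg(series[i * window:(i + 1) * window]) for i in range(n)]


def zscore(xs: Sequence[float]) -> List[float]:
    mu, sd = mean(xs), pstdev(xs)
    return [0.0 if sd == 0 else (x - mu) / sd for x in xs]


def pairwise_interactions(windows: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """For every pair of aggregated indicators, the per-window product of their
    z-scores: positive when both deviate in the same direction, negative when
    one is above its mean while the other is below."""
    names = sorted(windows)
    z = {name: zscore(windows[name]) for name in names}
    return {f"{a}*{b}": [x * y for x, y in zip(z[a], z[b])]
            for i, a in enumerate(names) for b in names[i + 1:]}


exercise_minutes = [40, 45, 30, 25, 10, 15]            # hypothetical daily data
weight_kg = [70.0, 70.1, 70.8, 71.0, 71.9, 72.1]
per_window = {"exercise_minutes": aggregate(exercise_minutes, 2),
              "weight_kg": aggregate(weight_kg, 2)}
print(pairwise_interactions(per_window))
```

Negative products flag opposite movements, as in the "decreased exercise frequency combined with increased weight" pattern; other aggregations (such as `max`) can be passed through the `agg` argument.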
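The fusion of [0154] and [0155] can be sketched as stacking three row blocks of equal width. This assumes a feature-by-window layout, static baseline features that are already numerically encoded (for example, one-hot blood type and gender), and broadcasting each static value across all windows to meet the equal-dimension requirement; the function names and sample values are hypothetical.

```python
from typing import List, Sequence


def expand_static(static: Sequence[float], length: int) -> List[List[float]]:
    """Repeat each static baseline value across all aggregation windows so its
    row has the same dimension as a time-series window feature."""
    return [[float(v)] * length for v in static]


def fuse(temporal: Sequence[Sequence[float]], static: Sequence[float],
         interactions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack temporal, expanded static, and interaction rows into one
    (n_features x n_windows) fused feature matrix."""
    if not temporal:
        raise ValueError("at least one temporal feature is required")
    width = len(temporal[0])
    for row in list(temporal) + list(interactions):
        if len(row) != width:
            raise ValueError("all time-series rows must share the window count")
    return ([list(map(float, r)) for r in temporal]
            + expand_static(static, width)
            + [list(map(float, r)) for r in interactions])


temporal = [[120, 125, 131], [5.6, 5.9, 6.4]]  # e.g. blood pressure, glucose per window
static = [1, 0]                                 # e.g. encoded mutation flag, gender
interactions = [[-1.5, 0.0, -1.5]]
for row in fuse(temporal, static, interactions):
    print(row)
```

Transposing the result gives a window-by-feature sequence, which is the layout most sequence models expect as input.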
[0156] In this embodiment, the processing module 602 predicts the risk value of the target disease for the target user in multiple future time periods based on the target user's multimodal physiological and environmental data, and determines the risk gradient value for the target disease from the risk value in each future time period. A targeted health screening strategy can then be formulated from the target user's risk value and risk gradient value for the target disease. Because this strategy fully considers individual differences, it better reflects the actual situation of each individual.
[0157] The medical data processing device 60 in the embodiments of this application has been described above from the perspective of modular functional entities. The medical data processing device in the embodiments of this application will be described below from the perspective of hardware processing.
[0158] It should be noted that the physical device corresponding to the input/output module 601 shown in Figure 2 may be a transceiver, a radio frequency circuit, a communication module, an input/output (I/O) interface, or the like, and the physical device corresponding to the processing module 602 may be a processor.
[0159] Each device shown in Figure 2 may have the structure shown in Figure 3. When the medical data processing device 60 shown in Figure 2 has the structure shown in Figure 3, the processor and transceiver in Figure 3 can implement the same or similar functions as the processing module 602 and the input/output module 601 provided in the foregoing device embodiments, and the memory in Figure 3 stores the computer program that the processor needs to call when executing the above medical data processing method.
[0160] This application also relates to a chip system, which includes at least one processor and an interface circuit. The processor includes multiple vector storage units and is configured to exchange instructions and/or data through the interface circuit, so that the chip system performs the medical data processing method of any of the above embodiments.
[0161] In one possible implementation, the chip system may also directly include a memory in which computer programs or computer instructions are stored.
[0162] For example, the memory can be volatile memory or non-volatile memory, or may include both. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which serves as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus RAM (DRRAM).
[0163] This application also relates to a processor, which includes a plurality of storage units and is configured to call computer programs or computer instructions stored in the memory, so that the processor executes the method described in any of the above embodiments.
[0164] For example, in the embodiments of this application, the processor is an integrated circuit chip with signal processing capabilities. For instance, the processor may be an FPGA, a general-purpose processor, a DSP, an ASIC or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, an SoC, a CPU, a network processor (NP), a microcontroller unit (MCU), a PLD, or another integrated chip capable of implementing or executing the methods, steps, and logic block diagrams disclosed in the embodiments of this application. In one possible implementation, the embodiments of this application also provide a computer-readable storage medium storing program code which, when executed on a computer, causes the computer to perform the above method embodiments.
[0165] This application also provides a terminal device, as shown in Figure 4. For ease of explanation, only the parts related to the embodiments of this application are shown; for specific technical details not disclosed, refer to the method section of the embodiments of this application. The terminal device can be any terminal device, including a mobile phone, a tablet, a personal digital assistant (PDA), a point-of-sale (POS) terminal, an in-vehicle computer, and the like. Taking a mobile phone as an example, Figure 4 is a block diagram of a partial structure of a mobile phone related to the terminal device provided in the embodiments of this application. Referring to Figure 4, the mobile phone includes components such as a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (Wi-Fi) module 1070, a processor 1080, and a power supply 1090. Those skilled in the art will understand that the mobile phone structure shown in Figure 4 does not constitute a limitation on the mobile phone, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
[0166] Each component of the mobile phone is described in detail below with reference to Figure 4. The RF circuit 1010 can be used to receive and transmit signals during information transmission or calls. Specifically, it receives downlink information from a base station and passes it to the processor 1080 for processing, and it transmits uplink data to the base station. Typically, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, and the like. Furthermore, the RF circuit 1010 can also communicate with networks and other devices through wireless communication, which can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Messaging Service (SMS).
[0167] The memory 1020 can be used to store software programs and modules. The processor 1080 executes the various functions and data processing of the mobile phone by running the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the mobile phone (such as audio data and a phonebook). In addition, the memory 1020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
[0168] The input unit 1030 can be used to receive input numerical or character information, and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also known as a touch screen, can collect touch operations performed by the user on or near it (such as operations performed by the user using a finger, stylus, or any suitable object or accessory on or near the touch panel 1031), and drive the corresponding connection devices according to a pre-set program. Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position and the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, and sends it to the processor 1080, and can also receive and execute commands sent by the processor 1080. In addition, the touch panel 1031 can be implemented using various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1031, the input unit 1030 may also include other input devices 1032. Specifically, other input devices 1032 may include, but are not limited to, one or more of the following: physical keyboard, function keys (such as volume control buttons, power buttons, etc.), trackball, mouse, joystick, etc.
[0169] The display unit 1040 can be used to display information input by the user or provided to the user, as well as various menus of the mobile phone. The display unit 1040 may include a display panel 1041, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1031 may cover the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, it transmits the operation to the processor 1080 to determine the type of touch event, and the processor 1080 then provides corresponding visual output on the display panel 1041 according to the type of touch event. Although in Figure 4 the touch panel 1031 and the display panel 1041 are two separate components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 can be integrated to implement these functions.
[0170] The mobile phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1041 according to the ambient light level, and the proximity sensor can turn off the display panel 1041 and/or the backlight when the phone is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in various directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used for applications that recognize the phone's posture (such as landscape/portrait switching, related games, and magnetometer posture calibration) and for vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may be configured in the mobile phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail here.
[0171] The audio circuit 1060, speaker 1061, and microphone 1062 provide an audio interface between the user and the mobile phone. The audio circuit 1060 converts the received audio data into electrical signals and transmits them to the speaker 1061, where the speaker 1061 converts them into sound signals for output. On the other hand, the microphone 1062 converts the collected sound signals into electrical signals, which are then received by the audio circuit 1060, converted into audio data, and then processed by the processor 1080 before being transmitted via the RF circuit 1010 to, for example, another mobile phone, or the audio data can be output to the memory 1020 for further processing.
[0172] Wi-Fi is a short-range wireless transmission technology. Through the Wi-Fi module 1070, the mobile phone can help users send and receive emails, browse web pages, and access streaming media, providing users with wireless broadband internet access. Although Figure 4 shows the Wi-Fi module 1070, it is understood that the module is not an essential component of the mobile phone and can be omitted as needed without changing the essence of the invention.
[0173] The processor 1080 is the control center of the mobile phone, connecting various parts of the phone through various interfaces and lines. It executes software programs and/or modules stored in the memory 1020 and calls data stored in the memory 1020 to perform various functions and process data, thereby monitoring the phone as a whole. Optionally, the processor 1080 may include one or more processing units; optionally, the processor 1080 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the aforementioned modem processor may also not be integrated into the processor 1080.
[0174] The mobile phone also includes a power supply 1090 (such as a battery) that supplies power to various components. Optionally, the power supply can be logically connected to the processor 1080 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system.
[0175] Although not shown, mobile phones may also include a camera, Bluetooth module, etc., which will not be described in detail here.
[0176] In this embodiment of the application, the processor 1080 included in the mobile phone also has the function of controlling the execution of the medical data processing method flow executed by the medical data processing device.
[0177] This application also provides a server. Referring to Figure 5, Figure 5 is a schematic structural diagram of a server provided in an embodiment of this application. The server 1100 may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 1122 (e.g., one or more processors), a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) for storing application programs 1142 or data 1144. The memory 1132 and the storage media 1130 may provide temporary or persistent storage. The program stored in the storage media 1130 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations for the server. Furthermore, the CPU 1122 may be configured to communicate with the storage media 1130 and execute, on the server 1100, the series of instruction operations in the storage media 1130.
[0178] The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
[0179] The steps performed by the server in the above embodiments can be based on the structure of the server 1100 shown in Figure 5. For example, the steps performed by the medical data processing device 60 shown in Figure 2 in the above embodiments can be based on the server structure shown in Figure 5. For example, the central processing unit 1122 performs the following operations by calling instructions in the memory 1132: acquiring, through the input/output interface 1158, the electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data of the target user within a preset time window.
[0180] Based on the target user's electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data, the target user's temporal characteristics, static baseline characteristics, and interaction characteristics are determined, wherein the temporal characteristics represent the evolution of the target user's health indicators and environmental indicators within the preset time window, the static baseline characteristics represent the target user's fixed basic health attribute indicators, and the interaction characteristics represent the correlation between any two of the target user's health indicators, environmental indicators, and basic health attribute indicators; based on the target user's temporal characteristics, interaction characteristics, and static baseline characteristics, the risk value of the target disease occurring for the target user in multiple future time periods is predicted; based on these risk values, the target user's risk gradient value for the target disease in each future time period is determined; and based on the risk values and risk gradient values in the multiple future time periods, a health screening strategy for the target disease is determined.
[0181] The health screening strategy can also be output via the input/output interface 1158.
[0182] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0183] Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices, and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
[0184] In the embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between devices or modules through some interfaces, and may be electrical, mechanical, or other forms.
[0185] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0186] Furthermore, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
[0187] The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product.
[0188] The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)).
[0189] The technical solutions provided in the embodiments of this application have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of these embodiments; the description of the embodiments is intended only to help in understanding their methods and core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application based on the ideas of these embodiments. Therefore, the content of this specification should not be construed as limiting the embodiments of this application.
Claims
1. A medical data processing method, characterized in that the method comprises: acquiring electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data of a target user within a preset time window; determining temporal features, static baseline features, and interaction features of the target user based on the target user's electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data, wherein the temporal features represent how the target user's health indicators and environmental indicators evolve within the preset time window, the static baseline features represent the target user's fixed basic health attribute indicators, and the interaction features represent the correlation between any two indicators among the target user's health indicators, environmental indicators, and basic health attribute indicators; predicting, based on the temporal features, interaction features, and static baseline features of the target user, risk values of the target user developing a target disease in multiple future time periods; determining, based on the risk values for the multiple future time periods, a risk gradient value of the target user for the target disease in each future time period; and determining a health screening strategy for the target disease based on the risk values and risk gradient values for the multiple future time periods.
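Claim 1 does not say how a risk gradient value is computed or how risk values and gradients are turned into a screening strategy. The following is a minimal Python sketch under stated assumptions: the gradient is taken as the change in risk from one future period to the next divided by the period length, and the strategy is a simple two-threshold rule. The function names, the default baseline risk of 0.0, and both thresholds are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreeningDecision:
    period_index: int  # index of the future period in which screening is recommended
    reason: str        # which hypothetical rule triggered the recommendation


def risk_gradients(risks: List[float], period_lengths: List[float],
                   current_risk: float = 0.0) -> List[float]:
    """Finite-difference risk gradient for each future period.

    Each gradient is (risk in this period - risk in the previous period) / period length.
    The first period is compared against `current_risk` (assumed 0.0 by default).
    """
    if len(risks) != len(period_lengths):
        raise ValueError("risks and period_lengths must have the same length")
    gradients = []
    previous = current_risk
    for risk, length in zip(risks, period_lengths):
        gradients.append((risk - previous) / length)
        previous = risk
    return gradients


def screening_strategy(risks: List[float], gradients: List[float],
                       risk_threshold: float = 0.2,
                       gradient_threshold: float = 0.05) -> Optional[ScreeningDecision]:
    """Return the earliest period whose risk level or risk growth exceeds a threshold."""
    for i, (risk, gradient) in enumerate(zip(risks, gradients)):
        if risk >= risk_threshold:
            return ScreeningDecision(i, "risk above threshold")
        if gradient >= gradient_threshold:
            return ScreeningDecision(i, "risk rising quickly")
    return None  # no period triggers targeted screening; keep the routine schedule


# Hypothetical example: four one-year future periods.
risks = [0.02, 0.05, 0.15, 0.30]
grads = risk_gradients(risks, [1.0, 1.0, 1.0, 1.0])
decision = screening_strategy(risks, grads)
```

Here the gradients are roughly 0.02, 0.03, 0.10, and 0.15, so the rule flags the third period (index 2) because risk is rising quickly there, one period before the absolute risk crosses 0.2. This illustrates one plausible reason to combine the two signals: a steep rise can justify earlier screening even while the risk level is still moderate.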
2. The medical data processing method according to claim 1, characterized in that the static baseline features include pathogenic mutation gene information, family medical history, blood type, and gender.
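Claim 2 lists the static baseline features but not how they are encoded for a model. A minimal sketch, assuming one binary flag per tracked gene, a single yes/no flag for family history of the target disease (a simplification of a full family history), and one-hot encodings for blood type and gender. The gene list, vocabularies, and names are illustrative placeholders, not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative vocabularies only; a real system would take these from its data dictionary.
TRACKED_GENES = ["BRCA1", "BRCA2", "TP53"]
BLOOD_TYPES = ["A", "B", "AB", "O"]
GENDERS = ["female", "male"]


@dataclass
class StaticBaseline:
    pathogenic_mutations: Set[str] = field(default_factory=set)  # genes with a pathogenic variant
    family_history: bool = False  # simplification: any relative with the target disease
    blood_type: str = "O"
    gender: str = "female"


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    if value not in vocabulary:
        raise ValueError(f"unknown value {value!r}; expected one of {vocabulary}")
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_static(baseline: StaticBaseline) -> List[float]:
    """Flatten the static baseline features into one numeric vector."""
    vector = [1.0 if gene in baseline.pathogenic_mutations else 0.0 for gene in TRACKED_GENES]
    vector.append(1.0 if baseline.family_history else 0.0)
    vector += one_hot(baseline.blood_type, BLOOD_TYPES)
    vector += one_hot(baseline.gender, GENDERS)
    return vector
```

For example, a male user with blood type AB, a pathogenic BRCA2 variant, and a positive family history encodes to a 10-element vector with ones at the BRCA2, family-history, AB, and male positions.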
3. The medical data processing method according to claim 1, characterized in that the interaction features are determined from the static baseline features and the temporal features as follows: transforming each temporal feature into a time-series aggregation window feature according to a preset aggregation window, wherein each time-series aggregation window feature includes multiple aggregation windows, each aggregation window spans multiple time steps, and the feature value of each aggregation window is obtained by aggregating the feature values of the time steps within that window; and determining, based on each time-series aggregation window feature, the interaction features of the target user under each preset aggregation window.
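Claim 3 leaves both the aggregation function and the exact form of the interaction features open. A minimal sketch, assuming mean aggregation over non-overlapping windows and, as a simple stand-in for the "correlation" named in claim 1, the per-window product of every pair of features, with each static baseline value broadcast to all windows. The function names and input layout are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def aggregate_windows(series: List[float], window_size: int) -> List[float]:
    """Mean of each non-overlapping window of `window_size` time steps.

    A trailing partial window is dropped (assumption).
    """
    n_windows = len(series) // window_size
    return [mean(series[i * window_size:(i + 1) * window_size]) for i in range(n_windows)]


def interaction_features(temporal: Dict[str, List[float]],
                         static: Dict[str, float],
                         window_size: int) -> Dict[Tuple[str, str], List[float]]:
    """Per-window pairwise products for one preset aggregation window size.

    temporal: feature name -> value at each time step
    static:   feature name -> fixed value, broadcast to every window
    """
    aggregated = {name: aggregate_windows(s, window_size) for name, s in temporal.items()}
    n_windows = min(len(values) for values in aggregated.values())
    for name, value in static.items():
        aggregated[name] = [value] * n_windows
    # One interaction series for every unordered pair of features (names sorted for stable keys).
    return {
        (a, b): [x * y for x, y in zip(aggregated[a], aggregated[b])]
        for a, b in combinations(sorted(aggregated), 2)
    }
```

For several preset aggregation windows, the call can simply be repeated, for example `{w: interaction_features(temporal, static, w) for w in (7, 30)}` for weekly and monthly windows over daily data (window sizes chosen only for illustration).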
4. The medical data processing method according to claim 1, characterized in that predicting the risk values of the target user developing the target disease in multiple future time periods based on the temporal features, interaction features, and static baseline features of the target user comprises: concatenating the temporal features, the static baseline features, and the interaction features to obtain a fused feature matrix; and predicting, based on the fused feature matrix, the risk values of the target user developing the target disease in the multiple future time periods.
5. The medical data processing method according to claim 4, characterized in that concatenating the temporal features, the static baseline features, and the interaction features to obtain the fused feature matrix comprises: determining a time-series aggregation window feature matrix based on the temporal features, wherein the time-series aggregation window feature matrix contains the time-series aggregation window feature corresponding to each temporal feature; determining a static baseline feature expansion matrix based on the static baseline features, wherein the static baseline feature expansion matrix contains a static baseline expansion feature corresponding to each static baseline feature, and each static baseline expansion feature has the same dimension as each time-series aggregation window feature; determining an interaction feature matrix based on the interaction features; and obtaining the fused feature matrix based on the time-series aggregation window feature matrix, the static baseline feature expansion matrix, and the interaction feature matrix.
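Claim 5 requires each static baseline expansion feature to have the same dimension as a time-series aggregation window feature. A minimal sketch, assuming the expansion simply repeats each static value once per aggregation window and that the three blocks are stacked row-wise into one features-by-windows matrix. The function name and layout are hypothetical.

```python
from typing import List, Sequence


def build_fused_matrix(temporal_windows: Sequence[Sequence[float]],
                       static_values: Sequence[float],
                       interaction_rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack three blocks row-wise into a (features x windows) matrix.

    temporal_windows: time-series aggregation window feature matrix, one row per temporal feature
    static_values:    one scalar per static baseline feature
    interaction_rows: interaction feature matrix, one row per interaction feature
    """
    if not temporal_windows:
        raise ValueError("at least one temporal feature is required")
    n_windows = len(temporal_windows[0])
    for row in list(temporal_windows) + list(interaction_rows):
        if len(row) != n_windows:
            raise ValueError("every temporal and interaction row needs the same number of windows")
    # Static baseline expansion: repeat each scalar once per window so its
    # dimension matches that of a time-series aggregation window feature.
    static_expansion = [[float(v)] * n_windows for v in static_values]
    return ([list(map(float, r)) for r in temporal_windows]
            + static_expansion
            + [list(map(float, r)) for r in interaction_rows])
```

One reason this layout is convenient is that every column then corresponds to the same aggregation window across all three blocks, so a downstream model can read the matrix column by column.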
6. A medical data processing device, characterized in that the device comprises: an input/output module, configured to acquire electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data of a target user within a preset time window; and a processing module, configured to: determine temporal features, static baseline features, and interaction features of the target user based on the target user's electronic health records, genomic data, family medical history, lifestyle and behavioral data, and environmental exposure data, wherein the temporal features represent how the target user's health indicators and environmental indicators evolve within the preset time window, the static baseline features represent the target user's fixed basic health attribute indicators, and the interaction features represent the correlation between any two indicators among the target user's health indicators, environmental indicators, and basic health attribute indicators; predict, based on the temporal features, interaction features, and static baseline features of the target user, risk values of the target user developing a target disease in multiple future time periods; determine, based on the risk values for the multiple future time periods, a risk gradient value of the target user for the target disease in each future time period; and determine a health screening strategy for the target disease based on the risk values and risk gradient values for the multiple future time periods.
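Claim 6 splits the device into an input/output module and a processing module. A minimal sketch of that wiring, assuming each processing stage is supplied as a callable so the skeleton does not commit to any particular feature extractor, model, or screening rule. Every class, field, and source key below is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical keys for the five data sources named in claim 6.
SOURCES = ("ehr", "genomics", "family_history", "lifestyle", "environment")


@dataclass
class InputOutputModule:
    # Hypothetical accessor: (user_id, source, time_window) -> raw data for that source.
    fetch: Callable[[str, str, str], Any]

    def acquire(self, user_id: str, time_window: str) -> Dict[str, Any]:
        """Collect every data source for one user within one preset time window."""
        return {source: self.fetch(user_id, source, time_window) for source in SOURCES}


@dataclass
class ProcessingModule:
    # Each stage is injected, so the skeleton does not fix a feature extractor or model.
    extract_features: Callable[[Dict[str, Any]], Any]
    predict_risks: Callable[[Any], List[float]]
    compute_gradients: Callable[[List[float]], List[float]]
    decide_strategy: Callable[[List[float], List[float]], Any]

    def run(self, data: Dict[str, Any]) -> Any:
        features = self.extract_features(data)    # temporal, static, interaction features
        risks = self.predict_risks(features)      # one risk value per future period
        gradients = self.compute_gradients(risks)
        return self.decide_strategy(risks, gradients)


@dataclass
class MedicalDataProcessingDevice:
    io: InputOutputModule
    processing: ProcessingModule

    def screen(self, user_id: str, time_window: str) -> Any:
        return self.processing.run(self.io.acquire(user_id, time_window))
```

Injecting the stages keeps the module boundary of claim 6 explicit while leaving the concrete algorithms (such as those sketched for claims 1 to 5) interchangeable.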
7. The medical data processing device according to claim 6, characterized in that the processing module is further configured to: transform each temporal feature into a time-series aggregation window feature according to a preset aggregation window, wherein each time-series aggregation window feature includes multiple aggregation windows, each aggregation window spans multiple time steps, and the feature value of each aggregation window is obtained by aggregating the feature values of the time steps within that window; and determine, based on each time-series aggregation window feature, the interaction features of the target user under each preset aggregation window.
8. A computing device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 5.
9. A computer-readable storage medium, characterized in that it comprises instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 5.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method according to any one of claims 1 to 5.