Disease early warning method and system for heterogeneous data integration

By performing format conversion and semantic mapping on multi-source heterogeneous data, a semantic association graph is constructed. Combined with a disease risk assessment indicator system, the problem of semantic misalignment of multi-source heterogeneous data is solved, and more accurate disease early warning is achieved.

CN122201724A · Pending · Publication Date: 2026-06-12 · GUANGZHOU LOFTY MED-PATH HEALTH-CARE CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: GUANGZHOU LOFTY MED-PATH HEALTH-CARE CO LTD
Filing Date: 2026-03-13
Publication Date: 2026-06-12

Application Information

Patent Timeline

13 Mar 2026: Application
12 Jun 2026: Publication


IPC: G16H50/20; G16H50/30; G16H50/70; G06F40/103; G06F40/151; G06F40/30; G06F40/242; G06F40/284; G06F18/22

AI Tagging

Application Domain

Medical data mining; Health-index calculation



Figure CN122201724A_ABST


Abstract

The application relates to a disease early warning method and system for heterogeneous data integration. The method comprises: extracting disease-related data distributed across multiple heterogeneous data sources, and converting the disease-related data into a unified format based on preset data format conversion rules; performing semantic mapping on the unified-format data to obtain semantically unified data, and performing disease risk analysis on the semantically unified data based on a preset disease risk assessment index system to obtain a disease risk result; and generating disease early warning information based on the disease risk result and sending it to relevant personnel. This solves the problem that multi-source heterogeneous data cannot be aligned at the semantic level.


Description

Technical Field

[0001] This invention relates to the field of disease early warning technology, and in particular to a disease early warning method and system for heterogeneous data integration.

Background Technology

[0002] With the continuous development of medical informatization, various medical institutions, public health departments, and wearable device platforms have accumulated massive amounts of disease-related data. Current technologies typically employ data warehouses or ETL (Extract, Transform, Load) tools to extract data from sources such as electronic medical record systems, laboratory information management systems, and health monitoring platforms. This data is then standardized according to fixed field mapping rules and subsequently combined with pre-defined clinical rules or statistical models to monitor and assess the health status and risks of individuals or groups, enabling preliminary early warning functions in scenarios such as chronic disease management and infectious disease surveillance.

[0003] However, because terminology, coding standards, and data structures differ significantly among data sources, existing methods struggle to identify the semantic equivalence of the same medical concept across different systems when integrating multi-source heterogeneous data. As a result, the clinical meanings of key health indicators cannot be accurately aligned even after format conversion; for example, blood glucose values obtained by different testing methods may be misclassified as the same indicator, which affects the accuracy of subsequent risk assessment. Resolving the semantic misalignment among multi-source heterogeneous data has therefore become a key technical obstacle to improving the accuracy of disease early warning.

Summary of the Invention

[0004] The main technical problem addressed by this application is to provide a disease early warning method and system for heterogeneous data integration that solves the problem of semantic misalignment of multi-source heterogeneous data.

[0005] To address the aforementioned technical problem, this application provides a disease early warning method for heterogeneous data integration, comprising the following steps: extracting disease-related data scattered across multiple heterogeneous data sources, and converting the disease-related data into a unified format based on preset data format conversion rules; performing semantic mapping on the unified-format data to obtain semantically unified data, and performing disease risk analysis on the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results; and generating disease warning information based on the disease risk results and sending it to relevant personnel.

[0006] Furthermore, extracting disease-related data scattered across multiple heterogeneous data sources includes: adapting the multiple heterogeneous data sources based on preset data source configuration parameters to obtain a set of data source interfaces, the set including relational database interfaces, document database interfaces, and medical information system interfaces; reading data from the multiple heterogeneous data sources through the data source interface set to obtain an original dataset; and filtering the original dataset by field based on preset disease data filtering conditions to obtain the disease-related data, which includes basic patient information, diagnostic records, and test data.

[0007] Furthermore, converting the disease-related data into a unified format based on preset data format conversion rules includes: parsing the disease-related data according to preset format recognition rules to obtain a data format identifier; and, based on the data format identifier, matching the corresponding field mapping table from the preset data format conversion rules and using the field mapping table to reorganize and encode the disease-related data, thereby obtaining the data in the unified format.

[0008] Furthermore, semantically mapping the unified-format data to obtain semantically unified data includes: parsing the field content of the unified-format data, decomposing the text content of each field word by word, and matching it against a preset semantic feature dictionary to obtain field semantic features; labeling each field with a semantic category based on its field semantic features to obtain labeled semantic fields; matching labeled semantic fields that have logical associations based on a preset semantic association rule base, connecting the mutually related labeled semantic fields to construct a semantic association graph, and performing consistency verification on the semantic fields in the graph to obtain a corrected semantic association graph; and standardizing the corrected semantic association graph according to a preset semantic standard template, that is, standardizing the field name, data type, and encoding format of each semantic field in the corrected graph according to the format and specification of the template, to obtain the semantically unified data.

[0009] Furthermore, connecting the mutually related labeled semantic fields to construct a semantic association graph includes: matching the labeled semantic fields based on the field association conditions in the semantic association rule base to obtain field association pairs, which include the attribution association between a patient identifier and a diagnosis record, the causal association between a diagnosis record and test data, and the temporal association between test data and a timestamp; converting the field association pairs into a node-and-edge structure, with the labeled semantic fields as graph nodes and the association relationships in the field association pairs as graph edges, to obtain an initial semantic graph; performing connectivity detection and isolated node identification on the initial semantic graph based on preset graph structure rules to obtain graph structure detection results; and, based on the graph structure detection results, deleting isolated nodes from the initial semantic graph and filling in broken edges to obtain the semantic association graph.

[0010] Furthermore, performing disease risk analysis on the semantically unified data based on the preset disease risk assessment index system to obtain disease risk results includes: matching the semantically unified data against the list of indicator fields in the disease risk assessment index system and extracting the corresponding numerical values to obtain a set of patient indicator values; computing a weighted sum of the values in the patient indicator value set using the weight coefficient assigned to each indicator in the index system to obtain the patient's comprehensive risk score; and classifying the comprehensive risk score against preset risk level classification thresholds to obtain the disease risk result.

[0011] Furthermore, generating disease warning information based on the disease risk results and sending it to relevant personnel includes: comparing the risk level in the disease risk results against a preset warning trigger threshold to obtain a warning trigger record; extracting the associated fields from the disease risk results based on the warning trigger record to obtain warning content fields; formatting and assembling the warning content fields with a preset warning information template to obtain disease warning information, and then timestamping and prioritizing the disease warning information to obtain annotated disease warning information; and, based on a preset personnel responsibility checklist, matching the risk level in the annotated disease warning information to recipients to obtain a target recipient list, and sending the disease warning information to the relevant personnel in the target recipient list through a message push channel.

[0012] This invention also provides a disease early warning system for heterogeneous data integration, comprising: an extraction module, configured to extract disease-related data scattered across multiple heterogeneous data sources and convert the disease-related data into a unified format based on preset data format conversion rules; a mapping module, configured to perform semantic mapping on the unified-format data to obtain semantically unified data, and to perform disease risk analysis on the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results; and a sending module, configured to generate disease warning information based on the disease risk results and send it to relevant personnel.

[0013] The present invention also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of any of the above methods.

[0014] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of any of the above methods.

[0015] The above scheme extracts disease-related data scattered across multiple heterogeneous data sources, converts it into a unified format according to preset data format conversion rules, and performs semantic mapping on the unified-format data to obtain semantically unified data. Disease risk analysis is then conducted on the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results, from which disease warning information is generated and sent to relevant personnel. This approach solves the problem of semantic misalignment among multi-source heterogeneous data, and because the risk assessment operates on semantically unified data, its results more accurately reflect the true health status of individuals or groups, improving the scientific rigor and reliability of disease risk assessment.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0017] Figure 1 is a schematic diagram of the steps of a disease early warning method for heterogeneous data integration in one embodiment of the present invention; Figure 2 is a structural block diagram of a disease early warning system for heterogeneous data integration in one embodiment of the present invention; Figure 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present invention.

[0018] The objectives, features, and advantages of this invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0019] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0020] As shown in Figure 1, the disease early warning method for heterogeneous data integration provided in this embodiment comprises the following steps: Step S1: Extract disease-related data scattered across multiple heterogeneous data sources, and convert the disease-related data into a unified format based on preset data format conversion rules.

[0021] Specifically, the process begins by extracting patient data such as blood pressure, blood sugar, and heart rate from various data sources, including hospital electronic medical record systems, regional health platforms, and home smart monitoring devices. Then, based on pre-defined data format conversion rules, field names like "systolic pressure," "SBP," and "high pressure" are all converted to the unified "systolic_pressure," while the numerical units are standardized to mmHg, and the date and time are standardized to UTC timestamp format. This completes the conversion to a unified data format. This process is essentially the implementation of "converting disease-related data based on pre-defined data format conversion rules." Field naming mapping and unit standardization are key operations, ensuring that data from various sources can be recognized and used within the same framework during subsequent processing.
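The field-renaming and timestamp normalization in this example can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the alias table, the function name `to_unified`, and the premise that sources deliver ISO 8601 timestamps carrying a UTC offset are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical alias table: each source-specific name for systolic pressure
# maps to the unified field name used in the example.
FIELD_ALIASES = {
    "systolic pressure": "systolic_pressure",
    "SBP": "systolic_pressure",
    "high pressure": "systolic_pressure",
}


def to_unified(record: dict) -> dict:
    """Rename aliased fields and normalize the timestamp to UTC."""
    unified = {FIELD_ALIASES.get(name, name): value for name, value in record.items()}
    # Assumption: sources send ISO 8601 timestamps that include a UTC offset.
    if "timestamp" in unified:
        ts = datetime.fromisoformat(unified["timestamp"])
        unified["timestamp"] = ts.astimezone(timezone.utc).isoformat()
    return unified
```

Unit conversion to mmHg would be an additional per-field step driven by the same conversion rules; it is omitted here to keep the renaming logic visible.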

[0022] Step S2: Semantically map the data in the unified format to obtain semantically unified data, and perform disease risk analysis on the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results.

[0023] Specifically, the unified-format data produced by format conversion is input into a semantic mapping module configured with a medical ontology library. By matching against the LOINC or SNOMED CT coding system, indicators that have the same clinical meaning but different expressions are normalized. For example, "glu_fast" and "FBG", both representing fasting blood glucose, are mapped to the standard concept "15074-8|Glucose^SCNC|". This is the concrete means of achieving "semantic mapping of the unified-format data". The semantically unified data obtained after mapping is sent to the analysis engine, where it is evaluated using the threshold rules and weighted models of the preset disease risk assessment index system. For example, a diabetes risk model combines parameters such as BMI and glycated hemoglobin to determine whether there is a high-risk tendency, thereby deriving the disease risk result. Each stage feeds the next and lays the foundation for subsequent early warning generation.
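The alias-to-concept normalization in the glucose example can be sketched as a table lookup. This is a hedged illustration only: the concept string is copied from the text, while the table name `CONCEPT_MAP`, the case-insensitive matching, and the function `map_to_concepts` are hypothetical; a production system would more plausibly query a terminology service than hard-code a dictionary.

```python
from typing import Dict, List

# Hypothetical lookup table; keys are lower-cased local indicator codes.
# The concept string is the one quoted in the example.
CONCEPT_MAP: Dict[str, str] = {
    "glu_fast": "15074-8|Glucose^SCNC|",
    "fbg": "15074-8|Glucose^SCNC|",
}


def map_to_concepts(records: List[dict]) -> List[dict]:
    """Attach a standard concept to each record; unmapped codes get None."""
    mapped = []
    for rec in records:
        concept = CONCEPT_MAP.get(rec["indicator"].strip().lower())
        mapped.append({**rec, "concept": concept})
    return mapped
```

Leaving unmapped codes as `None` rather than dropping them keeps them visible for later review instead of silently losing data.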

[0024] Step S3: Generate disease warning information based on the disease risk results and send the disease warning information to relevant personnel.

[0025] Specifically, based on the disease risk results derived from the aforementioned analysis, the system triggers an early warning generation logic. For example, when a patient's risk score exceeds a preset threshold of 70 points, a structured message containing an individual identifier, risk type, confidence level, and recommended measures is automatically generated. This message is the concrete manifestation of the "disease early warning information." This information is then sent to the responsible physician or health management team members via the hospital's internal instant messaging interface or encrypted SMS channel. In some high-risk cases, it is also simultaneously pushed to the monitoring port of the regional public health platform. The entire sending process relies on a pre-configured personnel role mapping table and communication protocol to ensure that the information is accurately delivered to the corresponding responsible person. This mechanism directly realizes the closed loop of "sending the aforementioned disease early warning information to relevant personnel," ensuring the timeliness and traceability of the early warning response.
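The trigger-and-route logic can be sketched as follows. Only the 70-point threshold and the message contents (identifier, risk type, confidence, recommended measures) come from the text; the role mapping, the 90-point cut-off for also notifying the regional public health platform, and all names such as `build_warning` and `DiseaseWarning` are hypothetical, and actual delivery over messaging or SMS channels is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WARNING_THRESHOLD = 70.0  # the 70-point example threshold from the text
HIGH_RISK_THRESHOLD = 90.0  # hypothetical cut-off for also notifying the region

# Hypothetical personnel role mapping: risk type -> responsible recipients.
ROLE_MAP = {
    "diabetes": ["attending_physician", "health_management_team"],
    "hypertension": ["attending_physician"],
}


@dataclass
class DiseaseWarning:
    patient_id: str
    risk_type: str
    score: float
    confidence: float
    recommendation: str
    recipients: List[str] = field(default_factory=list)


def build_warning(patient_id: str, risk_type: str, score: float,
                  confidence: float, recommendation: str) -> Optional[DiseaseWarning]:
    """Return a routed warning when the score exceeds the threshold, else None."""
    if score <= WARNING_THRESHOLD:
        return None
    recipients = list(ROLE_MAP.get(risk_type, []))
    if score >= HIGH_RISK_THRESHOLD:
        recipients.append("regional_public_health_platform")
    return DiseaseWarning(patient_id, risk_type, score, confidence,
                          recommendation, recipients)
```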

[0026] In a specific embodiment, extracting disease-related data scattered across multiple heterogeneous data sources includes: adapting the multiple heterogeneous data sources based on preset data source configuration parameters to obtain a set of data source interfaces, the set including relational database interfaces, document database interfaces, and medical information system interfaces; reading data from the multiple heterogeneous data sources through the data source interface set to obtain an original dataset; and filtering the original dataset by field based on preset disease data filtering conditions to obtain the disease-related data, which includes basic patient information, diagnostic records, and test data.

[0027] Specifically, implementing "extracting disease-related data scattered across multiple heterogeneous data sources" first requires preparing access to different types of data storage systems. Based on predefined data source configuration parameters, such as database type, IP address, port, authentication credentials, and access protocol version, interface adaptation is performed for sources with different data structure characteristics, such as hospital HIS systems, regional health record databases, and third-party testing platforms. This produces a set of data source interfaces covering multiple communication methods, explicitly including relational database interfaces (such as JDBC driver instances for connecting to MySQL or Oracle), document database interfaces (such as the native API used to connect to MongoDB), and interfaces to medical information systems conforming to the HL7 or FHIR standards. These interfaces are not general-purpose components but stable connection channels formed by customized configuration for the actual deployment environment of each target system.
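A configuration-driven interface set can be sketched as a registry of reader factories keyed by source type. This is a minimal sketch under loud assumptions: Python's `sqlite3` stands in for a JDBC connection, a plain list of dicts stands in for a MongoDB collection, no HL7/FHIR reader is implemented, and all names (`build_interface_set`, `read_all`, the config keys) are hypothetical. Sources are read sequentially here, whereas the text describes concurrent reading.

```python
import sqlite3
from typing import Callable, Dict, List

Reader = Callable[[], List[dict]]


def make_relational_reader(cfg: dict) -> Reader:
    # sqlite3 stands in for a JDBC connection to MySQL or Oracle.
    conn = cfg["connection"]

    def read() -> List[dict]:
        cur = conn.execute(cfg["query"])
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]

    return read


def make_document_reader(cfg: dict) -> Reader:
    # A list of dicts stands in for a MongoDB collection.
    docs = cfg["collection"]
    return lambda: [dict(d) for d in docs]


# An HL7/FHIR reader would be registered here in the same way.
READER_FACTORIES: Dict[str, Callable[[dict], Reader]] = {
    "relational": make_relational_reader,
    "document": make_document_reader,
}


def build_interface_set(configs: List[dict]) -> Dict[str, Reader]:
    """Turn each source configuration into a reader, keyed by source name."""
    return {c["name"]: READER_FACTORIES[c["type"]](c) for c in configs}


def read_all(interfaces: Dict[str, Reader]) -> List[dict]:
    """Read every source sequentially and tag records with their origin."""
    raw = []
    for name, read in interfaces.items():
        for rec in read():
            rec["_source"] = name
            raw.append(rec)
    return raw
```

Registering a new source type then means adding one factory, which mirrors the configuration-driven extension the text describes.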

[0028] With these interfaces, the system can concurrently read raw data from the various sources to form a preliminary raw dataset. A filtering phase follows, in which the raw data is filtered at the field level according to preset disease data filtering criteria, removing content irrelevant to the current analysis, such as patient consumption records or device log entries. Taking cardiovascular disease monitoring as an example, the filtering criteria might retain records with "age greater than 40 years", "a hypertension diagnosis code ICD-10: I10", and "most recent low-density lipoprotein above 3.4 mmol/L", and extract the corresponding patient's name, ID number, medical record number, chief complaint, medication list, and blood lipid values from the laboratory report. The information ultimately retained constitutes the aforementioned disease-related data, whose core content comprises three key elements: basic patient information, diagnostic records, and laboratory data.
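The cardiovascular filtering example combines a row filter with a field-level projection, which can be sketched as below. The three criteria values (age over 40, ICD-10 I10, LDL above 3.4 mmol/L) come from the text; the field names and function names are hypothetical.

```python
# Hypothetical field names; the criteria values are the ones in the example.
KEEP_FIELDS = ("name", "id_number", "record_no", "chief_complaint",
               "medications", "ldl_mmol_l")


def matches_cvd_criteria(rec: dict) -> bool:
    """Row-level filter: age > 40, hypertension code I10, LDL > 3.4 mmol/L."""
    return (rec.get("age", 0) > 40
            and "I10" in rec.get("diagnosis_codes", [])
            and rec.get("ldl_mmol_l", 0.0) > 3.4)


def filter_disease_data(raw: list) -> list:
    """Keep matching rows, then project each one onto the retained fields."""
    return [{k: r[k] for k in KEEP_FIELDS if k in r}
            for r in raw if matches_cvd_criteria(r)]
```

Fields outside `KEEP_FIELDS`, such as consumption records, are dropped by the projection even when the row itself qualifies.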

[0029] This step is essentially a targeted extraction of the raw data stream, ensuring that subsequent processes handle high-value, clinically significant data segments and providing clear data boundaries for subsequent format conversion and semantic mapping. The entire extraction process is configuration-driven, supporting dynamic updates to the data source list and filtering rules, adapting to the frequent upgrades of healthcare information systems.

[0030] In a specific embodiment, converting the disease-related data into a unified format based on preset data format conversion rules includes: parsing the disease-related data according to preset format recognition rules to obtain a data format identifier; and, based on the data format identifier, matching the corresponding field mapping table from the preset data format conversion rules and using the field mapping table to reorganize and encode the disease-related data, thereby obtaining the data in the unified format.

[0031] Specifically, after extracting disease-related data, it needs to be standardized to support subsequent analysis. This process is a concrete manifestation of "converting disease-related data into a unified format based on preset data format conversion rules." In practice, the same conversion logic is not applied to all data directly. Instead, preset format recognition rules are used to determine the original structural characteristics of each batch of input data.

[0032] For example, the system receives a CSV file record from a community health center, whose header fields include "patient_id", "visit_date", "DBP", and "SBP". Another message from a tertiary hospital has a JSON body containing key-value pairs such as "emrNo", "examTime", "diastolicPressureValue", and "systolicPressureValue". In this case, the format recognition rules extract the corresponding data structure information based on the file extension, delimiter type, or JSON schema characteristics, and generate a data format identifier representing its source form, such as "CSV_BP_2023" or "JSON_EMRSYS_V1". This identifier is not an arbitrary label but strictly corresponds to entries in the backend configured conversion rule library. Next, based on this data format identifier, the system searches for matches in the pre-stored data format conversion rule set and locates the corresponding field mapping table.
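Format recognition of this kind can be sketched by sniffing whether the payload is JSON or CSV and then checking which registered signature its field names satisfy. The two identifiers and their field names come from the example; matching on field names (rather than file extension, delimiter type, or a full JSON schema) and the names `FORMAT_SIGNATURES` and `identify_format` are assumptions of this sketch.

```python
import csv
import io
import json

# Hypothetical signatures: format identifier -> field names that must appear.
FORMAT_SIGNATURES = {
    "CSV_BP_2023": {"patient_id", "visit_date", "DBP", "SBP"},
    "JSON_EMRSYS_V1": {"emrNo", "examTime", "diastolicPressureValue",
                       "systolicPressureValue"},
}


def identify_format(payload: str) -> str:
    """Return the registered format identifier for a CSV or JSON payload."""
    text = payload.lstrip()
    if text.startswith("{"):
        kind, fields = "JSON", set(json.loads(text))  # keys of the JSON object
    else:
        kind, fields = "CSV", set(next(csv.reader(io.StringIO(text))))  # header row
    for format_id, required in FORMAT_SIGNATURES.items():
        if format_id.startswith(kind) and required <= fields:
            return format_id
    raise ValueError("no registered format matches this payload")
```

Raising on an unknown payload, rather than guessing, matches the text's point that each identifier must correspond strictly to a configured rule entry.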

[0033] For example, when data is identified as the "CSV_BP_2023" type, the associated mapping table is invoked. This table explicitly defines field renaming relationships such as "DBP→diastolic_pressure", "SBP→systolic_pressure", and "visit_date→record_time". It also specifies that numerical units are converted from kPa to mmHg and that time formats are uniformly converted to the ISO 8601 standard. Fields in the original data are reorganized according to the mapping relationships: missing fields are filled with null values, and redundant fields are discarded. Encoding conversion then processes character sets (e.g., GBK to UTF-8) and medical coding systems (e.g., mapping local diagnostic codes to ICD-10 standard codes). The final output, the unified-format data, is a set of data records with consistent structure, standardized naming, and unified units. The entire process is driven by externally configurable rule files, so a new data source can be added simply by registering a new format identifier and mapping table without modifying the core code, which improves the system's scalability and ease of maintenance.
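Applying a field mapping table can be sketched as rename, reorganize into a fixed target layout, convert units, and normalize dates. The renaming pairs, the kPa-to-mmHg conversion, and ISO 8601 output follow the example; the table layout, the source date format `%Y/%m/%d`, and the function `apply_mapping` are hypothetical. The conversion factor uses 1 kPa ≈ 7.50062 mmHg.

```python
from datetime import datetime
from typing import Dict

KPA_TO_MMHG = 7.50062  # 1 kPa is approximately 7.50062 mmHg

# Hypothetical mapping table registered under the identifier from the text.
MAPPING_TABLES = {
    "CSV_BP_2023": {
        "rename": {"patient_id": "patient_id", "DBP": "diastolic_pressure",
                   "SBP": "systolic_pressure", "visit_date": "record_time"},
        "kpa_fields": ("systolic_pressure", "diastolic_pressure"),
        "time_format": "%Y/%m/%d",  # assumed source date layout
    },
}
TARGET_FIELDS = ("patient_id", "record_time", "systolic_pressure",
                 "diastolic_pressure")


def apply_mapping(format_id: str, record: Dict[str, str]) -> Dict:
    table = MAPPING_TABLES[format_id]
    # Rename mapped fields; anything not in the table is treated as redundant.
    renamed = {table["rename"][k]: v for k, v in record.items()
               if k in table["rename"]}
    # Reorganize into the target layout, filling missing fields with None.
    out = {f: renamed.get(f) for f in TARGET_FIELDS}
    for f in table["kpa_fields"]:
        if out[f] is not None:
            out[f] = round(float(out[f]) * KPA_TO_MMHG, 1)
    if out["record_time"] is not None:
        parsed = datetime.strptime(out["record_time"], table["time_format"])
        out["record_time"] = parsed.date().isoformat()  # ISO 8601 date
    return out
```

Character-set conversion (for example, decoding GBK bytes with `bytes.decode("gbk")` before parsing) would sit in front of this step and is not shown.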

[0034] In a specific embodiment, semantically mapping the unified-format data to obtain semantically unified data includes: parsing the field content of the unified-format data, decomposing the text content of each field word by word, and matching it against a preset semantic feature dictionary to obtain field semantic features; labeling each field with a semantic category based on its field semantic features to obtain labeled semantic fields; matching labeled semantic fields that have logical associations based on a preset semantic association rule base, connecting the mutually related labeled semantic fields to construct a semantic association graph, and performing consistency verification on the semantic fields in the graph to obtain a corrected semantic association graph; and standardizing the corrected semantic association graph according to a preset semantic standard template, that is, standardizing the field name, data type, and encoding format of each semantic field in the corrected graph according to the format and specification of the template, to obtain the semantically unified data.

[0035] Specifically, semantic integration of the aforementioned data converted to a unified format is a key step in achieving cross-source data comparability. This process unfolds as a complete technical path of "semantically mapping the unified format data to obtain semantically unified data." First, the system parses the content of each field in each record. For example, if a data entry contains a text item with the field name "lab_item_name" and the value "fasting blood glucose," the system will initiate a word-by-word decomposition mechanism, breaking it down into two lexical units: "fasting" and "blood glucose." It then calls a pre-defined semantic feature dictionary for matching and searching. This dictionary maintains a mapping relationship between common medical terms and their corresponding semantic tags; for example, it associates "blood glucose" with biomarkers and marks "fasting" as a sampling status modifier.
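The word-by-word decomposition and dictionary matching can be sketched as a greedy longest-match segmentation, so that multi-word terms such as "blood glucose" are recognized as one unit. The two semantic tags follow the example; the dictionary contents, the greedy strategy, and the function `decompose` are assumptions of this sketch, and Chinese source text would need a word segmenter in place of the whitespace split.

```python
from typing import List, Optional, Tuple

# Hypothetical dictionary: medical term -> semantic tag.
SEMANTIC_DICT = {
    "fasting": "sampling_status_modifier",
    "blood glucose": "biomarker",
    "glycated hemoglobin": "biomarker",
}
MAX_TERM_WORDS = max(len(term.split()) for term in SEMANTIC_DICT)


def decompose(text: str) -> List[Tuple[str, Optional[str]]]:
    """Greedy longest-match segmentation against SEMANTIC_DICT."""
    words = text.lower().split()
    features, i = [], 0
    while i < len(words):
        # Try the longest candidate term first, then shorter ones.
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            term = " ".join(words[i:i + n])
            if term in SEMANTIC_DICT:
                features.append((term, SEMANTIC_DICT[term]))
                i += n
                break
        else:
            features.append((words[i], None))  # unknown word, no tag
            i += 1
    return features
```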

[0036] This matching process extracts the semantic features carried by the field, and then, combined with contextual rules, assigns a semantic category to the field as a whole, such as determining that it belongs to the category "Test Item - Glucose Metabolism Related", thus forming a labeled semantic field. After multiple fields undergo this processing, they enter the association construction stage. The system identifies labeled semantic fields with logical connections based on a pre-built semantic association rule library. For example, when a data row is found to contain both a field labeled "Test Item - Glycated Hemoglobin" and another field labeled "Test Result Value", and both are structurally located in the same record unit, the "indicator-value" association mode in the rule library is triggered, and a pointing relationship is established between the two, gradually forming a semantic association graph covering multiple entity nodes and relationship edges.
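Category labeling and the "indicator-value" association rule can be sketched as two small functions. The category labels and the rule name come from the text; the rule table, the function names, and the restriction that the rule fires only when a record unit holds exactly one test item and one result value (to keep the pairing unambiguous) are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical rules: biomarker terms that must be present -> category label.
CATEGORY_RULES = [
    ({"blood glucose"}, "Test Item - Glucose Metabolism Related"),
    ({"glycated hemoglobin"}, "Test Item - Glycated Hemoglobin"),
]


def label_field(features: List[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Assign a category from (term, tag) features; None if no rule fires."""
    biomarkers = {term for term, tag in features if tag == "biomarker"}
    for required, label in CATEGORY_RULES:
        if required <= biomarkers:
            return label
    return None


def indicator_value_edges(unit: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Apply the 'indicator-value' pattern within one record unit.

    Assumption: the pattern fires only when the unit holds exactly one test
    item and one result value, so the pairing is unambiguous.
    """
    items = [name for name, label in unit if label.startswith("Test Item")]
    values = [name for name, label in unit if label == "Test Result Value"]
    if len(items) == 1 and len(values) == 1:
        return [(items[0], values[0], "indicator-value")]
    return []
```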

[0037] This graph initially expresses the intrinsic relationships between various health indicators, but it may contain conflicts or ambiguities. For example, two "HbA1c" test items with the same name may appear in the same patient's record with significantly different values. In such cases, consistency checks are required: factors such as timestamp order, data source priority, or the plausibility of the value against its reference interval are used to decide whether to retain, merge, or flag the inconsistent items, producing a corrected semantic association graph. The final step is standardized output. The system loads a preset semantic standard template that defines, for the final output structure, the standard field names (e.g., uniformly using "hba1c_value"), data type requirements (floating-point, two decimal places), and coding specifications (using LOINC code 20565-8 to represent the glycated hemoglobin measurement). The corrected graph content is reconstructed according to this template, with every semantic field adjusted in name, format, and coding system. The final output is a semantically unified data set that is structurally consistent, semantically clear, and directly usable for risk model calculations.
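One possible consistency rule for the duplicate-HbA1c case can be sketched as: discard implausible values, keep the latest plausible one, and break ties by source priority, then emit it under the standard template. The template values (`hba1c_value`, two decimal places, code 20565-8) are those given in the text; the source priorities, the plausibility bounds, and the function `resolve_hba1c` are hypothetical, and the "merge" option the text mentions is not shown.

```python
from typing import Dict, List

# Hypothetical source priorities and plausibility bounds (% HbA1c).
SOURCE_PRIORITY = {"tertiary_hospital": 2, "community_center": 1}
PLAUSIBLE_RANGE = (3.0, 20.0)

# Output template values taken from the example in the text.
TEMPLATE = {"name": "hba1c_value", "loinc": "20565-8", "decimals": 2}


def resolve_hba1c(observations: List[Dict]) -> Dict:
    """Resolve conflicting HbA1c values for one patient into one standard field.

    Implausible values are discarded; among the rest, the latest timestamp wins
    and source priority breaks ties. Timestamps are assumed to be ISO 8601
    strings in a single format, so they compare correctly as text.
    """
    plausible = [o for o in observations
                 if PLAUSIBLE_RANGE[0] <= o["value"] <= PLAUSIBLE_RANGE[1]]
    if not plausible:
        return {"status": "inconsistent", "candidates": observations}
    best = max(plausible,
               key=lambda o: (o["time"], SOURCE_PRIORITY.get(o["source"], 0)))
    return {
        "status": "resolved",
        TEMPLATE["name"]: round(float(best["value"]), TEMPLATE["decimals"]),
        "loinc_code": TEMPLATE["loinc"],
        "record_time": best["time"],
    }
```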

[0038] In a specific embodiment, connecting the mutually related labeled semantic fields to construct a semantic association graph includes: matching the labeled semantic fields based on the field association conditions in the semantic association rule base to obtain field association pairs, which include the attribution association between a patient identifier and a diagnosis record, the causal association between a diagnosis record and test data, and the temporal association between test data and a timestamp; converting the field association pairs into a node-and-edge structure, with the labeled semantic fields as graph nodes and the association relationships in the field association pairs as graph edges, to obtain an initial semantic graph; performing connectivity detection and isolated node identification on the initial semantic graph based on preset graph structure rules to obtain graph structure detection results; and, based on the graph structure detection results, deleting isolated nodes from the initial semantic graph and filling in broken edges to obtain the semantic association graph.

[0039] Specifically, connecting interconnected labeled semantic fields to construct a semantic association graph is a key step in achieving structured integration of multi-source health information. This process implements the technical details of "constructing a semantic association graph" mentioned earlier. After completing the semantic category labeling of the data in a unified format, the system immediately initiates a rule-based association matching mechanism. At this point, a preset semantic association rule base is invoked, which contains several explicit field association conditions used to identify the logical relationships between different semantic fields.

[0040] For example, if a record contains a field labeled "Patient Identifier" (e.g., patient_id) and a field labeled "Diagnosis Record" (e.g., diagnosis_code) within the same data unit, then according to the "same entity attribution" condition defined in the rule base, the two fields are determined to have an attribution relationship and form a field association pair. Similarly, if a "Diagnosis Record" field contains "I20.9 (Angina pectoris)" and a field labeled "Test Data" has the value "Troponin I: 2.1 ng/mL", which exceeds the normal reference range, then according to the medical knowledge configured in the rule base, the two constitute a potential causal association pair. Likewise, a blood glucose test result and its corresponding timestamp field are identified as a temporal association pair because of their explicit time binding.

[0041] The generation of these association pairs relies on pre-configured matching logic and threshold parameters in the rule base to ensure the clinical rationality of the association judgments. Next, the system enters the graph structure transformation stage. Each labeled semantic field is transformed into a node in the graph. Node attributes retain information such as the original field name, content value, and semantic label. Simultaneously, each identified field association pair is transformed into a directed edge connecting the two nodes. The edge type is labeled with its corresponding association category, such as "belongs to," "triggers," or "occurs at," thus initially forming the semantic graph. Although this graph expresses a basic relationship network, it may have structural defects, such as some nodes not being connected by any edges (e.g., isolated weight measurements not associated with a patient ID), or interrupted association paths (e.g., diagnostic records lacking corresponding timestamps). Therefore, the system further applies pre-defined graph structure rules for connectivity detection and isolated node identification. A traversal algorithm checks the connection status of each subgraph and marks nodes or broken edges that do not meet the minimum connectivity requirements.
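The graph transformation and connectivity detection in [0041] might look like the following minimal Python sketch. It assumes nodes are keyed by field name and uses a breadth-first traversal over an undirected view of the directed, typed edges. The `REQUIRED_OUTGOING` rule (every non-isolated "Test Data" node needs a temporal edge) is a hypothetical stand-in for the preset graph structure rules.

```python
from collections import defaultdict, deque

def build_graph(nodes, pairs):
    """nodes: field name -> semantic label; pairs: (source, target, kind) association pairs."""
    edges = list(pairs)            # directed, typed edges ("attribution", "causal", ...)
    neighbors = defaultdict(set)   # undirected view used only for connectivity checks
    for src, dst, _ in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return {"nodes": dict(nodes), "edges": edges, "neighbors": neighbors}

def connected_components(graph):
    """Breadth-first traversal; returns one list of node names per subgraph."""
    seen, components = set(), []
    for start in graph["nodes"]:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in graph["neighbors"][node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

# Hypothetical graph-structure rule: a node with this label needs an outgoing edge of this kind.
REQUIRED_OUTGOING = {"Test Data": "temporal"}

def detect_structure(graph):
    isolated = [n for n in graph["nodes"] if not graph["neighbors"][n]]
    broken = []
    for node, label in graph["nodes"].items():
        required = REQUIRED_OUTGOING.get(label)
        if node in isolated or required is None:
            continue
        if not any(src == node and kind == required for src, _, kind in graph["edges"]):
            broken.append((node, required))  # association path interrupted here
    return {"components": connected_components(graph), "isolated": isolated, "broken": broken}

nodes = {
    "patient_id": "Patient Identifier",
    "diagnosis_code": "Diagnosis Record",
    "fasting_glucose": "Test Data",
    "weight": "Test Data",  # never linked to a patient: an isolated node
}
pairs = [("patient_id", "diagnosis_code", "attribution"),
         ("diagnosis_code", "fasting_glucose", "causal")]
report = detect_structure(build_graph(nodes, pairs))
print(report["isolated"], report["broken"])
```

Here the weight measurement is reported as isolated and the glucose result as a broken edge because it lacks a temporal link, which mirrors the two defect types named in the text.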

[0042] Isolated nodes confirmed to have no valid association path are deleted if they cannot be completed from context. Broken relationships that can be inferred are repaired: for example, when a test record is missing its sampling time but other test items in the same batch have timestamps, the missing edge is filled using a completion strategy under which data in the same batch share a timestamp by default. The result is a semantic association graph with a complete structure and closed logic.
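A minimal Python sketch of the two repair actions in [0042], under stated assumptions: test records are plain dicts with hypothetical keys `item`, `batch` and `sampled_at`, and a hypothetical `completable` set marks isolated nodes that some context source can still attach. When sibling tests in a batch disagree on their timestamp, the sketch simply uses the first one seen.

```python
def delete_isolated(nodes, isolated, completable=frozenset()):
    """Drop isolated nodes unless they are listed as completable from context."""
    return {n: label for n, label in nodes.items() if n not in isolated or n in completable}

def fill_batch_timestamps(tests):
    """
    Fill a missing sampling time with the timestamp of another test in the same batch
    (the "same batch shares a timestamp by default" completion strategy).
    Records whose batch has no known timestamp are left unchanged.
    """
    batch_time = {}
    for t in tests:
        if t["sampled_at"] is not None:
            batch_time.setdefault(t["batch"], t["sampled_at"])  # first sibling wins
    repaired = []
    for t in tests:
        if t["sampled_at"] is None and t["batch"] in batch_time:
            # Flag the value so inferred edges can be told apart from observed data.
            t = {**t, "sampled_at": batch_time[t["batch"]], "timestamp_inferred": True}
        repaired.append(t)
    return repaired

tests = [
    {"item": "fasting_glucose", "batch": "B7", "sampled_at": "2026-03-01T07:45"},
    {"item": "hba1c", "batch": "B7", "sampled_at": None},   # inferable from its batch
    {"item": "ldl", "batch": "B9", "sampled_at": None},     # nothing to infer from
]
for t in fill_batch_timestamps(tests):
    print(t["item"], t["sampled_at"])
```

Keeping an explicit `timestamp_inferred` flag is a design choice of this sketch; it lets later consistency checks treat filled edges with less trust than observed ones.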

[0043] In a specific embodiment, performing disease risk analysis on the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results includes: matching and numerically extracting the semantically unified data based on the list of indicator fields in the disease risk assessment indicator system to obtain a set of patient indicator values; computing a weighted sum of the values in the patient indicator value set using the weight coefficient of each indicator in the disease risk assessment indicator system to obtain the patient's comprehensive risk score; and grading the comprehensive risk score against preset risk level classification thresholds to obtain the disease risk result.

[0044] Specifically, each field in the semantically unified data is first compared item by item against the list of indicator fields explicitly defined in the disease risk assessment indicator system. For example, for a diabetes early warning model, the indicator system may contain key field names such as "bmi_value", "hba1c_level", "fasting_glucose", and "family_diabetes_history". The system searches the input data for entries with the corresponding names using exact or fuzzy string matching and, once a match is found, extracts its numerical content. For example, bmi_value=26.8, hba1c_level=6.7%, and fasting_glucose=5.9 mmol/L are obtained from a patient's record, and family history is marked as "yes". Together these form a structured set of patient indicator values. The set retains only the valid indicator items related to the current assessment goal; missing values are marked as empty according to the rules or handled with a default filling strategy. Next comes the weighted calculation. The system retrieves the predefined weight coefficient for each indicator in the same indicator system; these coefficients are typically derived from epidemiological studies or clinical guidelines. For example, BMI is weighted 0.3, glycated hemoglobin 0.4, and fasting blood glucose 0.2, while family history is encoded as a binary variable, 1 (present) or 0 (absent), with a weight of 0.1. Each indicator value in the patient's set is multiplied by its weight, and the products are summed to give a continuous value, the patient's comprehensive risk score. Because the indicators are used in their raw units rather than normalized, this score is not confined to the range 0 to 1. In this example the score is: (26.8 × 0.3) + (6.7 × 0.4) + (5.9 × 0.2) + (1 × 0.1) = 8.04 + 2.68 + 1.18 + 0.1 = 12.0. This score is not a simple average of the original data but a weighted integration reflecting the relative importance of each risk factor.
The final step is risk level determination. The system assigns the comprehensive score to an interval based on preset risk level thresholds: for example, a score below 5.0 is low risk, 5.0 to 7.5 is medium risk, and a score above 7.5 is high risk. The patient above, with a score of 12.0, is therefore classified as high risk, and the corresponding disease risk result is generated.
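The extraction, weighted sum and grading in [0044] can be sketched in Python using the example's weights and thresholds. Note that 26.8 × 0.3 = 8.04, so the weighted sum for the example values is 8.04 + 2.68 + 1.18 + 0.1 = 12.0. Assumptions of this sketch: field names are compared after lower-casing, the standard-library `difflib.get_close_matches` (cutoff 0.8) stands in for "fuzzy string matching", a missing indicator is filled with 0, and both boundary scores 5.0 and 7.5 count as medium risk.

```python
import difflib
import re

# Indicator system from the diabetes example: field name -> weight coefficient.
WEIGHTS = {"bmi_value": 0.3, "hba1c_level": 0.4, "fasting_glucose": 0.2, "family_diabetes_history": 0.1}
LOW_BELOW, HIGH_ABOVE = 5.0, 7.5  # grading thresholds from the example

def to_number(raw):
    """Parse values such as '6.7%' or '5.9 mmol/L'; map yes/no to 1/0 for binary indicators."""
    if isinstance(raw, (int, float)):
        return float(raw)
    text = str(raw).strip().lower()
    if text in ("yes", "true"):
        return 1.0
    if text in ("no", "false"):
        return 0.0
    match = re.search(r"[-+]?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

def extract_indicators(record, cutoff=0.8):
    """Exact match on lower-cased field names first, then difflib fuzzy matching; unmatched -> None."""
    normalized = {str(k).strip().lower(): v for k, v in record.items()}
    values = {}
    for name in WEIGHTS:
        if name in normalized:
            key = name
        else:
            close = difflib.get_close_matches(name, list(normalized), n=1, cutoff=cutoff)
            key = close[0] if close else None
        values[name] = to_number(normalized[key]) if key is not None else None
    return values

def risk_score(values, default=0.0):
    """Weighted sum; a missing indicator is filled with `default` (assumed filling strategy)."""
    return sum(WEIGHTS[name] * (default if v is None else v) for name, v in values.items())

def risk_level(score):
    if score < LOW_BELOW:
        return "low"
    return "medium" if score <= HIGH_ABOVE else "high"

record = {"BMI_value": 26.8, "hba1c_level": "6.7%", "fasting_glucose": "5.9 mmol/L",
          "family_diabetes_hist": "yes"}  # last key only matches fuzzily
values = extract_indicators(record)
score = risk_score(values)
print(round(score, 2), risk_level(score))  # 12.0 high
```

Because the raw values are not normalized, BMI dominates the sum; a deployment would likely rescale indicators before weighting, but that is beyond what the example specifies.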

[0045] The determination process relies on a configurable threshold table, which supports dynamic adjustment of the grading standards according to different population groups or management requirements.
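One way to realize the configurable threshold table of [0045] is a small lookup keyed by population group, sketched below in Python. The group name "elderly" and its cut-offs are purely hypothetical; only the default (5.0, 7.5) comes from the example above. Unknown groups fall back to the default row.

```python
class ThresholdTable:
    """Configurable grading thresholds keyed by population group (group names are hypothetical)."""

    def __init__(self, default=(5.0, 7.5)):
        # Each row is (low-risk upper bound, medium-risk upper bound).
        self._table = {"default": default}

    def set(self, group, low_below, high_above):
        """Adjust the grading standard for one group at run time."""
        if not low_below < high_above:
            raise ValueError("low_below must be smaller than high_above")
        self._table[group] = (low_below, high_above)

    def grade(self, score, group="default"):
        """Below the low bound -> low; up to and including the upper bound -> medium; above -> high."""
        low_below, high_above = self._table.get(group, self._table["default"])
        if score < low_below:
            return "low"
        return "medium" if score <= high_above else "high"

table = ThresholdTable()
table.set("elderly", 4.5, 7.0)  # hypothetical stricter standard for one group
print(table.grade(7.2), table.grade(7.2, "elderly"))  # medium high
```

In practice the rows would more likely be loaded from a configuration store than set in code; the validation in `set` simply guards against inverted intervals.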

[0046] In a specific embodiment, generating disease warning information based on the disease risk results and sending it to relevant personnel includes: comparing the risk level in the disease risk results against a preset warning trigger threshold to obtain a warning trigger record; extracting the associated fields from the disease risk results based on the warning trigger record to obtain warning content fields; formatting and assembling the warning content fields using a preset warning information template to obtain the disease warning information, then timestamping and prioritizing it to obtain annotated disease warning information; and, based on a preset personnel responsibility checklist, matching the risk level in the annotated disease warning information with recipients to obtain a target recipient list, and sending the disease warning information to the relevant personnel in the target recipient list through a message push channel.

[0047] Specifically, after the disease risk result is obtained, the system initiates the early warning information generation and distribution process, which concretely implements "generating disease early warning information based on the disease risk result and sending the disease early warning information to relevant personnel." First, the risk level is checked against the preset early warning trigger threshold. For example, if the system is set to trigger an early warning when the risk level is "high," it traverses the risk assessment output of all patients, selects the entries that meet this condition, and forms early warning trigger records. These records usually contain identifying information such as the patient ID, affiliated institution, and original score.

[0048] These trigger records are then linked back to the corresponding disease risk result data, from which key fields are extracted as the basis for the warning, for example the patient's name, age, main abnormal indicators (such as HbA1c=7.2%), current risk level, and assessment time. This extracted information forms the early warning content fields, the core factual basis of the early warning message.
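The trigger filtering of [0047] and the link-back extraction of [0048] can be sketched together in Python. The `RiskResult` record layout is a hypothetical structure chosen for illustration; only the trigger level "high" and the kinds of fields extracted come from the text.

```python
from dataclasses import dataclass

@dataclass
class RiskResult:
    patient_id: str
    institution: str
    name: str
    age: int
    score: float
    level: str
    abnormal: list     # main abnormal indicators, e.g. ["HbA1c=7.2%"]
    assessed_at: str

TRIGGER_LEVELS = {"high"}  # example setting: only "high" raises a warning

def trigger_records(results):
    """Traverse all assessment outputs; keep identifying info for entries meeting the trigger condition."""
    return [
        {"patient_id": r.patient_id, "institution": r.institution, "score": r.score}
        for r in results
        if r.level in TRIGGER_LEVELS
    ]

def warning_content_fields(records, results):
    """Link each trigger record back to its risk result and pull the fields the message needs."""
    by_id = {r.patient_id: r for r in results}
    fields = []
    for rec in records:
        r = by_id[rec["patient_id"]]
        fields.append({
            "name": r.name, "age": r.age, "score": r.score, "level": r.level,
            "risk_factors": ", ".join(r.abnormal), "assessed_at": r.assessed_at,
        })
    return fields

results = [
    RiskResult("P001", "Clinic A", "Li", 72, 12.0, "high", ["HbA1c=7.2%"], "2026-03-13T08:00"),
    RiskResult("P002", "Clinic A", "Wang", 55, 6.1, "medium", [], "2026-03-13T08:00"),
]
records = trigger_records(results)
print(warning_content_fields(records, results))
```

Keeping the trigger record minimal and re-reading the full result by patient ID follows the two-step description in the text; an implementation could equally carry the full result through in one pass.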

[0049] Next, the system enters the information structuring stage. It loads a pre-defined warning message template, which defines a standard message format, such as: "[High-Risk Warning] Patient {Name} ({Age} years) has a comprehensive diabetes risk score of {score}, with major risk factors: {risk_factors}. Please intervene as soon as possible." The system fills the extracted warning content fields into the template according to placeholder rules, generating the text while also attaching technical metadata, including a current UTC timestamp and automatic priority labels based on risk severity, such as "P0-Urgent" or "P1-High." This generates a labeled disease warning message with complete context and processing guidelines. This step ensures that each message is not only highly readable but also facilitates subsequent system tracking and classification.
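The template filling and annotation step of [0049] might be sketched in Python as below, using the template quoted in the text with `str.format` named placeholders. The rule that separates "P0-Urgent" from "P1-High" (a score cut-off of 10.0) and the "P2-Normal" label for lower levels are hypothetical; the text only says priorities are assigned by risk severity.

```python
from datetime import datetime, timezone

TEMPLATE = ("[High-Risk Warning] Patient {name} ({age} years) has a comprehensive diabetes risk "
            "score of {score}, with major risk factors: {risk_factors}. Please intervene as soon as possible.")

URGENT_SCORE = 10.0  # hypothetical cut-off separating "P0-Urgent" from "P1-High"

def priority_label(level, score):
    if level == "high":
        return "P0-Urgent" if score >= URGENT_SCORE else "P1-High"
    return "P2-Normal"  # hypothetical label for lower levels

def assemble_warning(fields, now=None):
    """Fill the template by placeholder name, then attach a UTC timestamp and a priority label."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "text": TEMPLATE.format(**fields),  # extra keys in `fields` are ignored by str.format
        "created_utc": now.isoformat(),
        "priority": priority_label(fields["level"], fields["score"]),
        "level": fields["level"],
    }

fields = {"name": "Li", "age": 72, "score": 12.0, "level": "high", "risk_factors": "HbA1c=7.2%"}
msg = assemble_warning(fields, now=datetime(2026, 3, 13, 8, 0, tzinfo=timezone.utc))
print(msg["priority"], msg["created_utc"])
print(msg["text"])
```

Passing `now` explicitly keeps the function deterministic for testing; in production it defaults to the current UTC time, as the text describes.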

[0050] Finally, the push notification is executed. The system retrieves a pre-stored personnel responsibility mapping table, which establishes a mapping relationship between risk levels and responsible parties. For example, it stipulates that a "high-risk" warning must simultaneously notify the attending physician, the head of the family doctor contracting team, and the regional chronic disease management specialist, while a "medium-risk" warning only notifies the family doctor. By querying this mapping table, the system determines the target recipient list for this warning and, based on the contact information of each recipient (such as employee ID, mobile phone number, or WeChat account), delivers the message via SMS gateway, hospital internal communication platform, or regional health cloud message push channel.
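The responsibility lookup of [0050] can be sketched in Python as a mapping from risk level to roles, followed by a per-person channel choice. The role-to-level mapping follows the example in the text; the contact record layout, the preference order, and the pairing of contact types with channels (employee ID to the hospital platform, mobile number to the SMS gateway, WeChat account to the regional health cloud) are assumptions made for illustration.

```python
# Personnel responsibility mapping from the example: risk level -> roles to notify.
RESPONSIBILITY = {
    "high": ["attending_physician", "family_doctor_team_lead", "chronic_disease_specialist"],
    "medium": ["family_doctor"],
}

# Hypothetical contact type -> push channel, in order of preference.
CHANNELS = [
    ("employee_id", "hospital_im"),        # hospital internal communication platform
    ("mobile", "sms_gateway"),
    ("wechat", "regional_health_cloud"),   # regional health cloud message push
]

def target_recipients(level, assignments):
    """
    assignments: role -> contact record for the patient's care team, e.g.
    {"person": "Dr. Wang", "employee_id": "E102"} (a hypothetical structure).
    Returns (person, channel, address) for each role the risk level requires.
    """
    recipients = []
    for role in RESPONSIBILITY.get(level, []):
        contact = assignments.get(role)
        if contact is None:
            continue  # unstaffed roles are simply skipped in this sketch
        for key, channel in CHANNELS:
            if contact.get(key):
                recipients.append((contact["person"], channel, contact[key]))
                break
    return recipients

team = {
    "attending_physician": {"person": "Dr. Wang", "employee_id": "E102"},
    "family_doctor_team_lead": {"person": "Dr. Chen", "mobile": "mobile-001"},
    "chronic_disease_specialist": {"person": "Ms. Zhao", "wechat": "zhao_cdm"},
    "family_doctor": {"person": "Dr. Liu", "employee_id": "E215"},
}
print(target_recipients("high", team))
```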

[0051] The entire delivery process supports an asynchronous queuing mechanism, temporarily storing data for retry in case of network anomalies to ensure delivery reliability. Taking an elderly patient in a community as an example, who was deemed high-risk due to elevated multiple metabolic indicators, the system automatically generated an alert and pushed it to three responsible individuals. One doctor reviewed the alert and initiated a follow-up process within 15 minutes. This mechanism achieves a closed-loop linkage from risk identification to personnel response, improving the initiative and timeliness of health management.
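The asynchronous queuing with retry mentioned in [0051] could be realized along the lines of this minimal `asyncio` sketch. Assumptions: `ConnectionError` is treated as the "network anomaly" that triggers a retry, retries use exponential backoff up to `max_attempts`, and any other exception marks the message as failed without retry. The `send` coroutine stands in for a real SMS or platform client.

```python
import asyncio

async def _worker(queue, send, delivered, failed, max_attempts, base_delay):
    while True:
        message, attempt = await queue.get()
        try:
            await send(message)
            delivered.append(message)
        except ConnectionError:
            if attempt < max_attempts:
                # Hold the message briefly, then re-queue it (exponential backoff).
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
                queue.put_nowait((message, attempt + 1))
            else:
                failed.append(message)
        except Exception:
            failed.append(message)  # non-network errors are not retried in this sketch
        finally:
            queue.task_done()

async def deliver_all(messages, send, workers=2, max_attempts=3, base_delay=0.01):
    queue = asyncio.Queue()
    for message in messages:
        queue.put_nowait((message, 1))
    delivered, failed = [], []
    tasks = [asyncio.create_task(_worker(queue, send, delivered, failed, max_attempts, base_delay))
             for _ in range(workers)]
    await queue.join()  # re-queued retries keep join() waiting until every message settles
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return delivered, failed
```

Because a retry is re-queued before the current item's `task_done()` is called, `queue.join()` cannot return while a message is still waiting for another attempt. A persistent store would be needed for retries to survive a process restart; this in-memory queue does not provide that.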

[0052] Referring to Figure 2, Figure 2 is a schematic diagram of the framework of an embodiment of the disease early warning system for heterogeneous data integration according to this application. As shown in Figure 2, the system includes: an extraction module 1, configured to extract disease-related data scattered across multiple heterogeneous data sources and convert the disease-related data into a unified format based on preset data format conversion rules; a mapping module 2, configured to perform semantic mapping on the data in the unified format to obtain semantically unified data and to analyze the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results; and a sending module 3, configured to generate disease early warning information based on the disease risk results and send the disease early warning information to relevant personnel.
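The three-module structure of Figure 2 can be sketched as composable Python components. This is an illustrative wiring only: each module receives its rules as injected callables (`convert`, `semantic_map`, `assess`, `make_warning`, `send` are hypothetical names), so the sketch shows how data flows from module 1 to module 3 rather than any particular rule set.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

Record = dict

@dataclass
class ExtractionModule:
    """Module 1: read every heterogeneous source and apply the format conversion rules."""
    sources: List[Callable[[], Iterable[Record]]]  # one reader per data source
    convert: Callable[[Record], Record]            # preset format conversion rules

    def run(self) -> List[Record]:
        return [self.convert(raw) for read in self.sources for raw in read()]

@dataclass
class MappingModule:
    """Module 2: semantic mapping followed by risk assessment."""
    semantic_map: Callable[[Record], Record]
    assess: Callable[[Record], Record]

    def run(self, unified: List[Record]) -> List[Record]:
        return [self.assess(self.semantic_map(r)) for r in unified]

@dataclass
class SendingModule:
    """Module 3: build warnings for qualifying results and hand them to a push channel."""
    make_warning: Callable[[Record], Optional[str]]  # returns None when no warning is needed
    send: Callable[[str], None]

    def run(self, results: List[Record]) -> List[str]:
        sent = []
        for result in results:
            message = self.make_warning(result)
            if message is not None:
                self.send(message)
                sent.append(message)
        return sent

def run_pipeline(extraction, mapping, sending):
    return sending.run(mapping.run(extraction.run()))
```

For example, two sources that name the same fields differently (`PID`/`Score` versus `patient`/`score_val`) can share one `convert` callable that maps both onto `patient_id`/`score` before the mapping module sees them.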

[0053] Referring to Figure 3, the present invention also provides a computer device whose internal structure may be as shown in Figure 3. The computer device includes a processor, a memory, a display screen, an input device, a network interface, and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores the operating system, computer programs, and the database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The database stores the data used in this embodiment. The network interface communicates with external terminals via a network connection. When executed by the processor, the computer program implements the method described above.

[0054] Those skilled in the art will understand that the structure shown in Figure 3 is merely a block diagram of part of the structure related to the present invention and does not limit the computer devices to which the present invention is applied.

[0055] An embodiment of the present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the above-described method. It is understood that the computer-readable storage medium in this embodiment can be a volatile readable storage medium or a non-volatile readable storage medium.

[0056] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may perform the processes of the above method embodiments. Any reference to memory, storage, databases, or other media used in the present invention and its embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0057] In some embodiments, the functions or modules of the apparatus provided in this disclosure can be used to perform the methods described in the above method embodiments. For their specific implementation, refer to the description of the above method embodiments; for brevity, it is not repeated here.

[0058] The above description of the various embodiments tends to emphasize the differences between them. For the same or similar parts, the embodiments may be referred to one another; for brevity, these are not repeated here.

[0059] In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus can be implemented in other ways. For example, the apparatus implementations described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection of devices or units may be electrical, mechanical, or other forms.

[0060] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.

[0061] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0062] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods of various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0063] If the technical solution of this application involves personal information, the product using this technical solution has clearly informed the user of the personal information processing rules and obtained the user's voluntary consent before processing the personal information. If the technical solution of this application involves sensitive personal information, the product using this technical solution has obtained the user's separate consent before processing the sensitive personal information, and also meets the requirement of "express consent". For example, at personal information collection devices such as cameras, clear and prominent signs are set up to inform users that they have entered the scope of personal information collection and that personal information will be collected. If an individual voluntarily enters the collection scope, it is deemed that they have agreed to the collection of their personal information; or on the personal information processing device, with clear signs / information informing users of the personal information processing rules, authorization is obtained from the individual through pop-up information or by asking the individual to upload their personal information; wherein, the personal information processing rules may include information such as the personal information processor, the purpose of personal information processing, the processing method, and the types of personal information processed.

Claims

1. A disease early warning method integrating heterogeneous data, characterized by comprising the following steps: extracting disease-related data scattered across multiple heterogeneous data sources, and converting the disease-related data into a unified format based on preset data format conversion rules; performing semantic mapping on the data in the unified format to obtain semantically unified data, and performing disease risk analysis on the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results; and generating disease warning information based on the disease risk results and sending the disease warning information to relevant personnel.

2. The disease early warning method for heterogeneous data integration according to claim 1, characterized in that extracting the disease-related data scattered across multiple heterogeneous data sources includes: adapting the multiple heterogeneous data sources based on preset data source configuration parameters to obtain a set of data source interfaces, wherein the set of data source interfaces includes relational database interfaces, document database interfaces, and medical information system interfaces; reading data from the multiple heterogeneous data sources using the set of data source interfaces to obtain an original dataset; and filtering the original dataset by field based on preset disease data filtering conditions to obtain the disease-related data, which includes basic patient information, diagnostic records, and test data.

3. The disease early warning method for heterogeneous data integration according to claim 1, characterized in that converting the disease-related data into a unified format based on preset data format conversion rules includes: parsing the disease-related data according to preset format recognition rules to obtain a data format identifier; and, based on the data format identifier, matching the corresponding field mapping table from the preset data format conversion rules and using the field mapping table to reorganize and encode the disease-related data to obtain the data in the unified format.

4. The disease early warning method for heterogeneous data integration according to claim 1, characterized in that semantically mapping the data in the unified format to obtain semantically unified data includes: parsing the field content of the data in the unified format, decomposing the text content in each field word by word, and matching it against semantic features through a preset semantic feature dictionary to obtain field semantic features; labeling each field with a semantic category based on the field semantic features to obtain labeled semantic fields; logically associating the labeled semantic fields based on a preset semantic association rule base, connecting the mutually associated labeled semantic fields to construct a semantic association graph, and performing consistency verification on the semantic fields in the semantic association graph to obtain a corrected semantic association graph; and standardizing the corrected semantic association graph according to a preset semantic standard template, wherein the field name, data type, and encoding format of each semantic field in the corrected semantic association graph are standardized according to the format and specification of the standard template to obtain the semantically unified data.

5. The disease early warning method for heterogeneous data integration according to claim 4, characterized in that connecting the mutually associated labeled semantic fields to construct a semantic association graph includes: matching the labeled semantic fields based on the field association conditions in the semantic association rule base to obtain field association pairs, wherein the field association pairs include an attribution association between a patient identifier and a diagnosis record, a causal association between a diagnosis record and test data, and a temporal association between test data and a timestamp; transforming the field association pairs into a node-and-edge structure, with the labeled semantic fields as graph nodes and the association relationships in the field association pairs as graph edges, to obtain an initial semantic graph; performing connectivity detection and isolated node identification on the initial semantic graph based on preset graph structure rules to obtain graph structure detection results; and, based on the graph structure detection results, deleting isolated nodes from the initial semantic graph and filling broken edges to obtain the semantic association graph.

6. The disease early warning method for heterogeneous data integration according to claim 1, characterized in that performing disease risk analysis on the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results includes: matching and numerically extracting the semantically unified data based on the list of indicator fields in the disease risk assessment index system to obtain a set of patient indicator values; computing a weighted sum of the values in the patient indicator value set based on the weight coefficient corresponding to each indicator in the disease risk assessment index system to obtain the patient's comprehensive risk score; and grading the comprehensive risk score against preset risk level classification thresholds to obtain disease risk results including high-risk and low-risk levels.

7. The disease early warning method for heterogeneous data integration according to claim 1, characterized in that generating disease warning information based on the disease risk results and sending the disease warning information to relevant personnel includes: generating an early warning trigger record based on disease risk results at the high risk level, and extracting associated fields from the disease risk results according to the early warning trigger record to obtain warning content fields; formatting and assembling the warning content fields using a preset warning information template to obtain the disease warning information, and timestamping and prioritizing the disease warning information to obtain annotated disease warning information; and, based on a preset personnel responsibility checklist, matching the risk level in the annotated disease warning information with recipients to obtain a target recipient list, and sending the disease warning information to the relevant personnel in the target recipient list through a message push channel.

8. A disease early warning system integrating heterogeneous data, characterized by comprising: an extraction module, configured to extract disease-related data scattered across multiple heterogeneous data sources and convert the disease-related data into a unified format based on preset data format conversion rules; a mapping module, configured to perform semantic mapping on the data in the unified format to obtain semantically unified data, and to analyze the semantically unified data based on a preset disease risk assessment index system to obtain disease risk results; and a sending module, configured to generate disease warning information based on the disease risk results and send the disease warning information to relevant personnel.

9. A computer device, characterized by comprising a memory and a processor coupled to each other, wherein the memory stores program instructions and the processor executes the program instructions to implement the disease early warning method for heterogeneous data integration according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores program instructions executable by a processor, the program instructions being used to implement the disease early warning method for heterogeneous data integration according to any one of claims 1 to 7.