A health knowledge graph updating method, device, system and storage medium
By constructing a closed-loop process of static foundation, dynamic evolution, and proactive intervention, the method overcomes the limitation of statically constructed health knowledge graphs, enabling real-time processing of multi-source heterogeneous data and personalized health management, and thereby improving the timeliness, accuracy, and management value of health records.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU XINHAO INTELLIGENT TECHNOLOGY CO LTD
- Filing Date
- 2026-02-13
- Publication Date
- 2026-06-12
AI Technical Summary
Existing health knowledge graphs are mostly statically constructed, lacking mechanisms for real-time processing and updating of dynamic data. They are difficult to integrate multi-source heterogeneous data, resulting in insufficient completeness and time-series traceability of health records, and failing to meet the needs of personalized health management.
An initial static knowledge graph is constructed by acquiring multi-source health data, dynamic triples are generated and subjected to temporal processing and multi-source conflict arbitration, the graph is updated using the arbitrated triples, and health intervention programs are triggered when conditions are met. Intelligent information extraction and entity alignment are performed in conjunction with a large language model.
It achieves real-time, accurate, and continuous health knowledge graphs, supports personalized health management, improves the accuracy of health risk assessment and the pertinence of interventions, and enhances the foresight and clinical value of health records.
Figure CN122201813A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of health data management, knowledge graphs, natural language processing, and artificial intelligence, and specifically to a method, apparatus, system, and storage medium for updating a health knowledge graph.
Background Technology
[0002] With the increasing awareness of health management, user health data is characterized by its multi-source nature (medical institutions, user self-reporting, wearable devices, etc.), heterogeneity (structured tables, unstructured text, semi-structured reports), and timeliness (dynamic indicators changing over time). Traditional health record management methods mostly use relational databases, which struggle to handle complex health data relationships and dynamic changes, resulting in insufficient completeness and time-series traceability of health records, failing to meet the needs of personalized health management.
[0003] Knowledge graph technology can effectively represent the relationships between entities, but existing health knowledge graphs are mostly statically constructed, lack mechanisms for real-time processing and updating of dynamic data, and have difficulty integrating multi-source heterogeneous data. Therefore, a knowledge graph management method that supports both the static construction and the dynamic growth of health records is needed to enhance the utilization value of health data.
Summary of the Invention
[0004] This invention provides a method, apparatus, system, and storage medium for updating a health knowledge graph, in order to solve the problems that existing health knowledge graphs are mostly statically constructed, lack a real-time processing and updating mechanism for dynamic data, and are difficult to integrate multi-source heterogeneous data.
[0005] In a first aspect, the present invention provides a method for updating a health knowledge graph, the method comprising:
[0006] Acquiring multi-source health data, extracting entities to generate static triples containing relationships between entities, and constructing an initial static health knowledge graph; when incremental health data is received, generating dynamic triples, performing time-series processing and multi-source conflict arbitration based on the static triples and the dynamic triples, and updating the initial static health knowledge graph using the arbitrated triples; and performing health risk reasoning using the updated health knowledge graph, and generating and triggering a health intervention plan when the reasoning result meets a preset condition.
[0007] By constructing a closed-loop process of static foundation, dynamic evolution, and proactive intervention, this invention not only integrates multi-source heterogeneous health data into a standardized, structured knowledge base, but also, through time-series processing and multi-source conflict arbitration, ensures that the knowledge graph evolves accurately and in real time with the user's health status, thereby greatly enhancing its timeliness, accuracy, and continuity. On this basis, deep health risk reasoning is performed on the updated knowledge graph, and personalized intervention plans are automatically generated and triggered when conditions are met. Ultimately, this transforms the system from a passive health record repository into a proactive, full-life-cycle health management partner, significantly improving the foresight, reliability, and clinical value of health management.
[0008] In one optional implementation, multi-source health data is acquired, entities are extracted to generate static triples containing relationships between entities, and an initial static health knowledge graph is constructed, including: Acquire multi-source health data, and perform data preprocessing based on the structured data, semi-structured data, and unstructured data in the multi-source health data; A large language model is used to extract entities from the preprocessed health data and generate static triples containing relationships between entities. Entities in static triples are categorized, and entity alignment is performed based on the categorized entities using similarity calculation, terminology normalization, and attribute verification to construct an initial static health knowledge graph.
[0009] This invention achieves efficient and automated structured processing of complex health information by comprehensively utilizing multi-source heterogeneous health data (including structured, semi-structured, and unstructured data) and relying on a large language model for intelligent information extraction and triple generation. Specifically, the entity alignment process, which includes similarity calculation, authoritative terminology normalization, and attribute verification, significantly improves the accuracy, consistency, and completeness of the constructed initial static knowledge graph.
[0010] In one optional implementation, generating dynamic triples when incremental health data is received, performing temporal processing and multi-source conflict arbitration based on the static and dynamic triples, and updating the initial static health knowledge graph using the arbitrated triples includes: when incremental health data is received, marking conflicting data and generating dynamic triples using a large language model; generating, based on the temporal attributes of the relationships between entities in the static and dynamic triples, multiple versions of triples that change over time; arbitrating the conflicts in the dynamic triples based on the conflict data marked in the dynamic triples; and updating the initial static health knowledge graph using the arbitrated triples.
[0011] This invention significantly enhances the practicality and timeliness of the health knowledge graph by introducing dynamic triple generation, temporal processing, and a multi-source conflict arbitration mechanism. It effectively integrates real-time incremental health data, intelligently generates dynamic triples using a large language model, and constructs multi-version relationship views based on time attributes, thereby systematically addressing data conflicts and inconsistencies. Finally, the arbitration update mechanism continuously optimizes the static knowledge graph, enabling it to dynamically evolve with the user's health status and accurately reflect the latest situation, strongly supporting the accuracy of clinical decision-making and the continuity of health management.
[0012] In one optional implementation, arbitrating conflicts in the dynamic triples based on the conflict data marked in the dynamic triples includes: when the marked conflict data is numerical, fusing the conflicting values using a dynamic fusion algorithm based on confidence assessment and a multidimensional decay factor; and when the marked conflict data is a categorical conflict, arbitrating the conflicting data based on the source level of the conflicting data and a preset verification method.
[0013] This invention distinguishes between numerical and categorical conflicts and employs a dynamic fusion algorithm based on confidence level and multidimensional decay factors, along with an arbitration mechanism based on source level, to achieve refined and intelligent processing of data conflicts in dynamic triples. This not only effectively solves the inconsistency problem caused by multi-source heterogeneous health data and improves the accuracy and reliability of data in knowledge graphs, but also ensures that conflict arbitration results are closer to real clinical scenarios, providing a more solid and reliable data foundation for subsequent health risk assessment and decision support.
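The text does not specify the fusion formula, so the sketch below assumes one plausible form: each numerical observation is weighted by its confidence, an exponential time decay (half-life of 24 hours, an illustrative value), and a source-level factor, and the fused value is the weighted mean. For categorical conflicts, it assumes the value from the highest source level wins and is flagged for verification when sources at that level disagree. `SOURCE_LEVEL` and all weights are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical source levels: a larger number means a more authoritative source.
SOURCE_LEVEL = {"medical_institution": 3, "wearable": 2, "self_report": 1}

@dataclass
class Observation:
    value: object
    source: str
    confidence: float  # assumed confidence score in [0, 1]
    age_hours: float   # time elapsed since the value was recorded

def fuse_numeric(obs: list[Observation], half_life_hours: float = 24.0) -> float:
    """Weighted mean; weight = confidence x time decay x source factor (assumed form)."""
    num = den = 0.0
    top_level = max(SOURCE_LEVEL.values())
    for o in obs:
        time_decay = 0.5 ** (o.age_hours / half_life_hours)  # halves every half-life
        source_factor = SOURCE_LEVEL[o.source] / top_level
        w = o.confidence * time_decay * source_factor
        num += w * float(o.value)
        den += w
    if den == 0:
        raise ValueError("all observations have zero weight")
    return num / den

def arbitrate_categorical(obs: list[Observation]) -> tuple[object, bool]:
    """Return (value, needs_verification) using the highest-level sources only."""
    top = max(SOURCE_LEVEL[o.source] for o in obs)
    candidates = [o for o in obs if SOURCE_LEVEL[o.source] == top]
    if len({o.value for o in candidates}) == 1:
        return candidates[0].value, False
    # Equally authoritative sources disagree: keep the most confident value, flag it.
    return max(candidates, key=lambda o: o.confidence).value, True
```

For example, a systolic reading of 130 mmHg from a medical institution and 150 mmHg self-reported at the same time fuse to 135 mmHg under these weights, because the self-reported value carries one third of the weight.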
[0014] In one optional implementation, updating the initial static health knowledge graph using the arbitrated triples includes: writing the arbitrated triples to a message queue based on stream processing; sharding the triples in the message queue by user entity and relation type; and writing the triples in batches to the initial static health knowledge graph according to the sharding results, using the MERGE clause during writing.
[0015] In this invention, writing the arbitrated triples to a message queue first decouples data processing from data writing, ensuring system stability and data buffering capacity under high concurrency. Secondary sharding of the triples by user entity and relation type then makes batch writes more orderly and reduces concurrent write conflicts on the same node in the graph database, improving write performance and data consistency. Finally, the MERGE clause used during writing ensures the uniqueness of knowledge: it creates a node or relationship only when a matching one does not already exist and otherwise updates its attributes, avoiding duplication and redundancy and ensuring that the knowledge graph evolves accurately, efficiently, and in order.
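The sharding and batched MERGE writes in [0014] and [0015] can be sketched as a write planner that turns arbitrated triples into Cypher statements, assuming a Neo4j-style graph database. The node labels `User` and `Entity`, the relation whitelist, the shard count, and the batch size are all hypothetical. Relationship types cannot be passed as Cypher parameters, so the sketch inlines only whitelisted types. The consumer reading from the message queue would then execute each `(query, params)` pair, for example with the Neo4j Python driver's `session.run(query, params)`.

```python
import zlib
from collections import defaultdict

ALLOWED_RELATIONS = {"TAKES", "HAS_DISEASE"}  # hypothetical relation whitelist

def shard_key(subject: str, relation: str, num_shards: int) -> int:
    # crc32 is stable across processes, unlike Python's salted built-in hash() for str.
    return zlib.crc32(f"{subject}|{relation}".encode("utf-8")) % num_shards

def plan_writes(triples, num_shards: int = 4, batch_size: int = 500):
    """Group triples by (shard, relation type) and emit batched MERGE statements."""
    groups = defaultdict(list)
    for subject, relation, obj in triples:
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relation type: {relation}")
        key = (shard_key(subject, relation, num_shards), relation)
        groups[key].append({"subject": subject, "object": obj})

    plan = []
    for (shard_id, relation), rows in sorted(groups.items()):
        # MERGE creates nodes and edges only if they do not exist, so replays are idempotent.
        cypher = (
            "UNWIND $rows AS row "
            "MERGE (u:User {id: row.subject}) "
            "MERGE (e:Entity {name: row.object}) "
            f"MERGE (u)-[:{relation}]->(e)"
        )
        for i in range(0, len(rows), batch_size):
            plan.append((shard_id, cypher, {"rows": rows[i:i + batch_size]}))
    return plan
```

Because every triple of a given user and relation type lands in the same shard, one worker per shard never races another worker on the same user node.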
[0016] In one optional implementation, performing health risk reasoning using the updated health knowledge graph, and generating and triggering a health intervention plan when the reasoning result meets a preset condition, includes: performing health risk reasoning on the updated health knowledge graph using a rule-based reasoning algorithm to obtain a first reasoning result; performing health risk reasoning on the updated health knowledge graph using a machine learning algorithm to obtain a second reasoning result; and determining whether the first reasoning result and the second reasoning result meet the preset condition and, if they do, generating a health intervention plan and triggering a push notification.
[0017] In this invention, rule-based reasoning allows for logical inference based on a well-defined medical knowledge base, ensuring transparency and clinical credibility in the reasoning process. Machine learning algorithms are used to uncover implicit patterns and connections within the knowledge graph, enabling deep insights into complex and non-linear health risks. Finally, by synthesizing the results of both inferences and determining that preset conditions are met, the system can automatically generate personalized health intervention plans and trigger push notifications, thereby improving the accuracy, timeliness, and targeted nature of health risk assessments and effectively supporting proactive health management.
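The combination of the two reasoning results in [0016] and [0017] can be sketched as follows. Everything concrete here is an assumption for illustration: the blood-pressure rule, the logistic scoring function with hand-set weights standing in for a trained model, the 0.7 threshold, and the choice of "both results indicate risk" as the preset condition.

```python
import math

def rule_based_risk(features: dict) -> bool:
    """First reasoning result: a hypothetical hypertension rule on blood pressure."""
    return features["systolic"] >= 140 or features["diastolic"] >= 90

def ml_risk_score(features: dict, weights: dict, bias: float) -> float:
    """Second reasoning result: logistic score in (0, 1); weights stand in for a model."""
    z = bias + sum(weights[k] * features[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-z))

def should_intervene(features: dict, weights: dict, bias: float, threshold: float = 0.7):
    """Assumed preset condition: the rule fires AND the model score reaches the threshold."""
    first = rule_based_risk(features)
    second = ml_risk_score(features, weights, bias)
    return first and second >= threshold, first, second
```

Requiring agreement favors precision (fewer spurious pushes); an OR condition would favor recall instead. Which trade-off fits depends on the intervention and is not fixed by the text.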
[0018] In one optional implementation, the method further includes: generating a knowledge graph evolution dashboard, which uses a timeline control to replay historical changes in entity relationships and generates graphical annotations and statistical reports on conflicting data and update status.
[0019] This invention achieves visualized tracking and global insight into the state changes of a health knowledge graph by generating a dynamic evolution dashboard that supports timeline playback. Users can intuitively browse the specific states and evolution paths of entities and relationships at different historical points in time through interactive timeline controls, greatly enhancing the intuitiveness and convenience of tracing health records and understanding changes. Simultaneously, the dashboard graphically highlights conflicting data and automatically generates statistical reports on system update status, clearly revealing potential contradictions and their resolutions during multi-source data integration. This effectively assists system administrators in data quality monitoring and decision verification, significantly enhancing the transparency, interpretability, and management efficiency of the knowledge graph.
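The timeline replay and update statistics in [0018] and [0019] can be computed from a change log. The sketch below assumes each event mirrors the "dynamic data" record (original value, updated value, update time, reason) plus a hypothetical `conflict` flag set during arbitration; the sample events are illustrative only.

```python
from collections import Counter
from datetime import datetime

# Illustrative change events; field names are assumptions based on the dynamic-data record.
events = [
    {"time": datetime(2025, 1, 1), "subject": "user A", "relation": "weight_kg",
     "old": None, "new": 72, "reason": "user input", "conflict": False},
    {"time": datetime(2025, 3, 1), "subject": "user A", "relation": "weight_kg",
     "old": 72, "new": 70, "reason": "device synchronization", "conflict": True},
    {"time": datetime(2025, 6, 1), "subject": "user A", "relation": "weight_kg",
     "old": 70, "new": 68, "reason": "device synchronization", "conflict": False},
]

def replay(events: list[dict], until: datetime) -> dict:
    """Rebuild the graph state as of `until` (what the timeline slider would show)."""
    state = {}
    for e in sorted(events, key=lambda e: e["time"]):
        if e["time"] > until:
            break
        state[(e["subject"], e["relation"])] = e["new"]
    return state

def update_report(events: list[dict]) -> tuple[Counter, int]:
    """Counts of updates by reason, plus the number of events flagged as conflicts."""
    return Counter(e["reason"] for e in events), sum(e["conflict"] for e in events)
```

Replaying from the log, rather than storing a full snapshot per time point, keeps storage proportional to the number of changes.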
[0020] In a second aspect, the present invention provides a health knowledge graph updating device, the device comprising: a graph generation module, configured to acquire multi-source health data, extract entities to generate static triples containing relationships between entities, and construct an initial static health knowledge graph; a graph update module, configured to generate dynamic triples when incremental health data is received, perform time-series processing and multi-source conflict arbitration based on the static triples and the dynamic triples, and update the initial static health knowledge graph using the arbitrated triples; and a risk monitoring module, configured to perform health risk reasoning using the updated health knowledge graph, and to generate and trigger a health intervention plan when the reasoning result meets a preset condition.
[0021] In a third aspect, the present invention provides a health knowledge graph updating system, the system comprising: a graph generation layer, configured to acquire multi-source health data, extract entities to generate static triples containing relationships between entities, and construct an initial static health knowledge graph; a data perception and feature extraction layer, configured to generate dynamic triples when incremental health data is received; a decision layer, configured to perform time-series processing and multi-source conflict arbitration based on the static triples and the dynamic triples, and to update the initial static health knowledge graph using the arbitrated triples; and an execution feedback layer, configured to perform health risk reasoning using the updated health knowledge graph, and to generate and trigger a health intervention plan when the reasoning result meets a preset condition.
[0022] In a fourth aspect, the present invention provides an electronic device, comprising a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the health knowledge graph updating method described in the first aspect or any corresponding embodiment thereof.
[0023] In a fifth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the health knowledge graph updating method described in the first aspect or any corresponding embodiment thereof.
[0024] In a sixth aspect, the present invention provides a computer program product, including computer instructions for causing a computer to execute the health knowledge graph updating method described in the first aspect or any corresponding embodiment thereof.
Attached Figure Description
[0025] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0026] Figure 1 is a schematic diagram of a first process of a health knowledge graph updating method according to an embodiment of the present invention;
Figure 2 is a schematic diagram of a static health knowledge graph according to an embodiment of the present invention;
Figure 3 is a schematic diagram of a second process of a health knowledge graph updating method according to an embodiment of the present invention;
Figure 4 is a structural block diagram of a health knowledge graph updating device according to an embodiment of the present invention;
Figure 5 is a flowchart of the data perception and feature extraction layer in a health knowledge graph updating system according to an embodiment of the present invention;
Figure 6 is a flowchart of the decision layer in a health knowledge graph updating system according to an embodiment of the present invention;
Figure 7 is a flowchart of the execution feedback layer in a health knowledge graph updating system according to an embodiment of the present invention;
Figure 8 is a complete data flow diagram of a health knowledge graph updating system according to an embodiment of the present invention;
Figure 9 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Implementation
[0027] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0028] It is understood that before using the technical solutions disclosed in the various embodiments of the present invention, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in the present invention and their authorization should be obtained in accordance with relevant laws and regulations through appropriate means.
[0029] The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0030] According to an embodiment of the present invention, a method for updating a health knowledge graph is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that shown here.
[0031] This embodiment provides a method for updating a health knowledge graph. Figure 1 is a flowchart of a health knowledge graph updating method according to an embodiment of the present invention. As shown in Figure 1, the process includes the following steps: Step S101: Obtain multi-source health data, extract entities to generate static triples containing relationships between entities, and construct an initial static health knowledge graph.
[0032] Specifically, multi-source health data refers to data that reflects or affects the user's health and is obtained from different sources. These sources may include form filling, medical institutions, wearable devices, and user interaction interfaces. This embodiment does not limit the specific source of the health data.
[0033] Regarding health data, this embodiment acquires 11 categories of user health information: basic information, physical examination information, medical history, follow-up plans, medication information, lifestyle habits, behavioral information, intervention information, dynamic data, frequent symptoms, and health needs. Basic information includes an identity identifier, demographic characteristics, and basic physiological data. The identity identifier is either a national ID number or a user ID; if a national ID number is used, it is stored in hashed rather than plaintext form, for example using SHA-256, to prevent information leakage. Demographic characteristics include age, gender, and ethnicity; basic physiological data includes height, weight, blood type, and genetic history. Physical examination information includes basic examinations, reports from medical institutions, and reports from non-medical institutions. Basic examinations include 23 core indicators such as systolic/diastolic blood pressure (mmHg), BMI (kg/m²), and heart rate (beats/minute); medical institution reports include detailed examination items (for example, a "complete blood count" with sub-items such as white blood cell count and hematocrit), abnormal indicator markers (for example, "↑" indicating a value above the normal range), and physician diagnostic recommendations (free text); non-medical institution reports include records of cosmetic procedures (such as "hyaluronic acid injection") and allergen test results (such as "dust mite allergy").
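SHA-256 is a one-way hash, so a stored digest lets the system match an identifier without being able to recover it. Because national ID numbers have a fixed, structured format, an unkeyed hash of one can be reversed by enumerating candidates; the sketch below therefore swaps in HMAC-SHA-256 with a secret key, which is an assumption beyond the text (which names only SHA-256). The key handling and the sample identifier are hypothetical.

```python
import hashlib
import hmac

def pseudonymize_id(national_id: str, secret_key: bytes) -> str:
    """Keyed SHA-256 digest (HMAC) of an ID number, as 64 hex characters.

    The key is assumed to live outside the graph database (e.g. in a key store),
    so a leaked digest cannot be brute-forced without it.
    """
    return hmac.new(secret_key, national_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same ID and key always yield the same digest, so the digest can serve as the user node's identifier for linking records across sources.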
[0034] Medical history includes past medical history and present medical history. Past medical history includes the disease name (e.g., "diabetes"), the date of diagnosis (to the day), and the treatment outcome (e.g., "clinical cure"). Present medical history includes the onset time of symptoms, their progression (e.g., "cough for 3 days, worsening with fever for 1 day"), and the treatment process (e.g., "cephalosporin antibiotics used without effect"). The follow-up plan includes the follow-up provider, a list of items, the time interval, and an execution deadline. The follow-up provider is the hospital or family doctor; the list of items includes specific follow-up examinations, such as "fasting blood glucose monitoring" and "wound dressing change"; the time interval may be daily, weekly, or monthly. Medication information includes the generic name of the drug (e.g., "aspirin"), brand name (e.g., "Bayer Aspirin"), strength (e.g., "100 mg/tablet"), dosage (e.g., "1 tablet per dose"), frequency (e.g., "once daily"), route of administration (e.g., "oral"), prescribing institution (e.g., "XX tertiary hospital"), start and end dates of medication use, and reason for discontinuation (e.g., "adverse reactions").
[0035] Lifestyle habits include diet, smoking and alcohol consumption, exercise, sleep, and emotions. Diet includes dietary structure (ratio of meat to vegetables, e.g., "3:7") and taste preferences (saltiness, e.g., "low salt"). Smoking and alcohol consumption includes years of smoking (e.g., "5 years"), average daily cigarette consumption (e.g., "10 cigarettes"), type of alcohol consumed (e.g., "beer"), and drinking frequency (e.g., "twice a week"). Exercise includes the type of exercise (e.g., "running") and the duration of each session (e.g., "30 minutes"). Sleep includes bedtime (e.g., "11 PM"), sleep duration (e.g., "6 hours"), and a sleep quality score (1 to 10). Emotions include the weekly frequency of mood swings (e.g., "3 times") and primary triggers (e.g., "work stress"). Behavioral information includes diet and exercise records. Diet records include the ingredients of the three daily meals (e.g., "Breakfast: 250 ml milk, 50 g bread") and the amount consumed (to the gram or milliliter); exercise records include the type of exercise in each session (e.g., "swimming"), its duration (e.g., "45 minutes"), and its intensity as a MET (metabolic equivalent of task) value (e.g., "8.0").
[0036] Intervention information includes medical packages and health advice. Medical packages include the purchase date, service content (e.g., "annual physical examination package includes CT scan") and validity period (e.g., "2025-01-01 to 2025-12-31"); health advice includes the type of advice (diet / exercise / medication), specific content (e.g., "walk 8000 steps daily") and implementation requirements (e.g., "implement continuously for 3 months"). Dynamic data includes each update record containing the original value, the updated value, the update time (accurate to the second), the reason for the update (such as "user correction" or "device synchronization"), and the operator (such as "system" indicating automatic system update); frequent symptoms include the symptom name (such as "headache"), frequency of occurrence (daily / weekly), duration (such as "2 hours / time"), accompanying symptoms (such as "nausea"), and relief methods (such as "relief after rest"); health needs include the target organ / system (such as "liver" or "respiratory system"), improvement goals (such as "reducing transaminase to below 40 U / L" or "increasing lung capacity to above 3000 ml"), and the expected time to achieve them (such as "within 6 months").
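The "dynamic data" record described above maps naturally onto an immutable record type. This is a sketch: the `entity` and `attribute` fields are assumptions added so a record can point at what changed, and truncating the timestamp to whole seconds follows the stated "accurate to the second" requirement.

```python
from dataclasses import asdict, dataclass
from datetime import datetime

@dataclass(frozen=True)
class UpdateRecord:
    """One traceable change; `entity` and `attribute` are hypothetical fields."""
    entity: str
    attribute: str
    original_value: object
    updated_value: object
    update_time: datetime  # truncated to whole seconds in __post_init__
    reason: str            # e.g. "user correction" or "device synchronization"
    operator: str          # e.g. "system" for an automatic update

    def __post_init__(self) -> None:
        # Frozen dataclasses require object.__setattr__ to adjust a field during init.
        object.__setattr__(self, "update_time", self.update_time.replace(microsecond=0))

    def to_dict(self) -> dict:
        """Serializable form, with the timestamp as an ISO 8601 string."""
        d = asdict(self)
        d["update_time"] = self.update_time.isoformat(timespec="seconds")
        return d
```

Making the record frozen reflects its role as an audit entry: once written, a change record should not itself be edited.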
[0037] The data structure of the above 11 types of health information is shown in Table 1 below.
Table 1: Data Structure
[0038] When the 11 types of health information are acquired, basic information (identity identifiers, demographic characteristics, and basic physiological data) is obtained through the user registration process. Format validation is triggered upon form submission to ensure, for example, that age is an integer and that blood type conforms to the ABO blood group system. Physical examination information is collected through integration with the HL7 FHIR interfaces of medical institutions: basic examination indicators, detailed items from medical institution reports, and abnormality markers are synchronized hourly under OAuth 2.0 authorization, while non-standard examination data uploaded by medical aesthetic institutions is received through a dedicated interface. All data is transmitted over TLS 1.3 to the corresponding Apache Kafka topic. Medical history information is collected through structured questionnaires: past medical history records disease names, diagnosis dates, and treatment outcomes, while present medical history records symptom onset times, progression, and treatment processes in detail. Questionnaire fields are linked to a medical terminology database for automatic completion. Follow-up plan data, including the follow-up provider, item list, time interval, and deadline, is synchronized periodically from the hospital information system, and the system automatically converts time intervals into specific execution times. Medication information, covering the generic name, brand name, strength, dosage, and other elements, is obtained from medical institution prescription systems, and the mapping between drug names and ATC (Anatomical Therapeutic Chemical classification system) codes is verified to ensure standardized terminology. Lifestyle and behavioral information is collected through the user interface. Lifestyle habits are entered in modules categorized by diet, tobacco and alcohol, and exercise. Behavioral information can be entered manually or obtained through integration with diet and exercise apps, down to the ingredients and intake of the three daily meals and the type and intensity (MET value) of each exercise session. Intervention information includes medical packages purchased by the user and health recommendations generated by the system. Dynamic data is captured in real time by monitoring update operations from the various data sources; each record must include the original value, updated value, update time, reason, and operator so that changes are traceable. Frequent symptoms and health needs are collected through user-filled forms: frequent symptoms require the frequency of attacks, duration, and relief methods, and health needs must specify the target organ, improvement indicators, and expected achievement time. After submission, the form data is automatically associated with the corresponding entity category.
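The registration-time format validation mentioned in [0038] (age is an integer, blood type follows the ABO system) can be sketched as a function returning a list of error messages. The form field names and the 0 to 150 age range are assumptions for illustration.

```python
ABO_TYPES = {"A", "B", "AB", "O"}

def validate_basic_info(form: dict) -> list[str]:
    """Return validation errors for a registration form; an empty list means valid."""
    errors = []
    age = form.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
        errors.append("age must be an integer between 0 and 150")
    if form.get("blood_type") not in ABO_TYPES:
        errors.append("blood_type must be one of A, B, AB, O")
    return errors
```

Returning all errors at once, rather than stopping at the first, lets the form highlight every invalid field in a single round trip.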
[0039] After the multi-source health data is acquired, entities are extracted from it. The extracted entities include subject entities and object entities. The subject entity is the identity identifier from the basic information, and the demographic characteristics and physiological data in the basic information serve as attributes of that identifier. Entities extracted from the other categories of information serve as object entities. Triples are formed from the extracted subject and object entities and the relationships between them; for example, (user A, takes, medication B) is a triple in which user A is the subject entity, "takes" is the relationship, and medication B is the object entity. Associating the triples derived from the multi-source health data forms the initial static health knowledge graph; for example, the triples "user A takes medication B" and "user A has diabetes" are linked through the shared entity user A.
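The association step in [0039] amounts to indexing triples by their shared entities. A minimal in-memory sketch (the relation names are illustrative) stores, for every node, its outgoing and incoming edges so that triples sharing an entity become connected.

```python
from collections import defaultdict

def build_graph(triples):
    """Map each node to a set of (direction, relation, neighbor) edges."""
    graph = defaultdict(set)
    for subject, relation, obj in triples:
        graph[subject].add(("out", relation, obj))  # subject -> object
        graph[obj].add(("in", relation, subject))   # reverse index for object lookups
    return graph

triples = [
    ("user A", "takes", "medication B"),
    ("user A", "has_disease", "diabetes"),
    ("user C", "takes", "medication B"),
]
graph = build_graph(triples)
```

Here the two facts about user A hang off one node, and the reverse index also links user A and user C through medication B, the kind of connection a graph database exposes directly.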
[0040] Step S102: When incremental health data is received, generate dynamic triples; perform time-series processing and multi-source conflict arbitration based on the static triples and the dynamic triples; and update the initial static health knowledge graph using the arbitrated triples.
[0041] Specifically, incremental health data can be obtained by connecting to Hospital Information Systems (HIS) and Laboratory Information Management Systems (LIMS) through the open API interfaces of medical institutions. A data synchronization channel is established based on the patient's unique identifier (ID) to obtain physical examination reports and test results in real time. A structured user interface can also be provided for entering lifestyle data; the interface has built-in data validation rules, and the input data is encapsulated in JSON format with required fields such as "User ID", "Input Time", "Data Item", and "Value". Data interfaces with wearable device manufacturers are also connected, and the continuously collected time-series data is aggregated in 5-minute time windows to generate batches, each including a timestamp, data type, aggregate value, and similar information. The acquired incremental data is stored in partitions by patient ID, and a distributed database is used to ensure high-concurrency write performance. The incremental health data can be acquired at preset intervals to update the health knowledge graph continuously. It should be noted that the first update is applied to the initial static health knowledge graph and each subsequent update is applied to the graph produced by the previous update, so that the knowledge graph grows dynamically.
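The 5-minute window aggregation of wearable data in [0041] can be sketched as tumbling windows aligned to the wall clock. The alignment and the use of the mean as the aggregate value are assumptions; the output dictionaries carry the timestamp, data type, and aggregate value named in the text, plus a sample count.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WINDOW_MINUTES = 5  # window length from the text; wall-clock alignment is an assumption

def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % WINDOW_MINUTES, second=0, microsecond=0)

def aggregate(samples, data_type: str) -> list[dict]:
    """samples: iterable of (timestamp, value) from one device stream."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[window_start(ts)].append(value)
    return [
        {"timestamp": start.isoformat(), "data_type": data_type,
         "aggregate_value": mean(values), "count": len(values)}
        for start, values in sorted(buckets.items())
    ]
```

For heart-rate samples at 10:01, 10:03, and 10:06, this yields one batch for the 10:00 window (two samples averaged) and one for the 10:05 window.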
[0042] After acquiring incremental health data, dynamic triples are generated in a manner similar to generating static triples. Simultaneously, time-series modeling is performed by incorporating the temporal attribute dimension from the static triples. This involves managing the relationships between static and dynamic triples over time, such as adding relationships not present in the static triples, updating relationships present in both static and dynamic triples, and marking relationships present in the static triples but absent in the dynamic triples as invalid. Furthermore, conflicts within the triples need to be arbitrated. Finally, the time-series processed and conflict-arbitrated triples are written to update the health knowledge graph.
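The three cases in [0042] (add relationships that are new, update relationships present in both, mark relationships missing from the new data as invalid) can be sketched as a merge over validity metadata. One assumption matters a great deal here: a static triple is invalidated only if the incremental batch covers the same subject and relation, on the premise that such a batch reports the complete current list for that relation. Otherwise a batch about weight would invalidate every diagnosis. Field names are hypothetical.

```python
from datetime import datetime

def temporal_merge(static: dict, dynamic: set, now: datetime) -> dict:
    """
    static:  {(s, r, o): {"valid_from": datetime, "valid_to": datetime | None, "last_seen": datetime}}
    dynamic: set of (s, r, o) triples from one incremental batch
    Returns a new dict; the input is not modified.
    """
    covered = {(s, r) for s, r, _ in dynamic}  # (subject, relation) pairs this batch reports on
    merged = {k: dict(v) for k, v in static.items()}
    for t in dynamic:
        if t in merged and merged[t]["valid_to"] is None:
            merged[t]["last_seen"] = now  # present in both: refresh
        else:
            merged[t] = {"valid_from": now, "valid_to": None, "last_seen": now}  # new or reappearing
    for t, meta in merged.items():
        if (t[0], t[1]) in covered and t not in dynamic and meta["valid_to"] is None:
            meta["valid_to"] = now  # absent from the new data: mark invalid, keep history
    return merged
```

Invalidated triples keep their `valid_from` and gain a `valid_to`, so the history remains queryable rather than being deleted.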
[0043] Step S103 involves using the updated health knowledge graph to perform health risk reasoning, and generating and triggering a health intervention plan when the reasoning result meets preset conditions. Specifically, the dynamically growing health knowledge graph can be used for health risk reasoning; for example, machine learning algorithms can be applied to the health knowledge graph to determine whether a user has a health risk. If a health risk exists, a corresponding health intervention plan is generated for that specific risk and pushed to the user or doctor, closing the loop from knowledge graph to clinical intervention.
[0044] This embodiment provides a method for updating a health knowledge graph, which includes the following steps: Step S201: Obtain multi-source health data, extract entities to generate static triples containing relationships between entities, and construct an initial static health knowledge graph.
[0045] Specifically, step S201 includes: Step S2011: Obtain multi-source health data and perform data preprocessing based on the structured, semi-structured, and unstructured data within the multi-source health data. Specifically, after obtaining the multi-source health data and before extracting entities, the data is preprocessed based on its data type. For structured data, the format integrity is directly verified; for semi-structured data, text recognition such as OCR (Optical Character Recognition) is used to convert it into text, and then regular expressions are used to extract key information; unstructured data is converted into structured tags through natural language processing.
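The type-dependent preprocessing can be sketched as a small dispatcher. OCR itself is out of scope here, so the sketch assumes semi-structured input has already been converted to text; the blood-pressure regular expression and the required-field names are hypothetical examples, not patterns taken from this description, and unstructured (NLP) handling is not shown.

```python
import re

# Hypothetical pattern for an OCR'd line such as "Blood pressure: 135/85 mmHg"
BP_PATTERN = re.compile(r"blood pressure\s*:\s*(\d{2,3})\s*/\s*(\d{2,3})\s*mmHg",
                        re.IGNORECASE)
REQUIRED_FIELDS = {"user_id", "input_time", "data_item", "value"}  # hypothetical names

def preprocess(record):
    kind = record["kind"]
    if kind == "structured":
        # Structured data: verify format integrity only.
        missing = REQUIRED_FIELDS - set(record["data"])
        return {"valid": not missing, "missing": sorted(missing)}
    if kind == "semi_structured":
        # Semi-structured data: text already produced by OCR; extract key values.
        m = BP_PATTERN.search(record["text"])
        return {"systolic": int(m.group(1)), "diastolic": int(m.group(2))} if m else {}
    raise ValueError(f"unsupported kind for this sketch: {kind}")
```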
[0046] Step S2012 involves using a large language model to extract entities from the preprocessed health data and generating static triples containing relationships between entities. Specifically, for the preprocessed data, this embodiment uses a large language model to generate entities and triples. The large language model is fine-tuned and trained using a health-domain labeled corpus. The preprocessed data is input into the fine-tuned large language model, which generates an entity set. The model then establishes relationships between entities from the entity set, generating (entity, relationship, entity) triples.
[0047] Specifically, preprocessed structured data is mapped directly onto the 11 entity node types, with the main entity as the subject, to form triples of (subject entity node, relationship, object entity node). For semi-structured and unstructured data, the model extracts entities through named entity recognition and determines the relationships between them through relation extraction, outputting entities that contain id, name, category, and properties, as well as complete triples containing subject, predicate, object, timestamp, source, and category. For example, entities are generated as follows: the fine-tuned Qwen and Llama models are called with structured or semi-structured JSON, tabular, or text data as input, and the output is constrained by the prompt to the entity structure {"id":,"name":,"category":,"properties":{}}. Triples are generated as follows: the fine-tuned Qwen and Llama models are called with the entity structure as input and output triples in the format {"subject":S,"predicate":P,"object":O,"timestamp":T,"source":Y,"category":C}, where S is the subject entity node of the health data, P is the relationship between the subject node and the object node, O is the object entity node information, T is the timestamp, Y is the data source, and C is the data category.
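Because the model output is controlled only through the prompt, each returned object is worth validating against the triple format before use. A minimal sketch, assuming the model returns one JSON object per triple; the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str     # S: subject entity node of the health data
    predicate: str   # P: relationship between subject and object
    object: str      # O: object entity node information
    timestamp: str   # T: timestamp
    source: str      # Y: data source
    category: str    # C: data category

FIELDS = ("subject", "predicate", "object", "timestamp", "source", "category")

def parse_triple(raw: str) -> Triple:
    """Parse one model output object and reject malformed triples."""
    data = json.loads(raw)
    missing = [f for f in FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Triple(**{f: str(data[f]) for f in FIELDS})
```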
[0048] In addition, after the triples are generated, they undergo enhancement processing. Field normalization maps terms to a standard library; for example, the entities generated for a common cold might include "upper respiratory tract infection," "cold," and similar variants, and normalization consolidates them all under the standard term "cold." Time correction converts ambiguous times into precise dates, and invalid dates are corrected and flagged as corrected records. Furthermore, a conflict pre-detection step marks multiple values for the same indicator at the same timestamp as pending verification, triggering a data source resampling mechanism.
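Field normalization and conflict pre-detection can be sketched together. The synonym table is a hypothetical stand-in for the standard term library and only reproduces the "cold" example; any indicator with more than one distinct value at the same timestamp is returned as pending verification.

```python
from collections import defaultdict

# Hypothetical excerpt of a standard-term library: surface form -> standard term
STANDARD_TERMS = {
    "upper respiratory tract infection": "cold",
    "common cold": "cold",
    "cold": "cold",
}

def normalize(term: str) -> str:
    """Map a surface form to its standard term; unknown terms pass through."""
    return STANDARD_TERMS.get(term.strip().lower(), term)

def flag_conflicts(measurements):
    """measurements: list of (indicator, timestamp, value) tuples.

    Returns the (indicator, timestamp) keys that carry more than one distinct
    value; these are the records to mark as pending verification.
    """
    seen = defaultdict(set)
    for indicator, ts, value in measurements:
        seen[(normalize(indicator), ts)].add(value)
    return {key for key, values in seen.items() if len(values) > 1}
```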
[0049] Step S2013: Classify the entities in the static triples, and perform entity alignment based on the classified entities using similarity calculation, terminology normalization, and attribute verification to construct an initial static health knowledge graph.
[0050] Specifically, as shown in Figure 2, in this embodiment the 11 entity types in the triples are grouped into five main classes: user entities, health indicator entities, disease entities, medical behavior entities, and subjective information entities. Each entity has a unique identifier and a set of core attributes. Specifically, the unique identifier for a user entity is `user_id`, with core attributes `identity_info`, `demographic`, and `physiology_basic`, and extended attributes `contact_info` and `insurance_info`; the unique identifier for a health indicator entity is `indicator_id`, with core attributes `standard_name`, `normal_range`, `unit`, and `measurement_method`; the unique identifier for a disease entity is `disease_id`, with core attributes `name`, `classification`, `pathogenesis`, and `complication`; the unique identifier for a medical behavior entity is `behavior_id`, with core attributes `behavior_type`, `executor`, and `timestamp`; and the unique identifier for a subjective information entity is `subjective_id`, with core attributes `description`, `frequency`, and `timestamp`. During classification, each entity is assigned to one of the five classes based on its attributes. For example, Zhang San is a user, and hypertension is a disease.
[0051] In this embodiment, within the five entity classes, three entity alignment steps (similarity calculation, terminology normalization, and attribute verification) are used to determine whether two entities refer to the same entity. For similarity calculation, the entities in each class are first converted into vectors (e.g., using the BGE-large or SimCSE models), and the similarity between vectors is then calculated; a similarity threshold (which can be set to 0.85 ± 0.05) determines whether they refer to the same entity. Terminology normalization maps entities to a standard, structured medical terminology database (SNOMED CT or ICD-10 codes) to determine whether any two entities correspond to the same concept. Attribute verification checks whether the attribute information of any two entities is consistent (e.g., whether the conflict rate of core attributes is below 10%); if it is, they are considered the same entity. Only when all three alignment steps indicate the same entity are the corresponding entities in a class merged into one entity.
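The three-way alignment decision can be sketched as follows. The embeddings are assumed to be precomputed (for example by BGE-large or SimCSE), the standard code is assumed to come from a SNOMED CT or ICD-10 lookup, and the conflict rate is assumed to be the share of shared core attributes whose values differ; all names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def conflict_rate(attrs_a, attrs_b, core):
    """Fraction of core attributes present in both entities whose values differ."""
    shared = [k for k in core if k in attrs_a and k in attrs_b]
    if not shared:
        return 0.0
    return sum(attrs_a[k] != attrs_b[k] for k in shared) / len(shared)

def same_entity(a, b, core, sim_threshold=0.85, max_conflict=0.10):
    """a, b: dicts with 'vector', 'std_code' and 'attrs'.

    All three checks must agree before the entities are merged.
    """
    similar = cosine(a["vector"], b["vector"]) >= sim_threshold
    same_code = a["std_code"] is not None and a["std_code"] == b["std_code"]
    consistent = conflict_rate(a["attrs"], b["attrs"], core) < max_conflict
    return similar and same_code and consistent
```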
[0052] After two entities are determined to be the same, attribute information from different data sources needs to be merged. Multi-source attributes are merged according to the trust weight of each data source: each data source (such as a top-tier hospital HIS, a community hospital system, or a wearable device) is assigned a preset trust weight. For the same attribute, the value from the data source with the highest weight is retained as the primary attribute, and values from lower-weighted sources are stored as supplementary attributes labeled with their source. If two triples conflict (e.g., source A records "took medication A" while source B records "did not take medication A"), the correct fact cannot be determined automatically; in this case, the conflicting relationship is marked for manual review, and the sources of both sides are recorded for final determination by medical experts.
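The trust-weighted merge can be sketched as below. The source names and weight values are hypothetical placeholders for the preset trust weights; for each attribute the highest-weighted value becomes primary, and the remaining values are kept as supplementary entries tagged with their source.

```python
# Hypothetical preset trust weights per data source
TRUST = {"tertiary_hospital_his": 0.9, "community_hospital": 0.8, "wearable": 0.7}

def merge_attributes(observations):
    """observations: list of (source, attribute, value).

    Returns {attribute: {"primary": value, "source": src,
                         "supplementary": [(value, src), ...]}}.
    """
    merged = {}
    # Visit observations from the most to the least trusted source.
    for source, attr, value in sorted(observations, key=lambda o: TRUST[o[0]], reverse=True):
        entry = merged.get(attr)
        if entry is None:
            merged[attr] = {"primary": value, "source": source, "supplementary": []}
        else:
            entry["supplementary"].append((value, source))
    return merged
```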
[0053] The initial static health knowledge graph is constructed and stored in the Neo4j graph database, with the five entity types as node labels and the relationships in the triples as relationship types. Unique indexes are created for user_id, disease_id, and timestamp, and full-text indexes are created for entity names. A validation report is then generated for the stored graph, covering entity coverage (the proportion of entities successfully extracted and stored relative to the number of entities theoretically required), alignment accuracy (the proportion of randomly sampled aligned entities found to be correctly merged), and relation integrity (the average number of relations connected to core nodes such as user and disease nodes), to ensure that the metrics meet their targets (coverage ≥ 95%, accuracy ≥ 98%, average number of relations for core entities ≥ 5).
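The acceptance check on the validation report reduces to three ratios compared against the stated thresholds. A minimal sketch, assuming the counts have been gathered beforehand (for example with count queries against the graph); the function and argument names are hypothetical.

```python
def validation_report(extracted, required, sampled, correct, core_degrees):
    """Compute the three validation metrics and check them against the targets.

    extracted / required: entities stored vs. entities theoretically required
    sampled / correct:    aligned entities spot-checked vs. correctly merged
    core_degrees:         relation counts of core (user, disease, ...) nodes
    """
    coverage = extracted / required
    accuracy = correct / sampled
    avg_relations = sum(core_degrees) / len(core_degrees)
    passed = coverage >= 0.95 and accuracy >= 0.98 and avg_relations >= 5
    return {"coverage": coverage, "accuracy": accuracy,
            "avg_relations": avg_relations, "passed": passed}
```

For instance, 960 of 1,000 required entities, 197 of 200 sampled merges correct, and core-node degrees of 6, 5, 7 and 4 (average 5.5) pass all three checks.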
[0054] Step S202: When incremental health data is received, generate dynamic triples, perform time-series processing and multi-source conflict arbitration based on the static and dynamic triples, and update the initial static health knowledge graph using the arbitrated triples.
[0055] Specifically, step S202 includes: Step S2021: Upon receiving incremental health data, conflicting data is marked, and dynamic triples are generated using a large language model. Specifically, after receiving incremental health data, a preprocessing process is performed, including format validation, missing data completion, conflict detection, and unit standardization. Format validation uses a JSON Schema validation tool to verify the integrity of data fields. Data lacking key fields such as "Patient ID" and "Test Time" is marked as "to be completed," triggering a completion reminder. Conflict detection uses a data comparison engine to identify contradictory values for the same indicator and marks conflicting data and their source. Unit standardization establishes a medical unit conversion rule base to uniformly convert non-standard units into internationally accepted units.
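Missing-field marking and unit standardization can be sketched as follows. The rule base holds only two illustrative conversions (glucose from mg/dL to mmol/L using a factor of about 18 mg/dL per mmol/L, and weight from pounds to kilograms); the field names and rule table are hypothetical, and a full JSON Schema validator, as the text describes, would replace the simple key check.

```python
# Hypothetical conversion rule base: (indicator, source unit) -> (target unit, conversion)
UNIT_RULES = {
    ("fasting_glucose", "mg/dL"): ("mmol/L", lambda v: v / 18.0),
    ("weight", "lb"): ("kg", lambda v: v * 0.45359237),
}
KEY_FIELDS = ("patient_id", "test_time")  # hypothetical names for "Patient ID", "Test Time"

def preprocess_increment(record: dict) -> dict:
    out = dict(record)
    missing = [f for f in KEY_FIELDS if not out.get(f)]
    if missing:
        out["status"] = "to be completed"  # would trigger a completion reminder
        out["missing"] = missing
    rule = UNIT_RULES.get((out.get("indicator"), out.get("unit")))
    if rule:
        target_unit, convert = rule
        out["value"] = round(convert(out["value"]), 2)
        out["unit"] = target_unit
    return out
```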
[0056] When generating triples, a large language model fine-tuned on a medical-domain corpus is used and deployed on a GPU server. Entity recognition and relation extraction services are provided through a RESTful API that supports batch data processing. The model annotates the preprocessed text data with entities, identifies new disease entities such as "obstructive sleep apnea-hypopnea syndrome," and matches them against the medical ontology database; if an entity is not yet included, a unique identifier is automatically assigned and the entity is added to the entity database. For both structured data and unstructured text, the model extracts the relationships between entities. For example, it extracts the triple (User A, taking, aspirin) from "User took 100 mg of aspirin today" and (User B, allergic, penicillin) from "Patient is allergic to penicillin." Trend descriptions are processed by combining a rule engine with the model. For example, it generates (User C's fasting blood glucose level, trend, rising) from "Fasting blood glucose level increased by 1.2 mmol/L compared to last week" and (User C's weight, trend, falling) from "Weight decreased by 3 kg compared to last month," and associates timestamp information to mark the period in which the trend occurred.
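The rule-engine half of the trend handling can be sketched with a regular expression that turns a change description into a trend triple plus attributes. The pattern covers only the two sentence shapes quoted above and is a hypothetical example; text the rule does not match would be left to the model.

```python
import re

# Hypothetical rule: "<indicator> increased/decreased by <amount><unit> compared to last <period>"
TREND_RULE = re.compile(
    r"(?P<indicator>[A-Za-z ]+?)\s+(?P<direction>increased|decreased)\s+by\s+"
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>[\w/]+)\s+compared to last (?P<period>\w+)",
    re.IGNORECASE,
)

def trend_triple(user: str, text: str, timestamp: str):
    """Return (subject, 'trend', direction, attributes) or None if no rule matches."""
    m = TREND_RULE.search(text)
    if not m:
        return None  # fall back to the model
    direction = "rising" if m["direction"].lower() == "increased" else "falling"
    subject = f"{user}'s {m['indicator'].strip().lower()}"
    attrs = {"amount": float(m["amount"]), "unit": m["unit"],
             "period": m["period"], "timestamp": timestamp}
    return subject, "trend", direction, attrs
```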
[0057] Step S2022: Based on the time attributes of the relationships between entities in the static and dynamic triples, generate multiple versions of triples that change over time. Specifically, to achieve temporal management, this embodiment uses a four-dimensional triple structure (subject) - [predicate{value,source,valid_from,valid_to}] -> (object), where valid_from records the time at which the relationship becomes valid and valid_to records the time at which it ceases to be valid. When a relationship is updated, the system automatically sets the valid_to of the old version to the update time and creates a new version record whose valid_from is set to the same update time. When a relationship expires, its valid_to is set to the expiration time, but the historical record is not deleted, ensuring that the data remains traceable throughout its entire lifecycle.
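The version-chain rule (close the old version at the update time, open a new version at the same instant, never delete history) can be sketched as an in-memory store. This is illustrative only; the class and method names are hypothetical, and in the described system the versions would live as relationship properties in the graph database. ISO-8601 date strings are used so that string order matches time order.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class RelationVersion:
    subject: str
    predicate: str
    object: str
    value: Any
    source: str
    valid_from: str
    valid_to: Optional[str] = None  # None means the version is still valid

class TemporalRelationStore:
    def __init__(self):
        self.versions = []

    def current(self, subject, predicate, obj):
        """Return the currently valid version of a relation, if any."""
        for v in self.versions:
            if (v.subject, v.predicate, v.object) == (subject, predicate, obj) and v.valid_to is None:
                return v
        return None

    def upsert(self, subject, predicate, obj, value, source, at):
        """Add a new relation, or version an existing one, at time `at`."""
        old = self.current(subject, predicate, obj)
        if old is not None:
            old.valid_to = at  # close the old version; it stays in history
        self.versions.append(RelationVersion(subject, predicate, obj, value, source, at))

    def expire(self, subject, predicate, obj, at):
        """Mark a relation invalid at `at` without deleting its history."""
        old = self.current(subject, predicate, obj)
        if old is not None:
            old.valid_to = at
```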
[0058] For example, when a user stops taking a certain medication, the time attribute update process is triggered automatically: the valid_to of the original "taking" relationship is set to the discontinuation timestamp, and a "taking" relationship with a new time window is created for the new medication. When a user's diagnosis changes, a new "confirmed" relationship is created with valid_from set to the diagnosis date, and the valid_to of the original "suspected" relationship is filled in, forming a temporally continuous, non-overlapping relationship chain. This resembles time-series-based structured processing of multi-source feature distributions, ensuring that data from different points in time are connected in order.
[0059] In addition, based on time attribute management, a multi-version coexistence mechanism can be adopted to establish a timeline version chain for users' key health indicators. The same "has_bmi (relational predicate)" relationship can store multiple versions according to the detection date, and supports querying historical data for any time period by timestamp range.
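Querying a version chain by timestamp range is an interval-overlap test: a version is returned when it started no later than the end of the query window and had not yet ended at its start. A minimal sketch over plain dictionaries with ISO-8601 date strings; the function name is hypothetical.

```python
def versions_in_range(versions, start, end):
    """Return versions whose [valid_from, valid_to) interval overlaps [start, end].

    versions: dicts with 'valid_from', 'valid_to' (None if still valid) and
    'value'. ISO-8601 date strings compare in chronological order.
    """
    hits = []
    for v in versions:
        ends_after_start = v["valid_to"] is None or v["valid_to"] > start
        if v["valid_from"] <= end and ends_after_start:
            hits.append(v)
    return sorted(hits, key=lambda v: v["valid_from"])
```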
[0060] Step S2023: Arbitrate conflicts in the dynamic triples based on the conflict data marked in them. In conflict arbitration, this embodiment applies different arbitration methods to different types of data.
[0061] Specifically, step S2023 above includes: Step a1: When the marked conflict data is numerical conflict, a dynamic fusion algorithm based on confidence assessment and multidimensional decay factor is used to fuse the conflict data.
[0062] Step a2: When the marked conflict data is a categorical conflict, the conflicting data is arbitrated based on the source grade of the conflicting data and a preset procedure.
[0063] Specifically, to achieve conflict arbitration, a hierarchical conflict resolution engine is built to arbitrate numerical conflicts and categorical conflicts separately. For numerical conflicts, a dynamic fusion algorithm combining confidence assessment and multidimensional decay factors is used. The fusion process is implemented using the following formula:
[0064] In the formula, the terms are: the trust weight of each data source; a time decay factor (defined by a separate formula), which reduces the impact of outdated data on the current state; the outlier penalty coefficient of the indicator, which reduces a data point's confidence weight when it deviates from the historical baseline value by more than ±3 SD (standard deviations); and the original indicator value provided by each data source. For the trust weights, medical institution data sources are assigned 0.9 (based on the accuracy and professionalism of their testing equipment), user-uploaded data is assigned 0.7 (allowing for possible operating errors), and community health service center data is assigned 0.8 (between the former two); these weight assignments can be adjusted to the actual situation.
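Because the exact fusion formula is not legible in this text, the following is only an illustrative sketch under stated assumptions: the fused value is taken to be a normalized weighted average in which each source's weight is the product of its trust weight, a time decay factor and an outlier penalty. The exponential decay form exp(-k·age), the rate k = 0.05 per day and the penalty value 0.5 are assumptions, not values from this description; the trust weights 0.9, 0.8 and 0.7 are those stated above.

```python
import math

# Trust weights as stated in the text (adjustable in practice)
TRUST = {"medical_institution": 0.9, "community_center": 0.8, "user_upload": 0.7}

DECAY_RATE = 0.05      # assumed rate k for the decay exp(-k * age_days)
OUTLIER_PENALTY = 0.5  # assumed penalty when |value - baseline| > 3 SD

def fuse(readings, baseline_mean, baseline_sd):
    """readings: list of (source, value, age_days). Returns the fused value."""
    num = den = 0.0
    for source, value, age_days in readings:
        decay = math.exp(-DECAY_RATE * age_days)
        penalty = OUTLIER_PENALTY if abs(value - baseline_mean) > 3 * baseline_sd else 1.0
        weight = TRUST[source] * decay * penalty
        num += weight * value
        den += weight
    return num / den
```

With two same-day readings, 6.0 from a medical institution and 7.0 uploaded by the user, this gives (0.9 × 6.0 + 0.7 × 7.0) / 1.6 ≈ 6.44; if the user value were an outlier beyond 3 SD, its weight would be halved.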
[0065] Categorical conflicts are multi-source inconsistencies in discrete or qualitative health attribute data in the knowledge graph. Such data typically takes the form of mutually exclusive category labels, state descriptions, or grade determinations, so it cannot be weighted and averaged like continuous numerical values. Categorical conflicts involve qualitative attribute values of health indicator entities, disease entities, and medical behavior entities, including but not limited to: 1. Disease diagnosis conclusions: e.g., source A diagnoses "viral pneumonia" while source B diagnoses "bacterial pneumonia"; 2. Grading indicators: e.g., cardiac function grading (Grade I vs. Grade II) or hypertension risk stratification (high risk vs. very high risk); 3. Qualitative test results: e.g., antibody test results (positive vs. negative). For categorical conflicts, a three-tier review process is triggered: first, the source grades of the conflicting data are compared. If the source grades are the same, the data is pushed to a regional medical expert database for blind review by three or more experts from the relevant departments, and the majority opinion is taken as the final conclusion. If the expert opinions diverge, a multi-center consultation is initiated. Similar to the priority-based, manually assisted method used for feature distribution conflicts, this approach ensures that conflicts are resolved in a scientifically rigorous way.
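The three-tier process can be sketched as a small decision function. Two points are assumptions not spelled out in the text: when source grades differ, the higher-graded source is taken to prevail (in line with the priority-based approach mentioned above), and diverging expert opinions are read as no label winning a strict majority. Names and return values are hypothetical.

```python
from collections import Counter

def arbitrate_categorical(a, b, expert_votes=None):
    """a, b: dicts with 'label' and 'grade' (higher = more authoritative source).
    expert_votes: labels from the blind review (three or more experts).
    Returns (decision, label_or_None).
    """
    # Tier 1: compare source grades; the higher grade prevails (assumption).
    if a["grade"] != b["grade"]:
        winner = a if a["grade"] > b["grade"] else b
        return "source_grade", winner["label"]
    # Tier 2: blind expert review; a strict majority decides.
    if not expert_votes or len(expert_votes) < 3:
        return "pending_expert_review", None
    label, count = Counter(expert_votes).most_common(1)[0]
    if count > len(expert_votes) / 2:
        return "expert_majority", label
    # Tier 3: no majority, escalate to a multi-center consultation.
    return "multi_center_consultation", None
```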
[0066] Step S2024: Update the initial static health knowledge graph using the arbitrated triples.
[0067] Specifically, step S2024 above includes: Step a1: Write the arbitrated triples to the message queue based on stream processing; specifically, when updating the health knowledge graph using arbitrated triples, this embodiment constructs a distributed stream processing pipeline based on Flink to write the triple data arbitrated by the decision layer to the Kafka message queue in JSON format.
[0068] Step a2: Shard the triples in the message queue by user entity and relation type, and write the triples into the initial static health knowledge graph in batches according to the sharding results, using the MERGE clause during writing.
[0069] Specifically, when incremental health data is acquired, a data synchronization channel is established based on the patient's unique identifier (ID), achieving physical partitioning (first-level sharding). However, during high-concurrency batch writes to Neo4j without further subdivision, multiple simultaneous operations on the same node (such as heart rate being uploaded several times within a short period) can easily cause lock contention or deadlocks. Therefore, after data is pulled from the message queue, it is logically aggregated by "patient ID + relation type" as a second level of sharding. This ensures that consecutive changes to the same attribute of the same entity are encapsulated in the same database transaction, minimizing lock contention and improving throughput.
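The secondary sharding step can be sketched as a grouping applied to each batch pulled from the message queue: triples sharing a patient ID and relation type land in one group, which would then be written in a single transaction. The message field names are hypothetical; the sketch assumes the messages are JSON strings as described above.

```python
import json
from collections import defaultdict

def shard_batch(messages):
    """Group JSON triple messages by (patient_id, relation type).

    Each group is intended to be written in one database transaction, so
    consecutive changes to the same node and attribute are serialized.
    """
    shards = defaultdict(list)
    for raw in messages:
        triple = json.loads(raw)
        shards[(triple["patient_id"], triple["predicate"])].append(triple)
    # Keep each shard in timestamp order so later values are applied last.
    for group in shards.values():
        group.sort(key=lambda t: t["timestamp"])
    return dict(shards)
```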
[0070] Meanwhile, when triples are written into the corresponding node group in the Neo4j graph database, a MERGE mechanism is used (an existing match is reused; otherwise the element is created). The process is as follows: first, the existing user node (anchor node) in the graph is located (matched) using the "patient ID" in the triple; then an edge is constructed from this node to the new data node. As a result, each newly written triple is physically connected to the existing knowledge graph network through a shared unique entity identifier (ID), so that old and new knowledge are fused and the structure is updated automatically, without any additional association step after writing.
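The anchor-then-MERGE write can be sketched as a parameterized Cypher statement built once per relation type. Relationship types cannot be passed as Cypher parameters, which fits the sharding by relation type described above. The labels, property names and helper function are hypothetical; the resulting query and a `rows` list could be executed, for example, with the official Neo4j Python driver.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_merge_query(rel_type: str, object_label: str) -> str:
    """Build a batched Cypher write for one relation type.

    MATCH anchors the existing user node by its unique ID; MERGE creates the
    data node and the edge only if they do not already exist.
    """
    for name in (rel_type, object_label):
        if not _IDENT.match(name):
            # Identifiers are interpolated into the query, so allow plain names only.
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        "UNWIND $rows AS row\n"
        "MATCH (u:User {user_id: row.user_id})\n"
        f"MERGE (o:{object_label} {{id: row.object_id}})\n"
        f"MERGE (u)-[r:{rel_type} {{valid_from: row.valid_from}}]->(o)\n"
        "SET r.value = row.value, r.source = row.source"
    )
```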
[0071] In addition, this embodiment establishes a full-link fault tolerance mechanism. Before triples are written, a snapshot checkpoint (including the current number of nodes, the total number of relations, and a hash of key attributes) is created; if writing is interrupted, data can be rolled back to the most recent checkpoint. A full-graph verification is performed daily: all nodes are traversed by depth-first search to clean up isolated nodes without associated edges; overlapping validity periods are detected and automatically corrected by a time-window overlap detection algorithm; and the database indexes are optimized and rebuilt to speed up subsequent queries and inference. Much as consistency checks on equivalent feature distributions safeguard the reliability of model input, these measures preserve the structural integrity and data consistency of the graph during dynamic updates.
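The time-window overlap check in the daily verification can be sketched per relation chain: after sorting versions by `valid_from`, any version that starts before its predecessor's `valid_to` overlaps it. The text says overlaps are corrected automatically but not how; this sketch assumes the earlier version's `valid_to` is trimmed to the later version's `valid_from`, which is one possible rule and not necessarily the one used here.

```python
def fix_overlaps(chain):
    """chain: list of dicts with 'valid_from' and 'valid_to' (None = open).

    Returns (corrected_chain, overlaps), where overlaps lists the index pairs
    (in sorted order) that overlapped before correction. The input is not
    modified. ISO date strings compare chronologically.
    """
    chain = sorted((dict(v) for v in chain), key=lambda v: v["valid_from"])
    overlaps = []
    for i in range(1, len(chain)):
        prev, cur = chain[i - 1], chain[i]
        if prev["valid_to"] is None or prev["valid_to"] > cur["valid_from"]:
            overlaps.append((i - 1, i))
            prev["valid_to"] = cur["valid_from"]  # assumed correction rule
    return chain, overlaps
```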
[0072] Step S203: Use the updated health knowledge graph to perform health risk reasoning, and generate and trigger a health intervention plan when the reasoning result meets the preset conditions.
[0073] Specifically, step S203 includes: Step S2031: A rule-based reasoning algorithm is used to perform health risk reasoning on the updated health knowledge graph to obtain a first reasoning result. Specifically, when using rule-based reasoning, the associations between triples in the knowledge graph are combined to perform health risk reasoning. For example, (User A, Stroke Risk, High Risk) is automatically derived from (User A, Disease, Hypertension) and (Hypertension, Complications, Stroke), and the confidence level of User A's health risk is determined by combining attributes such as user age and medical history.
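The rule chain in this example can be sketched as a join over triples. The base confidence of 0.6 and the +0.2 adjustment for users aged 60 or over are hypothetical; the text only says confidence is determined by combining attributes such as age and medical history, and medical history is not modeled here.

```python
def infer_complication_risk(triples, ages, base_conf=0.6):
    """Derive (user, '<complication> Risk', 'High Risk') from
    (user, 'Disease', d) and (d, 'Complications', c).

    Returns {derived_triple: confidence}.
    """
    complications = {}
    for s, p, o in triples:
        if p == "Complications":
            complications.setdefault(s, set()).add(o)
    derived = {}
    for s, p, o in triples:
        if p != "Disease":
            continue
        for comp in complications.get(o, ()):
            conf = base_conf
            if ages.get(s, 0) >= 60:  # hypothetical age adjustment
                conf += 0.2
            derived[(s, f"{comp} Risk", "High Risk")] = min(conf, 1.0)
    return derived
```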
[0074] Step S2032: Use machine learning algorithms to perform health risk reasoning on the updated health knowledge graph to obtain a second reasoning result; specifically, user health time series data can be used to train machine learning models such as graph neural networks, and then the time series features in the health knowledge graph can be extracted and input into the trained model to predict the trend of indicator changes in the next 3 months.
[0075] Step S2033: Determine whether the first and second inference results meet preset conditions, and if so, generate a health intervention plan and trigger its push. Specifically, when the first inference result (such as the confidence level of User A's health risk) exceeds a threshold, a corresponding health intervention plan is generated and pushed to the user's or doctor's terminal; likewise, when the second inference result (such as the trend of an indicator) meets preset conditions, a corresponding plan can be generated and pushed to the user's or doctor's terminal. For example, when the user's fasting blood glucose is detected to be >7.0 mmol/L on 3 consecutive measurements, an intervention plan node {id:"DM_Intervene01",type:"diabetes intervention plan",content:"low-carb diet + 30 minutes of exercise per day"} is automatically created, together with the relationship (User A, urgently needed, DM_Intervene01). At the same time, the plan is pushed to the terminal of the user's attending physician through the hospital information system interface, closing the loop from knowledge graph to clinical intervention.
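The trigger in this example (three consecutive fasting glucose readings above 7.0 mmol/L) can be sketched with a running streak counter. The node fields and relationship follow the example in the text; the function name and tuple format are hypothetical, and the push to the physician's terminal is not shown.

```python
def check_glucose_trigger(user, readings, threshold=7.0, run_length=3):
    """readings: fasting glucose values (mmol/L) in chronological order.

    Returns (intervention_node, relation) once `run_length` consecutive
    readings exceed `threshold`, otherwise None.
    """
    streak = 0
    for value in readings:
        streak = streak + 1 if value > threshold else 0
        if streak >= run_length:
            node = {"id": "DM_Intervene01", "type": "diabetes intervention plan",
                    "content": "low-carb diet + 30 minutes of exercise per day"}
            return node, (user, "urgently needed", node["id"])
    return None
```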
[0076] Step S204: Generate a knowledge graph evolution dashboard that uses a timeline control to replay historical changes in entity relationships, graphically annotates conflicting data and update status, and generates statistical reports. Specifically, a real-time monitoring and interactive interface for graph evolution can be built as a WebGL-based dashboard that supports replaying changes in a user's health data graph along the time dimension (day/week/month). Users can drag the timeline control to view the relationships added and removed within a specific period, for example to observe how the correlation between the "blood pressure value" node and the "medication" relationship changes over 6 months. Conflicting data is color-coded: red nodes represent numerical conflicts awaiting review, and yellow nodes represent categorical conflicts awaiting confirmation, with a floating window showing the source of the conflict and handling suggestions. A daily health knowledge update report is generated, including the number of new entities, the frequency of relationship updates, and high-risk warning statistics, and trends in health indicators across different user groups are presented as a heatmap, much as equivalent feature distributions are visualized to present data characteristics intuitively.
[0077] As one or more specific application embodiments of the present invention, as shown in Figure 3, the health knowledge graph updating method follows this process: first, static graph construction is performed: multi-source health data is collected, and structured entities and relationships are extracted using large-language-model triple generation to complete the static graph; then the dynamic growth stage begins: incremental data is continuously added and dynamic triples are generated in real time, the time attributes of the data are handled through time-series modeling, and the conflict graph update strategy resolves inconsistencies among multi-source data; the processed data is then written to the graph database through the asynchronous writing and reconstruction mechanism; finally, the graph status is displayed in real time through the dynamic monitoring and visualization modules, and health risk prediction and intelligent intervention are realized through dynamic reasoning and application.
[0078] This embodiment also provides a health knowledge graph updating device, which is used to implement the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
[0079] This embodiment provides a health knowledge graph updating device which, as shown in Figure 4, includes: a graph generation module 41, used to acquire multi-source health data, extract entities to generate static triples containing relationships between entities, and construct an initial static health knowledge graph; a graph update module 42, used to generate dynamic triples when incremental health data is received, perform time-series processing and multi-source conflict arbitration based on the static and dynamic triples, and update the initial static health knowledge graph using the arbitrated triples; and a risk monitoring module 43, used to perform health risk reasoning with the updated health knowledge graph and to generate and trigger a health intervention plan when the reasoning result meets preset conditions.
[0080] The health knowledge graph updating device provided in this embodiment of the invention can execute the health knowledge graph updating method provided in any embodiment of the invention, and has the corresponding functional modules and beneficial effects for executing the method. Further functional descriptions of the above modules and units are the same as in the corresponding embodiments described above, and will not be repeated here.
[0081] This embodiment also provides a health knowledge graph update system, which includes: a graph generation layer, used to acquire multi-source health data, extract entities to generate static triples containing relationships between entities, and construct an initial static health knowledge graph; a data perception and feature extraction layer, used to generate dynamic triples when incremental health data is received; a decision layer, used to perform time-series processing and multi-source conflict arbitration based on the static and dynamic triples and to update the initial static health knowledge graph using the arbitrated triples; and an execution feedback layer, used to perform health risk reasoning with the updated health knowledge graph and to generate and trigger a health intervention plan when the reasoning results meet preset conditions.
[0082] In one alternative implementation, as shown in Figure 5, the data perception and feature extraction layer completes data access and semantic transformation through an incremental data access and processing module and a dynamic triple generation module. The incremental data access and processing module is responsible for accessing, cleaning, and standardizing health data from multiple channels (such as medical institution APIs, user input, and wearable devices). It acquires incremental data in real time by adapting to the interfaces of different data sources and synchronizes the data into structured JSON; a preprocessing pipeline (OCR, format verification, conflict detection, and unit unification) then outputs structured data. The large-language-model-based dynamic triple generation module uses a fine-tuned large language model to perform entity recognition and relation extraction on the preprocessed data to generate basic triples, and also constructs semantic trend triples for trend data, providing incremental knowledge units (i.e., dynamic triples) for the knowledge graph.
[0083] In one alternative implementation, as shown in Figure 6, the decision layer includes a temporal modeling module, a conflict graph update strategy module, and an asynchronous writing and graph reconstruction module, which address temporal management, conflict resolution, and safe updating in the dynamic updating of the knowledge graph. The temporal modeling module constructs triples with time attributes and manages the temporal validity of knowledge through dynamic relation adjustment rules. It parses the timestamps in the triples to obtain their temporal semantics and queries the existing relations in the graph to determine their current state. For a newly added relation, it sets `valid_from` to mark the relation's starting point in time; for an updated relation, it creates a new version and sets the `valid_to` of the old version, marking the old relation as invalid and assigning a new time identifier to the new relation; for an invalidated relation, it sets `valid_to` to the current time, precisely terminating the relation's validity period. This achieves ordered updates of data across different time dimensions. The conflict graph update strategy module uses a multi-version coexistence mechanism to support historical data tracing, and a conflict resolution engine handles multi-source data conflicts to keep the information in the knowledge graph consistent. It identifies potential contradictions by comparing multi-source data with the same timestamp. For numerical data, the system executes a weighted average algorithm that fuses the data according to the reliability weights of the data sources to arrive at the most likely accurate value. For categorical data, the system triggers a manual review queue, submitting the conflicting data to professionals for adjudication.
[0084] The asynchronous write and graph reconstruction module writes triples asynchronously through a stream processing pipeline, combined with checkpoint and full verification mechanisms to ensure the efficiency and safety of graph updates. Specifically, asynchronous writing begins by creating a checkpoint snapshot that serves as a restore point for any rollback; the data is then sharded by patient ID, organizing it along the patient dimension to improve write efficiency; next, it is written to the graph database in batches. After the write completes, the system verifies the results to ensure that the data has been stored accurately; finally, a success log is updated to record the complete operation history.
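The checkpoint (node count, relation count and a hash of key attributes) and the post-write verification can be sketched with the graph modeled as plain Python collections. SHA-256 and the choice of key attributes are assumptions; the text does not specify the hash function or which attributes are hashed.

```python
import hashlib
import json

def snapshot(nodes, relations, key_attrs=("user_id", "disease_id")):
    """nodes: list of property dicts; relations: list of (src, type, dst) tuples."""
    key_values = sorted(
        json.dumps({k: n[k] for k in key_attrs if k in n}, sort_keys=True) for n in nodes
    )  # sorted, so the hash does not depend on node order
    digest = hashlib.sha256("\n".join(key_values).encode("utf-8")).hexdigest()
    return {"node_count": len(nodes), "relation_count": len(relations), "key_hash": digest}

def verify_write(before, after, expected_new_nodes, expected_new_relations):
    """Check that a batch added exactly the expected numbers of nodes and relations."""
    return (after["node_count"] - before["node_count"] == expected_new_nodes
            and after["relation_count"] - before["relation_count"] == expected_new_relations)
```

If verification fails, the snapshot taken before the batch identifies the state to roll back to.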
[0085] In one optional implementation, the execution feedback layer includes a dynamic monitoring and visualization feedback module and a dynamic reasoning and application module, responsible for the visualization monitoring and intelligent application triggering of the knowledge graph. The dynamic monitoring and visualization feedback module provides a visual dashboard of the graph's evolution, supports timeline backtracking and anomaly data annotation, and generates health knowledge update reports. The dynamic reasoning and application module performs health risk reasoning based on the knowledge graph and automatically generates personalized health intervention plans through a threshold triggering mechanism, realizing the transformation of knowledge into clinical applications.
[0086] Specifically, as shown in Figure 7, the execution feedback layer can be divided into a monitoring module, an inference engine, and a threshold triggering system. In the monitoring module, the system captures graph changes in real time, continuously monitoring dynamic updates to the entities and relationships in the knowledge graph so that any change in health status is detected immediately. When a change is captured, the system updates the visualization dashboard, which supports timeline playback for tracing historical changes. At the same time, the system generates a change log that records the content, time, and source of each graph change, providing a reliable data foundation for subsequent auditing, analysis, and problem tracing.
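A minimal sketch of the change log and timeline playback described in [0086] is given below, assuming that each graph change is recorded as an add or remove event carrying its time and source. `ChangeEvent` and `ChangeLog` are hypothetical names, and the visualization dashboard itself is not shown.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set, Tuple

TripleKey = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class ChangeEvent:
    timestamp: datetime
    source: str        # origin of the change, e.g. "hospital_emr" (hypothetical label)
    op: str            # "add" or "remove"
    triple: TripleKey  # content of the change


class ChangeLog:
    """Hypothetical append-only change log supporting timeline playback."""

    def __init__(self) -> None:
        self.events: List[ChangeEvent] = []

    def record(self, event: ChangeEvent) -> None:
        if event.op not in ("add", "remove"):
            raise ValueError(f"unknown op: {event.op}")
        self.events.append(event)

    def replay(self, until: datetime) -> Set[TripleKey]:
        """Rebuild the set of triples present in the graph at time `until`."""
        state: Set[TripleKey] = set()
        for e in sorted(self.events, key=lambda ev: ev.timestamp):
            if e.timestamp > until:
                break
            if e.op == "add":
                state.add(e.triple)
            else:
                state.discard(e.triple)
        return state
```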
[0087] The inference engine employs a two-layer architecture that combines rule-based reasoning with machine learning reasoning. In the rule-based reasoning layer, the system first traverses association paths to explore the connections between entities in the knowledge graph; for example, starting from a "user" node, it follows the "has disease" relationship to reach a "complication" node. Next, the system matches the discovered association paths against a risk rule base of predefined medical risk rules (such as "patients with both hypertension and hyperlipidemia have a high risk of stroke"). When a match succeeds, the system calculates a confidence level, quantifying the credibility of the risk conclusion based on the clarity of the rule, the completeness of the evidence chain, and the reliability of the data sources.
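The rule-based reasoning layer of [0087] can be sketched as a breadth-first traversal followed by rule matching. The adjacency-list graph, the rule format, and the confidence formula (the product of rule clarity, evidence completeness, and the weakest source reliability along the evidence paths) are assumptions made for illustration; the description does not specify how the three factors are combined.

```python
from collections import deque
from typing import Dict, List, Tuple

# Adjacency list: node -> [(relation, target, source reliability in [0, 1])]
Graph = Dict[str, List[Tuple[str, str, float]]]


def traverse(graph: Graph, start: str, max_depth: int = 2) -> dict:
    """Breadth-first traversal of association paths from the user node.

    Returns {node: (path, weakest source reliability along that path)}.
    """
    found = {}
    seen = {start}
    queue = deque([(start, [], 1.0)])
    while queue:
        node, path, reliability = queue.popleft()
        if len(path) >= max_depth:
            continue
        for relation, target, edge_rel in graph.get(node, []):
            if target in seen:
                continue
            seen.add(target)
            new_path = path + [(node, relation, target)]
            new_rel = min(reliability, edge_rel)
            found[target] = (new_path, new_rel)
            queue.append((target, new_path, new_rel))
    return found


def match_rules(graph: Graph, user: str, rules: List[dict]) -> List[dict]:
    """Match reached nodes against a risk rule base and score each hit (hypothetical scoring)."""
    reached = traverse(graph, user)
    hits = []
    for rule in rules:
        matched = [c for c in rule["conditions"] if c in reached]
        completeness = len(matched) / len(rule["conditions"])
        if not matched or completeness < rule.get("min_completeness", 1.0):
            continue
        reliability = min(reached[c][1] for c in matched)
        confidence = rule["clarity"] * completeness * reliability
        hits.append({"risk": rule["risk"], "confidence": confidence,
                     "evidence": [reached[c][0] for c in matched]})
    return hits
```

With a user linked to both hypertension (source reliability 0.9) and hyperlipidemia (0.8) and a stroke rule of clarity 0.95, the rule fires with a confidence of about 0.76.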
[0088] In the machine learning inference layer, the system focuses on temporal and deep pattern mining. It first extracts temporal features, capturing trends, periodicity, and volatility from the user's historical health indicator sequences. Then, it uses a graph neural network (GNN) model for prediction. This model simultaneously utilizes the structured association information (nodes and relationships) of the graph and the extracted temporal features for deeper pattern learning and inference. Finally, this layer outputs trend predictions, such as predicting the future trend of a health indicator or the probability of developing a certain disease.
[0089] In the threshold triggering system, the system first determines whether a threshold has been exceeded by comparing the calculated risk value, predicted value, or real-time monitoring value with preset clinical safety and risk thresholds. If a threshold is exceeded, the system immediately initiates a series of intervention actions: first, it creates an intervention node by adding an entity node representing a "health intervention plan" to the knowledge graph; then, it generates the intervention plan, automatically producing personalized suggestions, such as medication adjustments, recommended follow-up examinations, and lifestyle interventions, based on the user profile, the risk type, and clinical best practices; next, it establishes an "urgently needed" relationship between the user node and the intervention plan node in the graph, solidifying the decision; finally, it pushes the plan to the physician side through a message interface, sending the complete intervention plan and its risk basis to the attending physician or health manager to assist clinical decision-making. If no threshold is exceeded, the system does not trigger an intervention, and control returns to routine monitoring of the user's status. Regardless of whether an intervention is triggered, the system records a trigger log that saves the inputs, outputs, decision result, and timestamp of the threshold judgment, so that every automated decision is auditable and reproducible.
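The threshold triggering flow of [0089] can be sketched as a single function that always writes a trigger log and, only when the threshold is exceeded, adds an intervention node, an "urgently needed" relationship, and an outgoing message. The `SUGGESTIONS` table, the list-based outbox standing in for the message interface, and the plan identifier format are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical risk-type templates; a real system would also use the user
# profile and clinical best practice to personalize the suggestions.
SUGGESTIONS: Dict[str, List[str]] = {
    "stroke": ["review antihypertensive medication", "schedule carotid ultrasound",
               "low-salt, low-fat diet"],
}


@dataclass
class KnowledgeGraph:
    nodes: Dict[str, dict] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


def threshold_check(graph: KnowledgeGraph, user_id: str, risk_type: str,
                    value: float, threshold: float, outbox: list, trigger_log: list,
                    now: Optional[datetime] = None) -> Optional[str]:
    """Compare a risk, predicted, or monitored value with its threshold and act on it."""
    now = now or datetime.now(timezone.utc)
    plan_id = None
    exceeded = value > threshold
    if exceeded:
        plan_id = f"plan:{user_id}:{risk_type}:{now.isoformat()}"
        # Create the intervention node holding the generated plan.
        graph.nodes[plan_id] = {
            "type": "HealthInterventionPlan",
            "risk_type": risk_type,
            "suggestions": SUGGESTIONS.get(risk_type, ["refer to attending physician"]),
        }
        # Solidify the decision as an "urgently needed" relationship.
        graph.edges.append((user_id, "urgently_needed", plan_id))
        # Push the plan and its risk basis to the physician side.
        outbox.append({"recipient": "attending_physician", "plan_id": plan_id,
                       "basis": {"risk_type": risk_type, "value": value,
                                 "threshold": threshold}})
    # Log every judgment, triggered or not, so decisions stay auditable.
    trigger_log.append({"timestamp": now.isoformat(), "user_id": user_id,
                        "risk_type": risk_type, "value": value, "threshold": threshold,
                        "triggered": exceeded, "plan_id": plan_id})
    return plan_id
```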
[0090] In an alternative implementation, in addition to the graph generation layer, the system can also be divided into a data input module, a data perception and feature extraction module, a decision-making module, a data storage module, and an execution feedback module.
[0091] As shown in Figure 8, the data input module collects data through multiple access channels, such as structured data from medical institutions, time-series data from wearable devices, and unstructured data entered by users. These heterogeneous raw data are aggregated into a real-time data stream, which then enters the next processing stage. In the data perception and feature extraction module, a large language model (LLM) fine-tuned with LoRA (low-rank adaptation), a parameter-efficient training technique, is used to accurately extract entity relationships from raw data such as medical text, user input, and device data. A medical ontology library is used to unify units and standardize terminology, converting the heterogeneous data into standard (S, P, O) triples, automatically correcting temporal ambiguities, and generating structured triples tagged with timestamps, source labels, and categories as knowledge units.
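The conversion of raw extractions into timestamped knowledge units in [0091] can be sketched as follows, assuming that the LLM has already produced a raw (subject, predicate, value, unit, time expression) record. `TERM_MAP`, `UNIT_CONVERSIONS`, and `RELATIVE_DAYS` are hypothetical excerpts standing in for the medical ontology library; the glucose factor of about 18 mg/dL per mmol/L is the commonly used approximation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Union

# Hypothetical excerpts standing in for the medical ontology library.
TERM_MAP = {"htn": "hypertension", "high blood pressure": "hypertension",
            "blood sugar": "blood_glucose", "fbg": "blood_glucose"}
# Glucose: 1 mmol/L corresponds to roughly 18 mg/dL.
UNIT_CONVERSIONS = {("blood_glucose", "mg/dL"): ("mmol/L", lambda v: v / 18.0)}
RELATIVE_DAYS = {"today": 0, "yesterday": 1, "the day before yesterday": 2}


@dataclass(frozen=True)
class KnowledgeUnit:
    s: str
    p: str
    o: str
    timestamp: datetime
    source: str
    category: str


def normalize_term(text: str) -> str:
    # Map lay or abbreviated terms to a standard term; otherwise keep the lowercased text.
    key = text.strip().lower()
    return TERM_MAP.get(key, key)


def to_standard_triple(subject: str, predicate: str, value: Union[str, float],
                       unit: Optional[str], time_expr: str, reported_at: datetime,
                       source: str, category: str) -> KnowledgeUnit:
    p = normalize_term(predicate)
    if isinstance(value, (int, float)):
        # Unify units for numerical indicators.
        if (p, unit) in UNIT_CONVERSIONS:
            unit, convert = UNIT_CONVERSIONS[(p, unit)]
            value = convert(value)
        o = f"{value:.1f} {unit}" if unit else f"{value:.1f}"
    else:
        o = normalize_term(value)
    # Correct time ambiguity: resolve relative expressions against the report time.
    key = time_expr.strip().lower()
    if key in RELATIVE_DAYS:
        ts = reported_at - timedelta(days=RELATIVE_DAYS[key])
    else:
        ts = datetime.fromisoformat(time_expr)
    return KnowledgeUnit(subject, p, o, ts, source, category)
```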
[0092] In the decision-making module, a large language model is used to determine whether entities match, while a medical standard terminology database is consulted to map diseases and symptoms to authoritative terms. A three-level alignment strategy addresses heterogeneous representations and achieves entity alignment across the triples. Next, entity and relation conflicts are resolved: when data from different sources describe the same fact inconsistently, the system arbitrates according to preset rules to ensure the uniqueness and accuracy of knowledge. The system then performs the key graph reconstruction operation by rebuilding the triples with four-dimensional information. Specifically, each standard (S, P, O) triple is expanded into a four-dimensional triple that includes a time dimension; the core operation is adding valid_from and valid_to information to form the four-dimensional triple, thereby reconstructing the graph.
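The three-level alignment of [0092] is sketched below as terminology normalization, then string similarity, then attribute verification, following the three techniques named in claim 2. In this sketch `difflib.SequenceMatcher` is swapped in for the large language model's match decision, and `STANDARD_TERMS` and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical excerpt of a medical standard terminology database.
STANDARD_TERMS = {
    "hypertension": {"category": "disease", "synonyms": {"htn", "high blood pressure"}},
    "type 2 diabetes mellitus": {"category": "disease", "synonyms": {"t2dm", "type ii diabetes"}},
    "headache": {"category": "symptom", "synonyms": {"cephalalgia"}},
}


def align_entity(mention: str, category: str,
                 threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (standard term, match score), or None if the mention cannot be aligned."""
    m = mention.strip().lower()
    # Level 1: terminology normalization (exact standard term or known synonym).
    candidate, score = None, 0.0
    for term, info in STANDARD_TERMS.items():
        if m == term or m in info["synonyms"]:
            candidate, score = term, 1.0
            break
    # Level 2: string similarity for spelling variants and typos.
    if candidate is None:
        candidate, score = max(((t, SequenceMatcher(None, m, t).ratio())
                                for t in STANDARD_TERMS), key=lambda pair: pair[1])
        if score < threshold:
            return None
    # Level 3: attribute verification (here, the entity category must agree).
    if STANDARD_TERMS[candidate]["category"] != category:
        return None
    return candidate, score
```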
[0093] In the data storage module, data that has undergone entity alignment and temporal modeling is stored in the Neo4j graph database of the storage layer via Flink's asynchronous write mechanism, establishing dynamic relationships between various health dimensions. Finally, in the execution feedback module, based on the real-time status of follow-up plans, medication information, and health needs in the graph, health risk inference is performed through a logic reasoning engine. If a risk exists, personalized intervention suggestions are triggered and pushed to the user or physician's terminal, completing a closed-loop process from raw data entry to clinical application feedback.
[0094] This invention offers the following benefits. First, it improves the real-time performance and integration efficiency of health data: through multi-source data access channels and semantic conversion, incremental data flows dynamically into the knowledge graph, solving the problem of lagging updates in traditional health records. Second, it ensures the accuracy and temporal traceability of the knowledge graph: through the four-dimensional triple structure and the multi-version coexistence mechanism, health data can be tracked and managed throughout its entire lifecycle, providing a reliable basis for disease diagnosis and for evaluating the effect of interventions. Third, it raises the intelligence of health management: based on the dynamic reasoning engine and the threshold triggering mechanism, potential health risks are predicted automatically and personalized intervention plans are delivered accurately, increasing the application value of health records.
[0095] Figure 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
[0096] Referring now to Figure 9, a schematic structural diagram of an electronic device suitable for implementing embodiments of the present invention is shown. The electronic device may include a processor 11 (e.g., a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 12 or a program loaded from a memory 18 into a random access memory (RAM) 13. The RAM 13 also stores various programs and data required for the operation of the electronic device. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14, and an input/output (I/O) interface 15 is also connected to the bus 14.
[0097] Typically, the following devices can be connected to the I/O interface 15: input devices 16 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, and gyroscopes; output devices 17 including, for example, liquid crystal displays (LCDs), speakers, and vibrators; memory devices 18 including, for example, magnetic tapes and hard disks; and a communication device 19. The communication device 19 allows the electronic device to communicate with other devices wirelessly or by wire to exchange data. Although Figure 9 shows an electronic device having various devices, it should be understood that it is not required to implement or provide all of the devices shown; more or fewer devices may alternatively be implemented or provided.
[0098] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 19, or installed from a memory 18, or installed from a ROM 12. When the computer program is executed by the processor 11, it performs the functions defined in the health knowledge graph update method of the embodiments of the present invention.
[0099] The electronic device shown in Figure 9 is merely an example and should not be construed as limiting the functionality or scope of the embodiments of the present invention.
[0100] This invention also provides a computer-readable storage medium. The methods according to the embodiments of the invention described above can be implemented in hardware or firmware, as software or computer code that can be recorded on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium and is downloaded via a network and stored on a local storage medium. Thus, the methods described herein can be executed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state drive, or the like; the storage medium may also include a combination of the above types of memory. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component capable of storing or receiving software or computer code, and when the software or computer code is accessed and executed by the computer, processor, or hardware, the health knowledge graph update method shown in the above embodiments is implemented.
[0101] A portion of this invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and/or technical solutions according to the invention through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.
[0102] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.
Claims
1. A method for updating a health knowledge graph, characterized in that the method comprises: acquiring multi-source health data, extracting entities to generate static triples containing relationships between the entities, and constructing an initial static health knowledge graph; when incremental health data is received, generating dynamic triples, performing temporal processing and multi-source conflict arbitration based on the static triples and the dynamic triples, and updating the initial static health knowledge graph using the arbitrated triples; and performing health risk reasoning using the updated health knowledge graph, and generating and triggering a health intervention plan when the reasoning result meets a preset condition.
2. The method according to claim 1, characterized in that acquiring multi-source health data, extracting entities to generate static triples containing relationships between the entities, and constructing an initial static health knowledge graph comprises: acquiring multi-source health data, and performing data preprocessing on the structured data, semi-structured data, and unstructured data in the multi-source health data; extracting entities from the preprocessed health data using a large language model to generate static triples containing relationships between the entities; and categorizing the entities in the static triples, and performing entity alignment on the categorized entities by means of similarity calculation, terminology normalization, and attribute verification to construct the initial static health knowledge graph.
3. The method according to claim 1, characterized in that generating dynamic triples when incremental health data is received, performing temporal processing and multi-source conflict arbitration based on the static triples and the dynamic triples, and updating the initial static health knowledge graph using the arbitrated triples comprises: when incremental health data is received, marking conflicting data and generating dynamic triples using a large language model; generating, based on the temporal attributes of the relationships between entities in the static triples and the dynamic triples, multiple versions of triples that change over time; arbitrating conflicts in the dynamic triples based on the conflicting data marked in the dynamic triples; and updating the initial static health knowledge graph using the arbitrated triples.
4. The method according to claim 3, characterized in that arbitrating conflicts in the dynamic triples based on the conflicting data marked in the dynamic triples comprises: when the marked conflicting data is numerical, fusing the conflicting data using a dynamic fusion algorithm based on confidence assessment and a multidimensional decay factor; and when the marked conflicting data is categorical, arbitrating the conflicting data based on the source level of the conflicting data and a preset verification method.
5. The method according to claim 3, characterized in that updating the initial static health knowledge graph using the arbitrated triples comprises: writing the arbitrated triples to a message queue based on stream processing; and sharding the triples in the message queue by user entity and relation type, and writing the triples in batches to the initial static health knowledge graph according to the sharding result, wherein the triples are written using a MERGE operation.
6. The method according to claim 1, characterized in that performing health risk reasoning using the updated health knowledge graph, and generating and triggering a health intervention plan when the reasoning result meets a preset condition, comprises: performing health risk reasoning on the updated health knowledge graph using a rule-based reasoning algorithm to obtain a first reasoning result; performing health risk reasoning on the updated health knowledge graph using a machine learning algorithm to obtain a second reasoning result; and determining whether the first reasoning result and the second reasoning result meet the preset condition, and if so, generating a health intervention plan and triggering a push notification.
7. The method according to claim 1, characterized in that the method further comprises: generating a knowledge graph evolution dashboard, wherein the dashboard uses a timeline control to replay historical changes in entity relationships, and generates graphical annotations and statistical reports on conflicting data and update status.
8. A health knowledge graph updating device, characterized in that the device comprises: a graph generation module, configured to acquire multi-source health data, extract entities to generate static triples containing relationships between the entities, and construct an initial static health knowledge graph; a graph update module, configured to generate dynamic triples when incremental health data is received, perform temporal processing and multi-source conflict arbitration based on the static triples and the dynamic triples, and update the initial static health knowledge graph using the arbitrated triples; and a risk monitoring module, configured to perform health risk reasoning using the updated health knowledge graph, and to generate and trigger a health intervention plan when the reasoning result meets a preset condition.
9. A health knowledge graph update system, characterized in that the system comprises: a graph generation layer, configured to acquire multi-source health data, extract entities to generate static triples containing relationships between the entities, and construct an initial static health knowledge graph; a data perception and feature extraction layer, configured to generate dynamic triples when incremental health data is received; a decision layer, configured to perform temporal processing and multi-source conflict arbitration based on the static triples and the dynamic triples, and to update the initial static health knowledge graph using the arbitrated triples; and an execution feedback layer, configured to perform health risk reasoning using the updated health knowledge graph, and to generate and trigger a health intervention plan when the reasoning result meets a preset condition.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions for causing the computer to perform the health knowledge graph update method according to any one of claims 1 to 7.