A construction method of a medical informatization data system
By setting a system target unit, monitoring the time points at which reagent batch numbers change, and constructing a knowledge graph, the method addresses the data offset caused by unit heterogeneity and reagent batch number changes in medical information data systems. This enables standardized processing and automated alignment of test data across institutions and time periods, improving the comparability and accuracy of the data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHONGNAN HOSPITAL OF WUHAN UNIV
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-30
AI Technical Summary
Existing medical information data systems lack a reliable basis for comparing test data across institutions and time periods. They cannot effectively solve the data deviation problems caused by unit heterogeneity and reagent batch number changes, so raw data cannot be efficiently converted into standardized data, which affects the accuracy and comparability of the data.
The method sets a system target unit, performs unit conversion according to the conversion rules of the Unified Code for Units of Measure (UCUM), divides the timeline into analysis windows at the time points where the reagent batch number changes, calculates the quantile changes and the low quantile step amplification ratio, and constructs a knowledge graph for structured storage and association, thereby achieving standardized data processing and automated alignment.
It achieves unified and standardized processing of test data from different institutions and time periods, provides reliable data comparison basis across institutions and time periods, improves data utilization efficiency and accuracy, and supports automatic alignment and root cause query of test data across time periods.
Figure CN122309494A_ABST
Description
Technical Field
[0001] This invention relates to the field of data processing technology, and more specifically to a method for constructing a medical information data system.
Background Technology
[0002] In the field of medical informatics, laboratory data serves as crucial data support for clinical decision-making, quality management, and scientific research analysis. Medical institutions generate a large number of raw verification records in their daily laboratory operations, containing key information about the data. With the increasing demand for collaborative medical services, the application of cross-institutional and cross-time-period laboratory data comparison is becoming more frequent. Effective comparison of laboratory data requires standardization of the data and the establishment of a unified target unit. This is a fundamental prerequisite for ensuring the usability of laboratory data in different scenarios. Current medical informatics data systems need to handle the processing and integration of large amounts of such data to meet the needs of multi-dimensional medical data applications.
[0003] Existing medical information systems have significant shortcomings in processing laboratory data, failing to meet the core requirement of data comparability across institutions and time periods. On the one hand, laboratory data from different institutions or even the same institution at different times may use different units. The lack of a standardized processing mechanism with unified units of measurement prevents the efficient conversion of raw data into standardized data with a unified reference, highlighting data heterogeneity. On the other hand, changes in reagent batch numbers can also affect the distribution of laboratory data. Furthermore, existing systems lack effective structured storage and association mechanisms, failing to achieve automated data alignment or trace the root causes of data discrepancies. Ultimately, this results in a lack of reliable evidence for comparing laboratory data across institutions and time periods, impacting the accuracy and effectiveness of medical-related work.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, this invention proposes a method for constructing a medical information data system, which solves the problem of lacking reliable evidence for comparing test data across institutions and time periods.
[0005] To achieve the above objectives, the present invention provides the following technical solution: Obtain the original verification record set, which includes the original value, original unit, timestamp, reagent batch number and institution identifier. Set the system target unit, determine the unit conversion factor according to the conversion rules of the unified code for units of measure, and multiply the original value by the unit conversion factor to obtain the standardized value. Record the time points at which reagent batch numbers change to form a batch number change time point set, and define the analysis windows based on that set. Sort all standardized values within each analysis window and determine the 25th and 75th percentiles of the window from the sorting result; these are the standardized values that reflect the low and high quantile levels of the data within the analysis window, respectively. For each pair of analysis windows adjacent in time sequence, the difference between the two 25th percentiles is labeled as the low quantile change, and the difference between the two 75th percentiles is labeled as the high quantile change. The ratio of the low quantile change to the high quantile change is denoted as the low quantile step amplification ratio, which reflects the degree of amplification of the change amplitude in the low concentration region relative to that in the high concentration region. The low quantile step amplification ratios are sorted, and a pivot window is determined as the global reference window based on the sorting results.
[0006] Furthermore, the system target unit u0 is set based on the Unified Code for Units of Measure (UCUM) standard, and the system target unit u0 is the unified unit of measurement for the entire system. Based on the UCUM conversion rules, the unit conversion factor c for converting the original unit u in the original verification record set to the system target unit u0 is determined. The original value y is then converted to a standardized value y1 expressed in the system target unit u0 by multiplying the original value y by the unit conversion factor c, that is, y1 = y × c.
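The conversion in this paragraph can be sketched as follows. This is only a minimal illustration: the table `CONVERSION_FACTORS` and the function `standardize` are hypothetical names, and the hard-coded factors stand in for whatever a real system would derive from the UCUM conversion rules.

```python
# Hypothetical lookup of unit conversion factors c, keyed by
# (original unit u, system target unit u0). A real system would derive
# these from the UCUM conversion rules rather than hard-code them.
CONVERSION_FACTORS = {
    ("mmol/L", "umol/L"): 1000.0,
    ("umol/L", "umol/L"): 1.0,
}


def standardize(y: float, u: str, u0: str) -> float:
    """Return the standardized value y1 = y * c for original value y in unit u."""
    try:
        c = CONVERSION_FACTORS[(u, u0)]
    except KeyError:
        raise ValueError(f"no conversion factor from {u!r} to {u0!r}") from None
    return y * c


# An original value of 0.08 mmol/L expressed in the target unit umol/L.
y1 = standardize(0.08, "mmol/L", "umol/L")
```

Rejecting unknown unit pairs, rather than passing the value through unchanged, is a design choice of this sketch; the source does not say how unmapped units are handled.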
[0007] Furthermore, the time points at which the reagent batch number L changes in the original verification record set are monitored and recorded, and the recorded time points are combined to obtain the batch number change time point set {T_k}. The time points in {T_k} serve as dividing points that split the entire timeline into continuous, non-overlapping time periods, and each time period is defined as an analysis window W_k, where the time range of analysis window W_k is [T_k, T_{k+1}).
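A minimal sketch of this window division is given below. It assumes records from a single institution and test item, ISO-8601 timestamps (so that string order equals time order), and that a change time point is the timestamp of the first record carrying a new batch number; `Record` and `split_windows` are hypothetical names, and none of these choices is prescribed by the source.

```python
from dataclasses import dataclass


@dataclass
class Record:
    t: str    # timestamp; ISO-8601 assumed so that string order equals time order
    lot: str  # reagent batch number L


def split_windows(records):
    """Return windows as (T_k, T_k_plus_1, lot) tuples with range [T_k, T_k_plus_1).

    The last window has no later change point yet, so its end is None.
    """
    ordered = sorted(records, key=lambda r: r.t)
    change_points = []  # the set {T_k}, with T_0 being the initial time
    lots = []
    for r in ordered:
        if not lots or r.lot != lots[-1]:
            change_points.append(r.t)
            lots.append(r.lot)
    windows = []
    for k, start in enumerate(change_points):
        end = change_points[k + 1] if k + 1 < len(change_points) else None
        windows.append((start, end, lots[k]))
    return windows


records = [
    Record("2025-10-01T08:30:00", "LOT2025001"),
    Record("2025-10-02T09:00:00", "LOT2025001"),
    Record("2025-10-05T08:00:00", "LOT2025002"),
    Record("2025-10-09T10:00:00", "LOT2025003"),
]
windows = split_windows(records)
```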
[0008] Furthermore, all standardized values y1 within each analysis window W_k are obtained and sorted in ascending order, and the 25th percentile Q_25(W_k) and the 75th percentile Q_75(W_k) of analysis window W_k are determined from the sorting result, as follows: the 25th percentile Q_25(W_k) is the standardized value at the 25% position after sorting, that is, 25% of the standardized values in analysis window W_k are smaller than this value; the 75th percentile Q_75(W_k) is the standardized value at the 75% position after sorting, that is, 75% of the standardized values in analysis window W_k are smaller than this value. Each analysis window W_k is stored in association with its 25th percentile Q_25(W_k) and 75th percentile Q_75(W_k) to form the window statistics table. For each analysis window W_k, the window statistics table contains a unique window identifier that uniquely identifies the analysis window, together with the corresponding 25th percentile Q_25(W_k) and 75th percentile Q_75(W_k).
[0009] Furthermore, by tallying how the timestamps t corresponding to the original values y are distributed across the analysis windows W_k, the standardized values y1 corresponding to all original values y within each analysis window W_k are obtained.
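The percentile step can be sketched as below. The source does not name an interpolation rule; this sketch assumes the nearest-rank method, which agrees with the description's example in which the 25th of 100 sorted values is the 25th percentile. The function names and the dictionary layout of the window statistics table are hypothetical.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) of the ascending sort."""
    if not values:
        raise ValueError("window contains no standardized values")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def window_statistics(values_by_window):
    """Build the window statistics table: window identifier -> (Q25, Q75)."""
    return {
        window_id: (
            nearest_rank_percentile(values, 25),
            nearest_rank_percentile(values, 75),
        )
        for window_id, values in values_by_window.items()
    }


# A window holding the 100 standardized values 1..100.
table = window_statistics({"W0": list(range(1, 101))})
```

Other percentile conventions (for example, linear interpolation) would give slightly different values for small windows, so the choice should be fixed once for the whole system.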
[0010] Furthermore, for each pair of analysis windows {W_k, W_{k+1}} that are adjacent in time sequence in the window statistics table, the changes in the 25th percentile and the 75th percentile are calculated as follows: the low quantile change is the difference obtained by subtracting the 25th percentile Q_25(W_k) of analysis window W_k from the 25th percentile Q_25(W_{k+1}) of analysis window W_{k+1}, that is, Q_25(W_{k+1}) - Q_25(W_k); the high quantile change is the difference obtained by subtracting the 75th percentile Q_75(W_k) of analysis window W_k from the 75th percentile Q_75(W_{k+1}) of analysis window W_{k+1}, that is, Q_75(W_{k+1}) - Q_75(W_k). All calculated low quantile changes and high quantile changes are associated with their pair of analysis windows {W_k, W_{k+1}} to obtain the set of adjacent window quantile changes, which includes each pair of analysis windows {W_k, W_{k+1}} and its corresponding low quantile change and high quantile change. The number of elements in the set of adjacent window quantile changes equals the total number of analysis windows W_k in the window statistics table minus 1.
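As a rough illustration of this step, the sketch below walks the window statistics table in time order and emits one entry per adjacent pair. The tuple layout and the name `adjacent_quantile_changes` are assumptions made for the example.

```python
def adjacent_quantile_changes(stats):
    """Compute quantile changes for each pair of time-adjacent windows.

    stats: list of (window_id, q25, q75) tuples in time order.
    Returns a list of (left_id, right_id, low_change, high_change).
    """
    changes = []
    for (id_k, q25_k, q75_k), (id_next, q25_next, q75_next) in zip(stats, stats[1:]):
        low_change = q25_next - q25_k    # Q_25(W_{k+1}) - Q_25(W_k)
        high_change = q75_next - q75_k   # Q_75(W_{k+1}) - Q_75(W_k)
        changes.append((id_k, id_next, low_change, high_change))
    return changes


stats = [("W1", 70, 130), ("W2", 88, 140), ("W3", 85, 150)]
changes = adjacent_quantile_changes(stats)
```

With three windows the result has two entries, matching the rule that the set has one element fewer than the number of analysis windows.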
[0011] Furthermore, for each pair of analysis windows {W_k, W_{k+1}} in the set of adjacent window quantile changes, the corresponding low quantile step amplification ratio is calculated as follows: the absolute value of the low quantile change is divided by the absolute value of the high quantile change, and the resulting ratio is labeled as the low quantile step amplification ratio R_i. The low quantile step amplification ratios R_i of all pairs of analysis windows {W_k, W_{k+1}} are sorted in ascending order, the median of the sorted sequence is selected, and the left-hand analysis window W_k of the corresponding pair {W_k, W_{k+1}} is taken as the pivot window W*, which serves as the global reference.
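The ratio and the pivot selection can be sketched as follows. Two details are not specified in the source and are assumptions of this sketch: a high quantile change of zero yields an infinite ratio, and for an even number of pairs the lower of the two middle elements is taken as the median. The function names are hypothetical.

```python
import math


def step_amplification_ratio(low_change, high_change):
    """R = |low quantile change| / |high quantile change|."""
    if high_change == 0:
        return math.inf  # assumption: the source does not cover a zero denominator
    return abs(low_change) / abs(high_change)


def select_pivot(changes):
    """Pick the pivot window W* from (left_id, right_id, low_change, high_change) entries.

    The pairs are sorted by R in ascending order; the left-hand window of the
    median pair becomes the pivot.
    """
    if not changes:
        raise ValueError("at least one pair of adjacent windows is required")
    ranked = sorted(changes, key=lambda c: step_amplification_ratio(c[2], c[3]))
    median_pair = ranked[(len(ranked) - 1) // 2]  # lower middle for even counts (assumption)
    return median_pair[0]


changes = [("W0", "W1", 18, 10), ("W1", "W2", 5, 10), ("W2", "W3", -30, 10)]
pivot = select_pivot(changes)  # ratios are 1.8, 0.5 and 3.0; the median is 1.8
```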
[0012] Furthermore, the difference obtained by subtracting the 25th percentile Q_25(W*) of the pivot window W* from the 75th percentile Q_75(W*) of the pivot window W* is labeled as the pivot window quantile difference, which reflects the fluctuation range of the data within the pivot window W* from the low quantile to the high quantile. For each analysis window W_k, the difference obtained by subtracting its 25th percentile Q_25(W_k) from its 75th percentile Q_75(W_k) is labeled as the analysis window quantile difference, which reflects the fluctuation range of the data within analysis window W_k from the low quantile to the high quantile. The value obtained by dividing the pivot window quantile difference by the analysis window quantile difference is labeled as the scaling correction factor.
[0013] Furthermore, the difference obtained by subtracting the product of the scaling correction factor and the 25th percentile Q_25(W_k) of analysis window W_k from the 25th percentile Q_25(W*) of the pivot window W* is labeled as the translation correction factor.
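The two correction factors can be computed as in the sketch below, which follows the definitions above. The function and parameter names are hypothetical, and raising an error for a window whose quantile difference is zero is an assumption, since the source does not address that case.

```python
def correction_factors(pivot_q25, pivot_q75, q25, q75):
    """Return (scaling correction factor, translation correction factor) for one window."""
    window_diff = q75 - q25
    if window_diff == 0:
        # Assumption: a degenerate window cannot be rescaled.
        raise ValueError("analysis window quantile difference is zero")
    scale = (pivot_q75 - pivot_q25) / window_diff  # pivot difference / window difference
    shift = pivot_q25 - scale * q25                # Q_25(W*) - scale * Q_25(W_k)
    return scale, shift


# Pivot window: Q25 = 70, Q75 = 130; analysis window: Q25 = 80, Q75 = 140.
scale, shift = correction_factors(70, 130, 80, 140)
```

By construction, `scale * q25 + shift` equals the pivot Q25 and `scale * q75 + shift` equals the pivot Q75, which is what makes the two windows line up at both quantiles.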
[0014] Furthermore, based on the original verification record set, the window statistics table, and the scaling correction factor and translation correction factor corresponding to each analysis window W_k, a window node is created in the knowledge graph for each analysis window W_k. The node attributes include the unique window identifier, the time range [T_k, T_{k+1}), the reagent batch number L, the institution identifier m, a first mapping parameter and a second mapping parameter, where the time range [T_k, T_{k+1}) is the time interval corresponding to analysis window W_k, the first mapping parameter is the scaling correction factor corresponding to analysis window W_k, and the second mapping parameter is the translation correction factor corresponding to analysis window W_k. Each analysis window W_k corresponds to exactly one reagent batch number and one institution identifier. The set of window nodes in the knowledge graph is output, where each window node corresponds to one analysis window W_k and carries the node attributes of that analysis window. All original values y in each analysis window W_k, together with the standardized values y1 corresponding to them, are combined to form the observation data set; each element of the observation data set consists of a unique window identifier, an original value y, and the standardized value y1 corresponding to that original value. For each element in the observation data set, an observation node is created in the knowledge graph, with node attributes including the original value y and the corresponding standardized value y1. The association between each observation node and its window node is established based on the unique window identifier of the element, and the relationship is named "belongs to this window". The set of observation nodes and the set of relationships between observation nodes and window nodes in the knowledge graph are output. The window node corresponding to the pivot window W* is located in the knowledge graph, a boolean attribute named "whether it is a pivot window" is added to it, and the value of this attribute is set to 1; for all other window nodes the value of the boolean attribute defaults to 0.
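The graph structure described here can be illustrated with plain Python dictionaries and tuples. The source does not name a graph store, so this in-memory representation, the function `build_graph`, and the attribute keys other than the two quoted names are assumptions made for the sketch.

```python
def build_graph(windows, observations, pivot_id):
    """Build window nodes, observation nodes and their relationships.

    windows: dict mapping window identifier -> dict of window attributes
             (time range, reagent batch number, institution identifier,
             first and second mapping parameters).
    observations: list of (window identifier, original value y, standardized value y1).
    """
    nodes, edges = {}, []
    for window_id, attrs in windows.items():
        node = dict(attrs)
        # Boolean attribute from the description: 1 for the pivot window, 0 otherwise.
        node["whether it is a pivot window"] = 1 if window_id == pivot_id else 0
        nodes[("window", window_id)] = node
    for i, (window_id, y, y1) in enumerate(observations):
        obs_key = ("observation", i)
        nodes[obs_key] = {"original_value": y, "standardized_value": y1}
        edges.append((obs_key, "belongs to this window", ("window", window_id)))
    return nodes, edges


windows = {
    "W0": {"time_range": ("T0", "T1"), "reagent_batch": "LOT2025001",
           "institution": "HS_A", "first_mapping": 1.0, "second_mapping": 0.0},
    "W1": {"time_range": ("T1", "T2"), "reagent_batch": "LOT2025002",
           "institution": "HS_A", "first_mapping": 1.0, "second_mapping": -10.0},
}
observations = [("W0", 0.07, 70.0), ("W1", 0.08, 80.0)]
nodes, edges = build_graph(windows, observations, pivot_id="W0")
```

Following an observation node's "belongs to this window" relationship leads directly to the window's mapping parameters, which is the lookup the description relies on for alignment and root cause queries.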
[0015] Compared with existing technologies, it has the following advantages: This solution proposes a method for constructing a medical information data system. By establishing a standardized processing mechanism based on unified unit conversion rules, it effectively solves the problem of cross-institutional and cross-time period comparability of test data caused by heterogeneous units in existing systems. The solution first obtains a set of original verification records containing the original value, original unit, timestamp, reagent batch number, and institution identifier. Then, it sets a unified system target unit for the entire system, determines the unit conversion factor based on the unified unit conversion rules, and multiplies the original value by the unit conversion factor to obtain the standardized value. This process eliminates the semantic differences in units between different institutions or between different periods within the same institution using a universal and unified conversion logic. It enables test data that were previously unable to be directly correlated due to inconsistent units to have a consistent calculation and comparison benchmark. This provides reliable basic data support for scenarios such as cross-institutional test result mutual recognition and regional medical quality control data integration, significantly improving the utilization efficiency and accuracy of test data in multiple scenarios.
[0016] This solution addresses the problem of existing systems' inability to effectively identify and quantify systematic shifts caused by reagent batch number changes by constructing a quantification and global reference system around these changes. The solution first monitors and records the time points when reagent batch numbers change in the original verification record set, forming a set of batch number change time points. Using this set as a boundary, the time axis is divided into continuous and non-overlapping analysis windows. Then, the standardized values within each analysis window are sorted to determine the 25th and 75th percentiles, reflecting the data distribution characteristics. The lower quantile change, higher quantile change, and lower quantile step amplification ratio of adjacent analysis windows are calculated. The analysis window to the left of the median after sorting by lower quantile step amplification ratio is selected as the pivot window for global reference. This process accurately captures the data distribution shift characteristics caused by reagent batch number changes, transforming previously difficult-to-quantify systematic shifts into calculable numerical relationships. Simultaneously, using the pivot window as a unified reference benchmark provides a comparable basis for test data under different reagent batch numbers, avoiding interpretation biases caused by shifts.
[0017] This solution constructs a knowledge graph to achieve structured storage and association of the background and calibration parameters of test data, solving the problems of difficult traceability and insufficient automated alignment caused by the separation of data background and calibration logic in existing systems. Based on the original verification record set, the window statistics table, and the scaling correction factor and translation correction factor corresponding to each analysis window, the solution creates a window node in the knowledge graph for each analysis window. The window node contains attributes such as the unique window identifier, time range, reagent batch number, institution identifier, scaling correction factor, and translation correction factor. At the same time, an observation node is created for each observation data point and carries its original and standardized values. A specific relationship is established between each observation node and its window node, and a Boolean identifier is added to the window node corresponding to the pivot window. This structured design binds the background information of the test data closely to its calibration parameters. It supports rapid tracing from an observation node to the calibration logic of the corresponding analysis window, enabling automated alignment of test data across time periods. It also provides a clear and traceable semantic link for root cause queries on data differences and for quality control analysis, further unlocking the value of test data in clinical collaboration, research analysis and other scenarios.
Attached Figure Description
[0018] Figure 1 is a schematic diagram of the method flow of the present invention.
Detailed Implementation
[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0020] Referring to Figure 1, this application provides a method for constructing a medical information data system. The method specifically includes the following steps. Step 1: When building a medical information data system, the raw test data is heterogeneous due to differences in units, testing periods, batch numbers and so on. It needs to be standardized and divided into analysis windows, as follows: Obtain the original verification record set, which is the electronic data generated by the routine testing business of medical institutions. The original verification record set includes the original value y, the original unit u, the timestamp t, the reagent batch number L, and the institution identifier m. Specifically, the original value y is the actual measured result of the test item; for example, the original value of a certain test item for a certain patient is 0.08. The original unit u is the unit used when measuring the original value, such as millimoles per liter. The timestamp t is the specific time when the test was performed, such as 25_10_01_08:30:00. The reagent batch number L is the batch identifier of the reagent used in the test, such as LOT2025001. The institution identifier m is the unique identifier of the institution that performed the test, such as the code HS_A corresponding to hospital A. Set the system target unit u0. The system target unit u0 is the unified unit of measurement for the entire system and is defined based on the Unified Code for Units of Measure (UCUM) standard to ensure the semantic uniqueness of the unit. According to the UCUM conversion rules, the unit conversion factor c for converting the original unit u in the original verification record set to the system target unit u0 is determined. The original value y is converted by the unit conversion factor c into a standardized value y1 expressed in the system target unit u0.
Specifically, the original value y is multiplied by the unit conversion factor c to obtain the standardized value y1. For example, if the original unit u = mmol/L and the system target unit u0 = umol/L, then the unit conversion factor c is 1000, and the standardized value y1 = y × 1000. The unit conversion factor c determined according to the UCUM conversion rules eliminates unit heterogeneity, so that test values from different institutions and in different units can be calculated and compared directly; this ensures that cross-institutional data is unambiguous at the unit level and is the basis for cross-domain comparability of test data.
Monitor and record the time points at which the reagent batch number L changes in the original verification record set, and combine the recorded time points to obtain the batch number change time point set {T_k}. The time points in {T_k} serve as dividing points that split the entire timeline into continuous, non-overlapping time periods, and each time period is defined as an analysis window W_k, where the time range of analysis window W_k is [T_k, T_{k+1}), that is, from the k-th change time point T_k up to, but not including, the (k+1)-th change time point T_{k+1}. For example, if T0 is the initial time, corresponding to reagent batch number LOT2025001, T1 is the first batch number change time, when the reagent batch number L changes from LOT2025001 to LOT2025002, and T2 is the second batch number change time, when the reagent batch number L changes from LOT2025002 to LOT2025003, then W0 = [T0, T1) and W1 = [T1, T2). Because a change of reagent batch number L is a known trigger of systematic offset in medical testing, dividing the windows in this way pinpoints the source of the offset and keeps the detection background stable within each analysis window, providing an interference-free analysis unit for subsequent calculations; this is a key prerequisite for system construction.
Obtain all standardized values y1 within each analysis window W_k, sort them in ascending order, and determine the 25th percentile Q_25(W_k) and the 75th percentile Q_75(W_k) of analysis window W_k from the sorting result. The 25th percentile Q_25(W_k) is the standardized value at the 25% position after sorting, that is, 25% of the standardized values within analysis window W_k are smaller than this value, and it reflects the low quantile level of the data within the analysis window. The 75th percentile Q_75(W_k) is the standardized value at the 75% position after sorting, that is, 75% of the standardized values within analysis window W_k are smaller than this value, and it reflects the high quantile level of the data within the analysis window.
For example, if analysis window W0 contains 100 standardized values and the 25th standardized value after sorting is 70, then the 25th percentile Q_25(W0) = 70; if the 75th standardized value is 130, then the 75th percentile Q_75(W0) = 130. Specifically, the distribution of the timestamps t corresponding to the original values y across the analysis windows W_k is tallied, thereby obtaining the standardized values y1 corresponding to all original values y within each analysis window W_k. Each analysis window W_k is stored in association with its 25th percentile Q_25(W_k) and 75th percentile Q_75(W_k) to form the window statistics table. For each analysis window W_k, the window statistics table contains a unique window identifier that uniquely identifies the analysis window, together with the corresponding 25th percentile Q_25(W_k) and 75th percentile Q_75(W_k). The purpose of creating the window statistics table is to store the percentile characteristics of each analysis window in structured form, making them a direct input for subsequent steps. Because medical test data are mostly right-skewed, with higher data density in the low-concentration region, which is also more strongly affected by additive bias, and because quantiles are insensitive to extreme values and stably reflect the key characteristics of the data distribution, the 25th percentile Q_25(W_k) and the 75th percentile Q_75(W_k) are calculated. Q_25(W_k) represents the typical level of the low-concentration region of analysis window W_k, and Q_75(W_k) represents the typical level of the high-concentration region. Together, the two capture the asymmetry between the change amplitude in the low-concentration region and that in the high-concentration region, which is the core basis for quantifying low quantile step amplification.
Low quantile step amplification is a quantifiable distributional distortion phenomenon in medical test data triggered by discrete events such as reagent batch number changes. The time point of the change event divides the data into two adjacent analysis windows, and the magnitude of the change from the 25th percentile Q_25(W_k) of the window before the switch to the 25th percentile Q_25(W_{k+1}) of the window after the switch is systematically greater than the magnitude of the change from the 75th percentile Q_75(W_k) of the window before the switch to the 75th percentile Q_75(W_{k+1}) of the window after the switch. The step is reflected in the overall jump of the data distribution before and after the time point, rather than a gradual change or periodic fluctuation. The amplification arises because the low-concentration region is more strongly affected by additive bias and because most test data are right-skewed: the low-end data has a higher density, making it more sensitive to discrete events. This phenomenon is the core indicator for identifying the systematic offset caused by batch number switching, and its quantitative results provide a key basis for establishing reference standards and achieving data alignment.
[0021] Step 2: For each pair of analysis windows {W_k, W_{k+1}} that are adjacent in time sequence in the window statistics table, the changes in the 25th percentile and the 75th percentile are calculated as follows: the low quantile change is the difference obtained by subtracting the 25th percentile Q_25(W_k) of analysis window W_k from the 25th percentile Q_25(W_{k+1}) of analysis window W_{k+1}, that is, Q_25(W_{k+1}) - Q_25(W_k); the high quantile change is the difference obtained by subtracting the 75th percentile Q_75(W_k) of analysis window W_k from the 75th percentile Q_75(W_{k+1}) of analysis window W_{k+1}, that is, Q_75(W_{k+1}) - Q_75(W_k). All calculated low quantile changes and high quantile changes are associated with their pair of analysis windows {W_k, W_{k+1}} to obtain the set of adjacent window quantile changes, which includes each pair of analysis windows {W_k, W_{k+1}} and its corresponding low quantile change and high quantile change. The number of elements in the set of adjacent window quantile changes equals the total number of analysis windows W_k in the window statistics table minus 1. Specifically, if for the adjacent windows W1 and W2 the 25th percentile Q_25(W1) is 70 and the 75th percentile Q_75(W1) is 130, and the 25th percentile Q_25(W2) is 88 and the 75th percentile Q_75(W2) is 140, then the low quantile change is 88 - 70 = 18 and the high quantile change is 140 - 130 = 10. Calculating the low quantile change and high quantile change of adjacent windows transforms the impact of batch number switching on the data distribution into quantifiable values, providing direct input for the subsequent step of identifying that the low quantile is more sensitive than the high quantile.
This represents the transition from a qualitative description of low quantile step amplification to a quantitative analysis. For each pair of analysis windows {W_k, W_{k+1}} in the set of adjacent window quantile changes, the corresponding low quantile step amplification ratio, which quantifies the amplification of the low quantile change relative to the high quantile change, is calculated as follows: the absolute value of the low quantile change is divided by the absolute value of the high quantile change, and the resulting ratio is labeled as the low quantile step amplification ratio R_i. This transforms the qualitative characteristic that the low quantile is more sensitive than the high quantile into a quantitative ratio, making low quantile step amplification calculable and comparable and providing an objective basis for the subsequent selection of the pivot window. When the low quantile step amplification ratio R_i is greater than 1, the change in the low quantile is the more significant one, that is, low quantile step amplification has occurred. The low quantile step amplification ratios R_i of all pairs of analysis windows {W_k, W_{k+1}} are sorted in ascending order, the R_i in the middle of the sorted sequence, that is, the median of the sorted results, is selected, and the left-hand analysis window W_k of the corresponding pair {W_k, W_{k+1}} is taken as the pivot window W*, which serves as the global reference. Specifically, the pivot window W* is a global reference window selected from the window statistics table. Its detection background is stable and it is used as the alignment benchmark for all subsequent window data.
Specifically, selecting the pivot window W* through the median of the sorted low quantile step amplification ratios R_i prevents extreme values of R_i from interfering with the reference standard, ensuring that the pivot window is representative and stable. At the same time, because the reference is selected according to the distribution of the low quantile step amplification characteristics, subsequent data alignment can accurately offset the systematic offset caused by batch number switching, providing a unified benchmark for cross-window and cross-institution data comparison.
[0022] Step 3: The difference obtained by subtracting the 25th percentile Q_25(W*) of the pivot window W* from the 75th percentile Q_75(W*) of the pivot window W* is labeled as the pivot window quantile difference, which reflects the fluctuation range of the data within the pivot window W* from the low quantile to the high quantile. For each analysis window W_k, the difference obtained by subtracting its 25th percentile Q_25(W_k) from its 75th percentile Q_75(W_k) is labeled as the analysis window quantile difference, which reflects the fluctuation range of the data within analysis window W_k from the low quantile to the high quantile. The value obtained by dividing the pivot window quantile difference by the analysis window quantile difference is labeled as the scaling correction factor, which is used to adjust the fluctuation range of the data within analysis window W_k to match the fluctuation range of the pivot window W*. Specifically, the scaling correction factor uniformly adjusts the low-to-high quantile fluctuation range of the data in each analysis window to match that of the pivot window. This lays the foundation for the subsequent overall level alignment, preserves the shape of the data distribution, and avoids alignment deviations caused by differences in fluctuation range. For each analysis window W_k, the translation correction factor is calculated as follows: the product of the scaling correction factor and the 25th percentile Q_25(W_k) of analysis window W_k is subtracted from the 25th percentile Q_25(W*) of the pivot window W*, and the resulting difference is the translation correction factor. The translation correction factor is the overall level adjustment from analysis window W_k to the pivot window W* and is used to align the low percentile of the analysis window with the low percentile of the pivot window.
Specifically, the scaling correction factor is a fluctuation adjustment factor between the analysis window and the pivot window. It reflects the difference between the fluctuation range of the data in the analysis window from the lower to the higher quantiles and that of the pivot window, and is used to unify the fluctuation ranges of the two. The translation correction factor is an overall level adjustment factor between the analysis window and the pivot window. It reflects the offset of the overall data level of the analysis window from the pivot window, and is used to align the lower quantile levels of the two. Assuming the 25th percentile of the pivot window is 70 and the 75th percentile is 130, the quantile difference of the pivot window is 60. Assuming the 25th percentile of a certain analysis window is 80 and the 75th percentile is 140, the quantile difference of the analysis window is also 60. When calculating the scaling correction factor, the quantile difference of the pivot window (60) is divided by the quantile difference of the analysis window (60), and the result is 1. This means that the data fluctuation range of the analysis window is exactly the same as that of the pivot window, and no adjustment of the fluctuation range is needed. When calculating the translation correction factor, the product of the scaling correction factor (1) and the 25th percentile of the analysis window (80) is subtracted from the 25th percentile of the pivot window (70), and the result is -10. This indicates that the overall data level of the analysis window is 10 higher than that of the pivot window, so a translation that subtracts 10 from every value is needed to align the low concentration level of this analysis window with the pivot window. Here, -10 is the translation correction factor.
After this correction, each standardized value in the analysis window is multiplied by the scaling correction factor and the translation correction factor is then added to the product. As a result, the low concentration level of the analysis window moves from 80 to 70 and the high concentration level moves from 140 to 130, matching the corresponding levels of the pivot window.
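The worked example above can be reproduced numerically with a short sketch. The function names `correction_factors` and `align` are hypothetical and chosen only for illustration; the numbers are the ones given in this embodiment.

```python
def correction_factors(pivot_q25, pivot_q75, win_q25, win_q75):
    """Return (scaling correction factor, translation correction factor)."""
    # Pivot window quantile difference divided by analysis window
    # quantile difference.
    scale = (pivot_q75 - pivot_q25) / (win_q75 - win_q25)
    # Pivot Q25 minus (analysis window Q25 times the scaling factor).
    shift = pivot_q25 - scale * win_q25
    return scale, shift


def align(value, scale, shift):
    """Map a standardized value onto the pivot window reference."""
    return scale * value + shift


# Pivot window: Q25 = 70, Q75 = 130; analysis window: Q25 = 80, Q75 = 140.
scale, shift = correction_factors(70, 130, 80, 140)
low_aligned = align(80, scale, shift)
high_aligned = align(140, scale, shift)
```

Here `scale` is 1.0 and `shift` is -10.0, so 80 maps to 70 and 140 maps to 130, in agreement with the text.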
[0023] Step 4: Based on the original verification record set, the window statistics table, and the scaling correction factor and translation correction factor corresponding to each analysis window W_k, a window node is created in the knowledge graph for each analysis window W_k. The node attributes include a unique window identifier, the time range [T_k, T_{k+1}), the reagent batch number L, the institution identifier m, a first mapping parameter, and a second mapping parameter, where the time range [T_k, T_{k+1}) is the time interval corresponding to the analysis window W_k, the first mapping parameter is the scaling correction factor corresponding to the analysis window W_k, and the second mapping parameter is the translation correction factor corresponding to the analysis window W_k. Each analysis window W_k corresponds to only one reagent batch number and one institution identifier. For example, if the unique window identifier of analysis window W1 is W1, the time range is [T_k, T_{k+1}), the reagent batch number is LOT2025001, the institution identifier is HS_A, the first mapping parameter is 1, and the second mapping parameter is -10, then a W1 node is created in the knowledge graph, and its attributes contain all of the above information. The set of window nodes in the knowledge graph is output, where each window node corresponds to one analysis window W_k and carries the node attributes of that analysis window W_k. Specifically, a knowledge graph is a structured data model that uses nodes to represent entities and relationships to represent the connections between entities. Here, it is used to store the entities and relationships related to the test data.
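A window node of this kind might be represented as in the sketch below. The class name `WindowNode`, its field names, the in-memory dictionary standing in for the knowledge graph, and the two dates are all hypothetical; only the identifier W1, the batch number LOT2025001, the institution identifier HS_A and the two mapping parameters come from the example above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WindowNode:
    window_id: str        # unique window identifier
    start: datetime       # T_k, inclusive
    end: datetime         # T_{k+1}, exclusive
    reagent_lot: str      # reagent batch number L
    institution_id: str   # institution identifier m
    scale: float          # first mapping parameter (scaling correction)
    shift: float          # second mapping parameter (translation correction)

    def contains(self, t: datetime) -> bool:
        """True when timestamp t falls in the half-open range [T_k, T_{k+1})."""
        return self.start <= t < self.end


# Assumption: a plain dict keyed by window identifier stands in for the graph.
window_nodes = {}
w1 = WindowNode(
    window_id="W1",
    start=datetime(2025, 1, 1),   # hypothetical T_k
    end=datetime(2025, 4, 1),     # hypothetical T_{k+1}
    reagent_lot="LOT2025001",
    institution_id="HS_A",
    scale=1.0,
    shift=-10.0,
)
window_nodes[w1.window_id] = w1
```

Keeping the range half-open means a record stamped exactly at a batch change time belongs to the new window only, which matches the non-overlapping windows the method defines.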
By structurally associating the test background and correction parameters of each window through its window node, subsequent queries can obtain directly from the window node complete information on why the data needs to be corrected and how to correct it, ensuring the traceability and transparency of the correction logic. All the original values y in each analysis window W_k and their corresponding standardized values y1 are combined to form the observation data set. Each element in the observation data set consists of a unique window identifier, an original value y, and its corresponding standardized value y1. An observation node is created in the knowledge graph for each element in the observation data set. The node attributes include the original value y and its corresponding standardized value y1. The association between the observation node and the window node is established based on the unique window identifier of each element in the observation data set. The relationship name is "belongs to this window", which makes explicit the attribution of the observation data to the window background that generated it. The set of observation nodes and the set of relationships between observation nodes and window nodes in the knowledge graph are output. Specifically, the "belongs to this window" relationship establishes a complete link from the observation data to the window node and then to the correction parameters, so that the window to which an observation node belongs can be located quickly during subsequent queries, the corresponding correction parameters can be obtained, and a semantic path is provided for automatic data alignment.
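The observation layer described above could be built along the following lines. This is a sketch under stated assumptions: the names `ObservationNode` and `build_observation_layer`, the generated `obs_id` keys, the triple format used for relationships, and the sample values are all hypothetical and are not specified in the original text.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

BELONGS_TO = "belongs to this window"  # relationship name from the text


@dataclass(frozen=True)
class ObservationNode:
    obs_id: str                # assumption: synthetic key, not in the source
    original_value: float      # y
    standardized_value: float  # y1


def build_observation_layer(
    observations: List[Tuple[str, float, float]],
    known_windows: Set[str],
) -> Tuple[Dict[str, ObservationNode], List[Tuple[str, str, str]]]:
    """Create one node per (window_id, y, y1) element plus its relationship."""
    nodes: Dict[str, ObservationNode] = {}
    edges: List[Tuple[str, str, str]] = []
    for index, (window_id, y, y1) in enumerate(observations):
        if window_id not in known_windows:
            raise KeyError(f"unknown window identifier: {window_id}")
        obs_id = f"obs-{index}"
        nodes[obs_id] = ObservationNode(obs_id, y, y1)
        # Link the observation to the window that produced it.
        edges.append((obs_id, BELONGS_TO, window_id))
    return nodes, edges


# Hypothetical observation data set for window W1.
obs_nodes, obs_edges = build_observation_layer(
    [("W1", 0.8, 80.0), ("W1", 1.4, 140.0)], {"W1"}
)
```

Rejecting unknown window identifiers keeps every observation node attached to exactly one window node, which the later lookup of correction parameters relies on.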
The window node corresponding to the pivot window W* is located in the knowledge graph, and a Boolean attribute, "whether it is a pivot window", is added to it. The value of this Boolean attribute is set to 1 to mark this window node clearly as the unified reference standard for data alignment across the entire system. The default value of this Boolean attribute for all other window nodes is 0. Specifically, the attribute mark clearly defines the global reference role of the pivot window W*, ensuring that all subsequent data alignment operations are based on this window node, avoiding alignment deviations caused by confusion of reference standards, and ensuring the consistency of cross-window and cross-institution data comparison. In medical test data processing, differences in reagent batch numbers, units, and similar factors make the original data significantly heterogeneous, so test results from different institutions and time periods are often difficult to compare directly. The knowledge graph created from the acquired data integrates the test background and correction parameters through window nodes, associates standardized observation data with the corresponding windows through the "belongs to this window" relationship, and clearly marks the pivot window as a global reference. Its core function is to transform scattered data, background, correction logic, and unified reference elements into structured semantic links, and to turn the abstract reasons for data differences and the specific correction rules into nodes and relationships that can be queried and reasoned over. This enables the system not only to store data, but also to represent the source of data differences and the path to eliminating them. For medical information systems, this knowledge graph provides a standardized foundation for cross-hospital data exchange.
During queries, observation nodes can be traced back to window nodes, and calibration parameters can be invoked to automatically align data to the pivot window reference system, enabling direct comparability of test results. At the same time, the background information of window nodes can be used to trace the root causes of data differences, such as systematic deviations of specific reagent batches, providing a basis for quality control and method optimization. The structured semantic storage also provides high-quality, interpretable, and standardized data for higher-level applications such as scientific research analysis and assisted diagnosis, promoting the upgrade of medical data from fragmented storage to integrated utilization, and ultimately supporting the core objectives of medical information systems in data exchange, quality improvement, and value mining.
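The query path described above (observation node, then window node, then correction parameters) can be illustrated with a small sketch. Everything here is hypothetical scaffolding: the dictionary layout, the key names, the function `align_observation`, and the window W0 with batch number LOT2024009 are invented for the example. W0 is assumed to be the pivot window, whose own factors are 1 and 0 by the Step 3 formulas.

```python
# Window nodes keyed by identifier; "is_pivot" mirrors the Boolean attribute.
windows = {
    "W0": {"lot": "LOT2024009", "institution": "HS_A",
           "scale": 1.0, "shift": 0.0, "is_pivot": 1},
    "W1": {"lot": "LOT2025001", "institution": "HS_A",
           "scale": 1.0, "shift": -10.0, "is_pivot": 0},
}

# Observation nodes with their "belongs to this window" target.
observations = {
    "obs-0": {"y1": 80.0, "window": "W1"},
    "obs-1": {"y1": 130.0, "window": "W0"},
}


def align_observation(obs_id: str) -> dict:
    """Trace an observation to its window and align it to the pivot reference."""
    obs = observations[obs_id]
    win = windows[obs["window"]]
    aligned = win["scale"] * obs["y1"] + win["shift"]
    # Return the window background too, so the cause of a shift
    # (for example a specific reagent batch) can be traced.
    return {
        "aligned": aligned,
        "window": obs["window"],
        "lot": win["lot"],
        "institution": win["institution"],
    }


result = align_observation("obs-0")
pivot_count = sum(w["is_pivot"] for w in windows.values())
```

Returning the batch number and institution with the aligned value reflects the dual use described in the text: comparability through alignment and root cause tracing through the window background.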
[0024] The above embodiments are only used to illustrate the technical methods of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical methods of the present invention without departing from the spirit and scope of the technical methods of the present invention.
Claims
1. A method for constructing a medical information data system, characterized in that it includes: Obtain the original verification record set, which includes the original value, original unit, timestamp, reagent batch number and institution identifier. Set the system target unit, determine the unit conversion factor according to the unified code conversion rules for measurement units, and multiply the original value by the unit conversion factor to obtain the standardized value. Record the time points at which the reagent batch number changes to form a batch number change time point set, and define analysis windows based on the batch number change time point set. Sort all standardized values within each analysis window and determine the 25th and 75th percentiles of the analysis window based on the sorting results. The 25th and 75th percentiles are the standardized values that reflect the lower and higher percentile levels of the data within the analysis window, respectively. For each pair of analysis windows adjacent in time sequence, the difference between the two 25th percentiles and the difference between the two 75th percentiles are labeled as the low quantile change and the high quantile change, respectively. The ratio of the low quantile change to the high quantile change is denoted as the low quantile step amplification ratio, which reflects the degree of amplification of the change amplitude in the low concentration region relative to that in the high concentration region. The low quantile step amplification ratios are sorted, and a pivot window is determined as the global reference window based on the sorting results.
2. The method for constructing a medical information data system according to claim 1, characterized in that it includes: The system target unit u0 is set based on the Unified Code for Units of Measure (UCUM) standard. The system target unit u0 is the unified measurement unit for the entire system. Based on the UCUM conversion rules, the unit conversion factor c for converting the original unit u in the original verification record set to the system target unit u0 is determined. The original value y is then converted to a standardized value y1 expressed in the system target unit u0 using the unit conversion factor c, as follows: multiplying the original value y by the unit conversion factor c yields the standardized value y1.
3. The method for constructing a medical information data system according to claim 2, characterized in that it includes: Monitor and record the time points at which the reagent batch number L changes in the original verification record set, combine the recorded time points to obtain a batch number change time point set {T_k}, and divide the entire time axis into continuous and non-overlapping time periods with the time points in the batch number change time point set {T_k} as the dividing points. Each time period is defined as an analysis window W_k, wherein the time range of the analysis window W_k is [T_k, T_{k+1}).
4. The method for constructing a medical information data system according to claim 3, characterized in that it includes: Obtain all standardized values y1 within each analysis window W_k, sort all standardized values y1 within each analysis window W_k in ascending order, and determine the 25th percentile Q_25(W_k) and the 75th percentile Q_75(W_k) of the analysis window W_k based on the sorting result, as follows: the 25th percentile Q_25(W_k) is the standardized value at the 25% position after sorting, such that 25% of the standardized values in the analysis window W_k are smaller than this standardized value; the 75th percentile Q_75(W_k) is the standardized value at the 75% position after sorting, such that 75% of the standardized values in the analysis window W_k are smaller than this standardized value; Each analysis window W_k is stored in association with its corresponding 25th percentile Q_25(W_k) and 75th percentile Q_75(W_k) to form the window statistics table, which contains, for each analysis window W_k, a unique window identifier used to uniquely identify the analysis window, together with the corresponding 25th percentile Q_25(W_k) and 75th percentile Q_75(W_k).
5. The method for constructing a medical information data system according to claim 4, characterized in that, by statistically analyzing how the timestamps t corresponding to the original values y are distributed across the analysis windows W_k, the standardized values y1 corresponding to all the original values y within each analysis window W_k are obtained.
6. The method for constructing a medical information data system according to claim 4, characterized in that it includes: For each pair of analysis windows {W_k, W_{k+1}} that are adjacent in time sequence in the window statistics table, calculate the changes in the 25th percentile and the 75th percentile of the pair, as follows: the difference obtained by subtracting the 25th percentile Q_25(W_k) of analysis window W_k from the 25th percentile Q_25(W_{k+1}) of analysis window W_{k+1} is the low quantile change; the difference obtained by subtracting the 75th percentile Q_75(W_k) of analysis window W_k from the 75th percentile Q_75(W_{k+1}) of analysis window W_{k+1} is the high quantile change; All calculated low quantile changes and high quantile changes are associated with their pairs of analysis windows {W_k, W_{k+1}} to obtain the set of adjacent window quantile changes, which includes each pair of analysis windows {W_k, W_{k+1}} and its corresponding low quantile change and high quantile change; The number of elements in the set of adjacent window quantile changes is equal to the total number of analysis windows W_k in the window statistics table minus 1.
7. The method for constructing a medical information data system according to claim 6, characterized in that it includes: For each pair of analysis windows {W_k, W_{k+1}} in the set of adjacent window quantile changes, calculate the corresponding low quantile step amplification ratio, as follows: the ratio obtained by dividing the absolute value of the low quantile change of the pair {W_k, W_{k+1}} by the absolute value of its high quantile change is labeled as the low quantile step amplification ratio R_i; Sort the low quantile step amplification ratios R_i of all pairs of analysis windows {W_k, W_{k+1}} in ascending order, select the median of the sorted sequence, and take the left-hand analysis window W_k of the corresponding pair {W_k, W_{k+1}} as the pivot window W*, which serves as the global reference.
8. The method for constructing a medical information data system according to claim 7, characterized in that it includes: The difference obtained by subtracting the 25th percentile Q_25(W*) of the pivot window W* from the 75th percentile Q_75(W*) of the pivot window W* is marked as the pivot window quantile difference, which reflects the fluctuation range of the data within the pivot window W* from the low quantile to the high quantile. For each analysis window W_k, the difference obtained by subtracting its 25th percentile Q_25(W_k) from its 75th percentile Q_75(W_k) is marked as the analysis window quantile difference, which reflects the fluctuation range of the data within that analysis window W_k from the low quantile to the high quantile. The value obtained by dividing the pivot window quantile difference by the analysis window quantile difference is labeled as the scaling correction factor.
9. The method for constructing a medical information data system according to claim 8, characterized in that it includes: The difference obtained by subtracting, from the 25th percentile Q_25(W*) of the pivot window W*, the product of the 25th percentile Q_25(W_k) of the analysis window W_k and the scaling correction factor is labeled as the translation correction factor.
10. The method for constructing a medical information data system according to claim 9, characterized in that it includes: Based on the original verification record set, the window statistics table, and the scaling correction factor and translation correction factor corresponding to each analysis window W_k, create a window node in the knowledge graph for each analysis window W_k; the node attributes include a unique window identifier, the time range [T_k, T_{k+1}), the reagent batch number L, the institution identifier m, a first mapping parameter and a second mapping parameter, where the time range [T_k, T_{k+1}) is the time interval corresponding to the analysis window W_k, the first mapping parameter is the scaling correction factor corresponding to the analysis window W_k, and the second mapping parameter is the translation correction factor corresponding to the analysis window W_k; each analysis window W_k corresponds to only one reagent batch number and one institution identifier. Output the set of window nodes in the knowledge graph, where each window node corresponds to one analysis window W_k and carries the node attributes of that analysis window W_k; All the original values y in each analysis window W_k and the standardized values y1 corresponding to the original values y are combined to form the observation data set. Each element in the observation data set consists of a unique window identifier, an original value y, and the standardized value y1 corresponding to the original value y. For each element in the observation data set, an observation node is created in the knowledge graph. The node attributes include the original value y and the standardized value y1 corresponding to the original value y. The association between the observation node and the window node is established based on the unique window identifier of each element in the observation data set. The relationship name is "belongs to this window".
Output the set of observation nodes and the set of relationships between observation nodes and window nodes in the knowledge graph; Find the window node corresponding to the pivot window W* in the knowledge graph, add a boolean attribute named "whether it is a pivot window" to the window node corresponding to the pivot window W*, and set the value of the boolean attribute to 1. The value of the boolean attribute of other window nodes is 0 by default.