Structured vital sign data processing method for clinical research

By dynamically generating personalized quality control strategies and a directed acyclic quality control workflow, the problems of data misjudgment and signal loss in existing technologies are solved, achieving adaptive data purification and preservation, and improving the quality and credibility of scientific research data.

CN122245573A · Pending · Publication Date: 2026-06-19 · HANG ZHOU SHEN MA ZHI NENG KE JI YOU XIAN GONG SI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANG ZHOU SHEN MA ZHI NENG KE JI YOU XIAN GONG SI
Filing Date
2026-03-04
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing clinical research data preprocessing systems, due to their use of fixed rules and uniform processes, are unable to adapt to complex and ever-changing clinical scenarios, leading to problems such as data misinterpretation, loss of true signals, and bias in research conclusions.

Method used

By parsing the metadata of clinical research tasks, personalized quality control strategies are dynamically generated, a directed acyclic quality control workflow is constructed, processing decisions are captured in real time, and a structured data view that meets research needs is built in memory, thereby achieving adaptive data purification and preservation.

Benefits of technology

It has improved the basic quality of scientific research data, reduced false alarms and omissions, ensured data integrity, reduced research bias, and improved the credibility and robustness of research findings.

✦ Generated by Eureka AI based on patent content.

[Figure CN122245573A_ABST: abstract drawing]

Abstract

This invention discloses a structured vital sign data processing method for clinical research, belonging to the field of medical information technology. The method includes the following steps: parsing the metadata description of the clinical research task and extracting research objective features through semantics; outputting a personalized quality control operation instruction set; transforming the personalized quality control operation instruction set into a directed acyclic quality control workflow consisting of at least two quality control nodes with an execution logical order based on data dependencies; generating a quality control decision instruction sequence corresponding to the research task identifier; constructing a structured data view in memory that meets research needs; calculating the strategy deviation value and adjusting the inference logic weights in the quality control requirement mapping knowledge base accordingly. This invention solves the problems of misjudgment, information loss, and biased research conclusions caused by using fixed rules for data cleaning in existing technologies by dynamically constructing quality control strategies based on specific clinical research objectives and forming reversible processing instructions.

Description

Technical Field

[0001] This invention relates to the field of medical information technology, and in particular to a method for processing structured vital sign data for clinical research.

Background Technology

[0002] With the widespread adoption of hospital information systems and various bedside monitoring devices, massive amounts of vital sign data are generated daily in the clinical environment. This data, including time-series information such as heart rate, blood pressure, blood oxygen, and respiration, is a valuable resource for clinical research, including disease mechanism studies, efficacy evaluation, and prognostic analysis. How to transform this raw, heterogeneous data into high-quality, computable research data has become a significant technical challenge supporting the development of evidence-based medicine.

[0003] Existing clinical research data preprocessing systems typically structure vital sign data by presetting a set of general data cleaning and quality control rules. Specifically, these systems set fixed normal-range thresholds for each vital sign based on physiological common sense or domain experience, and automatically remove or mark as abnormal any data that exceeds the thresholds; at the same time, they use a unified processing flow to standardize the handling of issues such as missing values and timestamp misalignment.

[0004] In existing technologies, this processing approach based on fixed rules and uniform procedures has significant shortcomings when facing highly complex and personalized clinical realities. First, the core problem lies in the rigidity of data quality monitoring rules. Clinical scenarios are constantly changing; patients' physiological baselines vary depending on age, disease state, and treatment stage (e.g., perioperative, intensive care, rehabilitation). Fixed thresholds cannot adapt to these dynamic changes, leading to misjudgments (false alarms) of true pathological states or missed detections (false negatives) of genuine abnormal signals. Second, the rigidity of the rules directly causes irreversible information loss during data cleaning. The system may filter out as "noise" vital sign data that truly reflects rare clinical events or special physiological responses (e.g., specific blood pressure fluctuation patterns caused by a certain drug), even though this data could provide crucial clues for subsequent scientific discoveries. Ultimately, these two problems together create a risk of systematic bias and omission of key information in research analysis results. Analyzing a dataset that has been improperly cleaned and has lost some of its true biological signals may cause research conclusions to deviate from reality. For example, it may underestimate the strength of association for certain warning signals or miss new associations entirely, thereby impairing the accuracy and value of clinical research.

Summary of the Invention

[0005] This application provides a structured vital sign data processing method for clinical research, which solves the problems of data misjudgment, loss of real signals, and bias in research conclusions caused by using fixed rules to process variable clinical data in the prior art. It achieves adaptive purification and preservation of data according to specific research objectives, thereby improving the basic quality of research data and the credibility of results.

[0006] This application provides a structured vital sign data processing method for clinical research, including: parsing the metadata description of the clinical research task and obtaining, through semantic extraction, research target features containing the research design type, core observed variables, and analytical intent; based on the research target features, performing pattern matching in the quality control requirement mapping knowledge base to output a personalized quality control operation instruction set; based on data dependencies, transforming the personalized quality control operation instruction set into a directed acyclic quality control workflow consisting of at least two quality control nodes with an execution logical order; performing quality control operations on the original vital signs dataset and capturing the processing decisions of each quality control node in real time to generate a quality control decision instruction sequence corresponding to the research task identifier; while keeping the physical state of the original vital signs dataset unchanged, dynamically mapping the quality control decision instruction sequence onto the original vital signs dataset to build, in memory, a structured data view that meets the research requirements; and obtaining research and analysis feedback data based on the structured data view, calculating the strategy deviation value, and adjusting the inference logic weights in the quality control requirement mapping knowledge base accordingly.

[0007] Furthermore, the step of obtaining, through semantic extraction, research target features containing the research design type, core observed variables, and analytical intent includes: retrieving the electronic task sheet of the clinical research task, and segmenting the research background description with a pre-trained natural language processing model to identify medical terminology entities; determining the physiological definitions of the core observed variables based on the identified medical terminology entities; identifying the statistical analysis method keywords in the task sheet to determine whether the analytical intent focuses on capturing extreme pathological values or on fitting population trends; and encapsulating the medical terminology entities, the physiological definitions, the analytical intent, and the research design type defined in the task settings into a structured research objective feature vector.

[0008] Furthermore, the steps for outputting a personalized set of quality control operation instructions include: The research objective feature vector is input into the input end of the quality control requirement mapping knowledge base, which stores the mapping logic between different research scenarios and quality control intensity. When the research objective is to study postoperative intensive care, the system automatically matches the logic for handling high-sensitivity outliers. That is, for vital sign data that exceed the normal threshold, the system will prioritize calling the clinical event log for contextual verification rather than directly removing it. Based on the statistical requirements of the analysis intent, the missing rate tolerance threshold for specific vital sign fields and the timestamp alignment rules for the differences in collection frequency of multi-source heterogeneous devices are calculated and set, thereby synthesizing a complete set of personalized quality control operation instructions.

[0009] Furthermore, the step of transforming the personalized quality control operation instruction set into a directed acyclic quality control workflow consisting of at least two quality control nodes with an execution logical order includes: Analyze the input and output parameters of each instruction in the personalized quality control operation instruction set to determine the precedence and succession relationships between instructions; Configure corresponding computation operators for each instruction and build quality control nodes; Establish directed connections from data format conversion nodes to terminology standardization nodes, and from scope rationality screening nodes to clinical event auxiliary verification nodes; Dynamic parameters derived from the research objective features are loaded for each quality control node, forming a directed acyclic quality control workflow architecture that reflects specific research preferences and guides subsequent data processing.

[0010] Furthermore, the steps for generating a sequence of quality control decision instructions corresponding to the research task identifier include: The original vital signs dataset is read into the memory buffer, and the quality control nodes are activated one by one according to the topological order of the directed acyclic quality control workflow. When a node performs a marking, calibration, or exclusion operation on a data record, it records the operator type, original value, correction value, and the logical basis for triggering the decision. These decision-making information for individual records are aggregated into streaming data and arranged according to timestamp order and record index number, solidified into a traceable quality control decision instruction sequence. This sequence serves as metadata supplement to the original data and forms a unique index association with the research task identifier.

[0011] Furthermore, the step of dynamically mapping the quality control decision instruction sequence to the original vital sign dataset to construct a structured data view in memory that meets the research requirements includes: Upon receiving a data query request for a specific research task, the system simultaneously loads a read-only reference to the original vital signs dataset and the corresponding quality control decision instruction sequence. During data projection operations, filter and correction records in the quality control decision instruction sequence are retrieved in real time. If there is an exclusion decision for a record in the sequence, the data service module will skip that record in the result set. If a correction decision exists, the corrected value is used instead of the original value in the output.

[0012] Furthermore, the step of calculating the strategy deviation value and adjusting the inference logic weights in the quality control requirement mapping knowledge base accordingly includes: The system receives, through a feedback interface, quantitative evaluation metrics from researchers regarding the quality of the structured data view. These metrics include the outlier deletion rate and the accuracy in capturing key clinical events. A deviation calculation formula is used to evaluate the degree of fit between the current quality control strategy and research expectations and to calculate the strategy deviation value. The formula is defined over the following quantities: the preset learning rate; the expected ideal data distribution characteristic values for this type of research; the actual statistical feature values of the current structured data view; and the number of observed indicators. According to the sign and magnitude of the strategy deviation value, the correlation weight coefficients of the different feature dimensions in the quality control requirement mapping knowledge base are adjusted through a backpropagation mechanism, so that the quality control strategy evolves adaptively.

[0013] Furthermore, the tolerance threshold for the missing rate of a specific vital sign field is calculated and set by a formula defined over the following quantities: the final calculated missing rate tolerance threshold for the specific vital sign data field; the research importance weighting coefficient; the baseline importance of the field; the data continuity requirement coefficient; the continuity benchmark requirement value; the data source reliability penalty coefficient; and the data source reliability attenuation factor.

[0014] Furthermore, the quality control node includes a clinical event-assisted verification node, comprising: The quality control node determines whether the numerical deviations in the original vital signs dataset have clinical authenticity by retrieving medication records or surgical operation records that coincide with the collection time of vital signs. If a corresponding clinical intervention record exists, the quality control node outputs a retention instruction and adds a pathological attribute label to the quality control decision instruction sequence. If there is no corresponding clinical event to support the value, and the value exceeds the physiological limit threshold, the quality control node outputs an abnormality marking instruction.

[0015] One or more technical solutions provided in the embodiments of this application have at least the following technical effects or advantages: By dynamically generating and implementing personalized data quality control strategies based on different clinical research goals, data cleaning rules can be flexibly adapted to specific research scenarios and needs. This solves the rigidity problem of traditional one-size-fits-all quality control rules when facing complex and ever-changing clinical situations, and reduces false alarms and missed alarms.

[0016] Furthermore, in constructing the final dataset for analysis, the use of techniques to separate and dynamically map the quality control decision instruction sequence from the original data makes the filtering, correction, or labeling operations based on the original data reversible and traceable. This avoids the permanent deletion of potentially clinically valuable abnormal information caused by cleaning, thus ensuring the integrity of the data.

[0017] Furthermore, because this closed-loop process of dynamic mapping and feedback optimization establishes a strategy deviation assessment and self-adjustment mechanism based on actual research analysis results, the method can continuously learn and optimize its quality control logic, so that the data view prepared for similar studies keeps approaching the most reasonable state, thereby systematically reducing the risk of research bias introduced by improper data preparation and improving the credibility and robustness of research findings.

Attached Figure Description

[0018] Figure 1 is a flowchart of the structured vital sign data processing method for clinical research provided in this application embodiment.

Detailed Implementation

[0019] This application provides a structured vital sign data processing method for clinical research, which solves the problems of data miscleaning, obscuring of real phenomena, and distortion of analysis results caused by the inability of fixed quality control rules to adapt to specific research scenarios in the prior art. By dynamically generating personalized quality control strategies based on research objectives and recording all cleaning operations as reversible instruction sequences, it achieves the effect of accurately purifying data while fully preserving its research value, thereby ensuring the reliability of clinical research findings.

[0020] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.

[0021] Figure 1 shows a flowchart of the structured vital sign data processing method for clinical research provided in this application embodiment. The method includes the following steps: parsing the metadata description of the clinical research task and obtaining, through semantic extraction, research target features containing the research design type, core observed variables, and analytical intent; based on the research target features, performing pattern matching in the quality control requirement mapping knowledge base to output a personalized quality control operation instruction set that includes outlier handling logic, a missing rate tolerance threshold, and timestamp alignment rules; based on data dependencies, transforming the personalized quality control operation instruction set into a directed acyclic quality control workflow consisting of at least two quality control nodes with an execution logical order; instantiating the directed acyclic quality control workflow to perform quality control operations on the original vital signs dataset, and capturing the processing decisions of each quality control node in real time to generate a quality control decision instruction sequence corresponding to the research task identifier; while keeping the physical state of the original vital signs dataset unchanged, dynamically mapping the quality control decision instruction sequence onto the original vital signs dataset through the data service module to build, in memory, a structured data view that meets the research requirements; and obtaining research and analysis feedback data based on the structured data view, calculating the strategy deviation value, and adjusting the inference logic weights in the quality control requirement mapping knowledge base accordingly.

[0022] Furthermore, the step of obtaining, through semantic extraction, research target features containing the research design type, core observed variables, and analytical intent includes: retrieving the electronic task sheet of the clinical research task, and segmenting the research background description with a pre-trained natural language processing model to identify medical terminology entities; determining the physiological definitions of the core observed variables based on the identified medical terminology entities; identifying the statistical analysis method keywords in the task sheet to determine whether the analytical intent focuses on capturing extreme pathological values or on fitting population trends; and encapsulating the medical terminology entities, the physiological definitions, the analytical intent, and the research design type defined in the task settings into a structured research objective feature vector.

[0023] The system retrieves the electronic task sheet for a study on the association between perioperative body temperature fluctuations and postoperative infection. A pre-trained NLP model performs word segmentation and entity recognition on descriptions in the task sheet such as "monitoring changes in patients' core body temperature during and after surgery, and statistically analyzing the incidence of surgical site infection within 30 days post-surgery," identifying medical terms such as "core body temperature" and "surgical site infection." Based on this, the system determines that the core variable to be observed is "core body temperature." By recognizing phrases in the task sheet such as "using survival analysis and time series models," the system determines that the analytical intent focuses on capturing "body temperature trends" and "event associations." Finally, the system encapsulates the identified entities, standard physiological definitions of variables, trend analysis intent, and the research design type of "prospective cohort" into a structured feature vector for use in downstream processes.
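
The extraction step in this example can be sketched as follows. This is a minimal illustration only: the pre-trained NLP model is replaced here by plain substring matching, and the lookup tables, the `ResearchTargetFeatures` class, the `extract_features` function, the placeholder definitions, and the intent labels are all hypothetical names that do not come from the application.

```python
from dataclasses import dataclass

# Hypothetical lookup tables standing in for the pre-trained NLP model.
# The definitions are placeholders, not authoritative clinical definitions.
MEDICAL_TERMS = {
    "core body temperature": "placeholder definition: core temperature in degrees Celsius",
    "surgical site infection": "placeholder definition: infection at the surgical site",
}
TREND_KEYWORDS = ("survival analysis", "time series")
EXTREME_KEYWORDS = ("extreme value", "outlier detection")


@dataclass(frozen=True)
class ResearchTargetFeatures:
    design_type: str
    entities: tuple
    core_variable: str
    physiological_definition: str
    analysis_intent: str


def extract_features(task_text: str, design_type: str) -> ResearchTargetFeatures:
    text = task_text.lower()
    # Entity recognition reduced to substring matching, ordered by first appearance.
    found = sorted((t for t in MEDICAL_TERMS if t in text), key=text.index)
    if not found:
        raise ValueError("no known medical term found in task text")
    core = found[0]  # assumption: the first recognized entity is the core variable
    if any(k in text for k in TREND_KEYWORDS):
        intent = "population_trend"
    elif any(k in text for k in EXTREME_KEYWORDS):
        intent = "extreme_values"
    else:
        intent = "unspecified"
    return ResearchTargetFeatures(
        design_type, tuple(found), core, MEDICAL_TERMS[core], intent
    )
```

Treating the first recognized entity as the core observed variable is a simplification chosen for the sketch; the application leaves the selection to the NLP model.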

[0024] Furthermore, the steps for outputting a personalized set of quality control operation instructions include: The research objective feature vector is input into the input end of the quality control requirement mapping knowledge base, which stores the mapping logic between different research scenarios and quality control intensity. When the research objective is to study postoperative intensive care, the system automatically matches the logic for handling high-sensitivity outliers. That is, for vital sign data that exceed the normal threshold, the system will prioritize calling the clinical event log for contextual verification rather than directly removing it. Based on the statistical requirements of the analysis intent, the missing rate tolerance threshold for specific vital sign fields and the timestamp alignment rules for the differences in collection frequency of multi-source heterogeneous devices are calculated and set, thereby synthesizing a complete set of personalized quality control operation instructions.

[0025] The system inputs the encapsulated research objective feature vector into a quality control requirement mapping knowledge base. Based on features such as "prospective cohort," "trend correlation," and "infectious complications," the knowledge base determines that this falls under the "postoperative prognosis study" scenario, requiring a high data integrity strategy. Therefore, the system matches a highly sensitive outlier handling logic: for hypothermia outliers, the instruction prioritizes querying concurrent anesthesia records, transfusion records, and other clinical event logs for verification. Simultaneously, based on the high continuity requirements of trend analysis, the system calculates that the missing rate tolerance threshold for the "core body temperature" field should be set relatively low (e.g., 2%). For data that may originate from different devices such as rectal temperature probes and skin temperature patches, the system also generates rules for high-precision timestamp alignment based on the surgery start time. These instructions collectively constitute the personalized quality control operation instruction set for this study. 
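
A minimal sketch of the matching step described above, assuming the knowledge base can be modeled as a list of weighted rules. The rule layout, the scoring function, the "generic" fallback rule with its values, and all identifiers are hypothetical; only the 2% tolerance, the clinical-event verification logic, and the surgery-start-time alignment come from the example.

```python
from dataclasses import dataclass


@dataclass
class QCInstructionSet:
    outlier_logic: str
    missing_rate_tolerance: float
    timestamp_anchor: str


# Hypothetical knowledge base: each rule names the features it requires,
# a weight, and the quality control strategy it maps to.
KNOWLEDGE_BASE = [
    {
        "scenario": "postoperative prognosis study",
        "requires": {"prospective cohort", "trend correlation", "infectious complications"},
        "weight": 1.0,
        "outlier_logic": "verify_against_clinical_events",
        "missing_rate_tolerance": 0.02,
        "timestamp_anchor": "surgery_start_time",
    },
    {
        "scenario": "generic",  # assumed fallback, not described in the source
        "requires": set(),
        "weight": 0.1,
        "outlier_logic": "remove_out_of_range",
        "missing_rate_tolerance": 0.10,
        "timestamp_anchor": "admission_time",
    },
]


def match_instruction_set(features: set) -> tuple:
    def score(rule):
        # A rule applies only when every required feature is present.
        if not rule["requires"] <= features:
            return -1.0
        return rule["weight"] * (1 + len(rule["requires"]))

    best = max(KNOWLEDGE_BASE, key=score)
    instructions = QCInstructionSet(
        best["outlier_logic"], best["missing_rate_tolerance"], best["timestamp_anchor"]
    )
    return best["scenario"], instructions
```

Keeping a per-rule `weight` gives the feedback step described later in the text something concrete to adjust.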
Furthermore, the step of transforming the personalized quality control operation instruction set into a directed acyclic quality control workflow consisting of at least two quality control nodes with an execution logical order includes: Analyze the input and output parameters of each instruction in the personalized quality control operation instruction set to determine the precedence and succession relationships between instructions; Configure corresponding computation operators for each instruction and build quality control nodes; Establish directed connections from data format conversion nodes to terminology standardization nodes, from scope rationality initial screening nodes to clinical event auxiliary verification nodes, to ensure that no logical loops are generated during data flow; Dynamic parameters derived from the research objective features are loaded for each quality control node, forming a directed acyclic quality control workflow architecture that reflects specific research preferences and guides subsequent data processing.
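
The dependency analysis described above can be sketched with Python's standard `graphlib` module. The instruction names and their input/output labels are hypothetical, and chaining terminology standardization into range screening is an assumption made so that the four nodes form a single pipeline.

```python
from graphlib import TopologicalSorter

# Each instruction declares the data it consumes and produces (hypothetical labels).
INSTRUCTIONS = {
    "format_conversion": {"inputs": {"raw_signal"}, "outputs": {"celsius_value"}},
    "terminology_standardization": {"inputs": {"celsius_value"}, "outputs": {"standard_record"}},
    "range_screening": {"inputs": {"standard_record"}, "outputs": {"range_flag"}},
    "clinical_event_verification": {"inputs": {"range_flag"}, "outputs": {"decision"}},
}


def build_workflow(instructions: dict) -> list:
    # Map every produced data item to the instruction that produces it.
    producers = {}
    for name, spec in instructions.items():
        for item in spec["outputs"]:
            producers[item] = name
    # An instruction depends on the producers of its inputs.
    graph = {
        name: {producers[i] for i in spec["inputs"] if i in producers}
        for name, spec in instructions.items()
    }
    # static_order() raises graphlib.CycleError if the dependencies contain a loop.
    return list(TopologicalSorter(graph).static_order())
```

Calling `build_workflow(INSTRUCTIONS)` yields the execution order of the quality control nodes; a cyclic instruction set is rejected rather than silently executed.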

[0026] Upon receiving the personalized quality control operation instruction set, the system analyzes it. It identifies that the "Terminology Standardization" instruction must be executed after "Data Format Conversion," while the "Clinical Event Assisted Validation" instruction is triggered only after anomalies are detected in the "Initial Screening for Range Reasonableness." Based on this, specific operators are configured for each instruction, and nodes are constructed. Then, directed connections are established from the format conversion node to the terminology standardization node, and from the initial screening node to the clinical validation node, forming an acyclic pipeline. Finally, specific parameters are derived from the research characteristics, such as loading the physiological limit threshold range of "Core Body Temperature" into the initial screening node, and loading the list of surgical and medication codes related to "Postoperative Infection Diagnosis" into the clinical validation node, thus completing a customized directed acyclic quality control workflow architecture.

Furthermore, the steps for generating a sequence of quality control decision instructions corresponding to the research task identifier include: The original vital signs dataset is read into the memory buffer, and the quality control nodes are activated one by one according to the topological order of the directed acyclic quality control workflow. When a node performs a marking, calibration, or exclusion operation on a data record, it records the operator type, original value, correction value, and the logical basis for triggering the decision. The decision information for individual records is aggregated into streaming data, arranged by timestamp order and record index number, and solidified into a traceable quality control decision instruction sequence. This sequence serves as a metadata supplement to the original data and forms a unique index association with the research task identifier.

[0027] During the quality control execution phase, a large volume of raw body temperature data is read into memory, and the aforementioned workflow is instantiated. The workflow activates nodes in topological order: the format conversion node converts the raw millivolt signals to degrees Celsius; the terminology standardization node unifies "Temp" and "body temperature" as "core body temperature"; the initial screening node identifies a hypothermic record of 31°C, triggering the clinical validation node; this node queries related logs but finds no clear explanation such as massive intravenous infusion, and therefore outputs an "exclusion" decision. The entire process records the operation type, raw value (31°C), decision result (exclusion), and judgment basis ("no matching supporting clinical event") for each node. Finally, all record-level decisions are sorted by timestamp and solidified into a traceable quality control decision instruction sequence uniquely bound to the study task ID.
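
The decision-capture behavior in this example can be sketched as follows. This is an illustrative reduction, not the application's implementation: only the range screening and clinical verification nodes are modeled, the `Decision` fields mirror the items listed in the text, and the function name, the callback `has_clinical_event`, and the threshold arguments are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    task_id: str
    record_index: int
    timestamp: float
    operator: str                    # node that produced the decision
    action: str                      # "mark" or "exclude" in this sketch
    original_value: float
    corrected_value: Optional[float]
    basis: str                       # logical basis that triggered the decision


def run_quality_control(task_id, records, low, high, has_clinical_event):
    """Screen records by range, verify outliers, and return the decision sequence.

    The input records are only read, never modified.
    """
    decisions = []
    for index, record in enumerate(records):
        value, ts = record["value"], record["timestamp"]
        if low <= value <= high:
            continue  # in range: no decision is recorded
        if has_clinical_event(ts):
            action, basis = "mark", "matching clinical event found"
        else:
            action, basis = "exclude", "no matching supporting clinical event"
        decisions.append(
            Decision(task_id, index, ts, "clinical_event_verification",
                     action, value, None, basis)
        )
    # Arrange by timestamp, then record index, and freeze the sequence.
    decisions.sort(key=lambda d: (d.timestamp, d.record_index))
    return tuple(decisions)
```

Returning an immutable tuple of frozen records is one simple way to model a sequence that is "solidified" and bound to a task identifier.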

[0028] Furthermore, the step of dynamically mapping the quality control decision instruction sequence to the original vital sign dataset to construct a structured data view in memory that meets the research requirements includes: Upon receiving a data query request for a specific research task, the system simultaneously loads a read-only reference to the original vital signs dataset and the corresponding quality control decision instruction sequence. When performing data projection operations, the data service module retrieves filtering and correction records from the quality control decision instruction sequence in real time. If there is an exclusion decision for a record in the sequence, the data service module will skip that record in the result set. If a correction decision exists, the corrected value will be used instead of the original value in the output. This process does not involve rewriting the original physical files on the storage device, thus ensuring that the same original data can support the generation of multiple logically mutually exclusive structured data views.
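
A minimal sketch of the projection described above, assuming decisions are plain dictionaries keyed by record index. The function names `freeze` and `build_view` and the dictionary keys are hypothetical; `MappingProxyType` is used only to model the read-only reference to the raw data.

```python
from types import MappingProxyType


def freeze(records):
    """Wrap raw records in read-only proxies to model the read-only reference."""
    return tuple(MappingProxyType(dict(r)) for r in records)


def build_view(raw_records, decisions):
    """Project a decision sequence onto raw records without modifying them."""
    by_index = {}
    for d in decisions:
        by_index.setdefault(d["record_index"], []).append(d)
    view = []
    for index, record in enumerate(raw_records):
        actions = by_index.get(index, [])
        if any(d["action"] == "exclude" for d in actions):
            continue  # excluded records are skipped in the result set
        row = dict(record)  # copy, so the raw record stays untouched
        for d in actions:
            if d["action"] == "correct":
                row["value"] = d["corrected_value"]
            elif d["action"] == "mark":
                row.setdefault("labels", []).append(d["basis"])
        view.append(row)
    return view
```

Because the raw records are never written to, two different decision sequences applied to the same `raw_records` produce two independent views, which is the property the paragraph relies on.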

[0029] Furthermore, the step of calculating the strategy deviation value and adjusting the inference logic weights in the quality control requirement mapping knowledge base accordingly includes: The system receives, through a feedback interface, quantitative evaluation metrics from researchers regarding the quality of the structured data view. These metrics include the outlier deletion rate and the accuracy in capturing key clinical events. A deviation calculation formula is used to evaluate the degree of fit between the current quality control strategy and research expectations and to calculate the strategy deviation value. The formula is defined over the following quantities: the preset learning rate; the expected ideal data distribution characteristic values for this type of research; the actual statistical feature values of the current structured data view; and the number of observed indicators. According to the sign and magnitude of the strategy deviation value, the correlation weight coefficients of the different feature dimensions in the quality control requirement mapping knowledge base are adjusted through a backpropagation mechanism, so that the quality control strategy evolves adaptively.

[0030] After analyzing the generated structured data view, a researcher submitted an evaluation through the system feedback interface, indicating that "some genuine early postoperative hypothermia was suspected of being misjudged as abnormal and thus removed." This feedback was received and quantified as the "outlier deletion rate" metric. The strategy deviation value was then calculated with the deviation formula, in which the expected value is the ideal detection rate of hypothermia events for the "postoperative prognosis study" type and the actual value is the corresponding statistic of this view. The calculated deviation is positive, indicating that the current strategy is too strict. Based on this, the system uses a backpropagation mechanism to reduce the association weight between the "postoperative study" and "high-sensitivity rejection" features in the quality control requirement mapping knowledge base. As a result, when generating strategies for similar studies in the future, the system will be more inclined to retain and mark outliers rather than reject them directly.
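
The deviation formula itself is not legible in this text, so the following sketch uses one assumed form that is consistent with the stated quantities and with the sign convention of the example (a positive value means the strategy is too strict): the learning rate multiplied by the mean of expected minus actual values over the observed indicators. The additive weight update stands in for the backpropagation mechanism and is likewise an assumption; all names are hypothetical.

```python
def strategy_deviation(expected, actual, learning_rate):
    """Assumed form: learning_rate * mean(expected_i - actual_i) over n indicators."""
    if not expected or len(expected) != len(actual):
        raise ValueError("expected and actual must be non-empty and of equal length")
    n = len(expected)
    return learning_rate * sum(e - a for e, a in zip(expected, actual)) / n


def adjust_weight(weight, deviation):
    """Simplified update: a positive deviation (too strict) lowers the
    association weight toward rejection; the result is clamped to [0, 1]."""
    return min(1.0, max(0.0, weight - deviation))
```

For instance, with an expected hypothermia detection rate above the rate actually observed in the view, `strategy_deviation` is positive and `adjust_weight` reduces the weight linking the scenario to high-sensitivity rejection, matching the direction described in the example.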

[0031] Furthermore, the tolerance threshold for the missing rate of a specific vital sign field is calculated and set by a formula (not reproduced in this text) whose terms are defined as follows. The missing-rate tolerance threshold is the final value calculated for a specific vital sign data field; it is a dimensionless proportion, so that, for example, 0.05 represents a tolerance of 5% missing data. The research importance weighting coefficient is a positive real number dynamically assigned based on the research objective features. It quantifies the importance of the vital sign field being calculated as a core observed variable or key outcome indicator within the target research question: the higher the importance, the larger the coefficient, and the lower the allowable missing-rate threshold, which protects the integrity of core data. The field baseline importance is a predefined baseline value between 0 and 1, based on meta-analysis of historical research or domain expert knowledge. It reflects the general importance of the vital sign field in typical clinical research (for example, in hemodynamic studies the value for "invasive systolic blood pressure" is usually higher than that for "body temperature"). The data continuity requirement coefficient is a positive real number derived from the research design type and analytical intent. It quantifies how strongly the research requires data continuity and temporal integrity. For example, for studies that require continuous trend analysis or the construction of time series models, the coefficient is larger, which lowers the tolerance threshold to meet the higher continuity requirement. The continuity benchmark requirement value is a preset baseline value, analogous to the field baseline importance and likewise ranging from 0 to 1, that represents the general level of data continuity required by different types of analysis.
The data source reliability penalty coefficient is a positive real number derived from historical data quality assessments of the device or system that collects the vital sign field (such as device failure rate and transmission interruption frequency). The lower the data source reliability, the larger the penalty coefficient. The data source reliability attenuation factor is a value between 0 and 1 calculated from the data source quality assessment results. The higher the reliability, the closer the factor is to 0 and the smaller the negative adjustment (penalty) applied to the threshold; the lower the reliability, the larger the factor and the more the tolerance threshold is lowered by subtracting this term. This reflects a tendency to exercise stricter control over unreliable data sources.
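A minimal sketch of how such a threshold could behave is given below. The actual formula is not reproduced in this text, so the functional form here is purely an assumption chosen to match the described directions of effect: higher importance or continuity terms lower the threshold, and the reliability term is subtracted. The function name, parameter names, and example values are hypothetical.

```python
def missing_rate_tolerance(importance_coeff: float,
                           baseline_importance: float,
                           continuity_coeff: float,
                           continuity_baseline: float,
                           reliability_penalty: float,
                           reliability_attenuation: float) -> float:
    """Assumed form: 1 / (1 + w_imp*I + w_cont*C) - w_rel*R, clamped at 0.

    This is NOT the patent's formula, only one form with the described
    monotonic behaviour.
    """
    for name, value in (("baseline_importance", baseline_importance),
                        ("continuity_baseline", continuity_baseline),
                        ("reliability_attenuation", reliability_attenuation)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    for name, value in (("importance_coeff", importance_coeff),
                        ("continuity_coeff", continuity_coeff),
                        ("reliability_penalty", reliability_penalty)):
        if value <= 0.0:
            raise ValueError(f"{name} must be positive")

    # Importance and continuity both tighten (lower) the threshold.
    strictness = (importance_coeff * baseline_importance
                  + continuity_coeff * continuity_baseline)
    # The reliability term is subtracted as a penalty.
    threshold = 1.0 / (1.0 + strictness) - reliability_penalty * reliability_attenuation
    return max(0.0, threshold)


# Hypothetical inputs for a core hemodynamic field from a fairly reliable monitor.
example_threshold = missing_rate_tolerance(10.0, 0.9, 5.0, 0.8, 0.1, 0.2)
```

Clamping at zero keeps the result a valid proportion when the penalty term dominates; whether the original formula does the same is not stated.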

[0032] Furthermore, the quality control nodes include a clinical event-assisted verification node, which operates as follows. The node determines whether a numerical deviation in the original vital signs dataset is clinically genuine by retrieving medication records or surgical operation records that coincide with the collection time of the vital signs. If a corresponding clinical intervention record exists, the node outputs a retention instruction and adds a pathological attribute label to the quality control decision instruction sequence. If no corresponding clinical event supports the value and the value exceeds the physiological limit threshold, the node outputs an abnormality marking instruction.

[0033] In the quality control workflow, the clinical event verification node is activated to process a heart rate record marked "tachycardia." The node immediately searches the clinical information system for medication and surgical procedure records within a specific time window (e.g., ±5 minutes) around the heart rate acquisition time. The search finds a medication record of "ephedrine intravenous bolus injection" within this window. The node therefore determines that the heart rate deviation has a clear clinical intervention background and is an expected pharmacological response. It outputs a "retain" instruction and adds a "drug effect" pathological attribute label to the quality control decision generated for this record, allowing subsequent analysis to distinguish between physiological, pathological, and pharmacological changes. The above formulas are all dimensionless calculations. They were obtained by software simulation on a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.
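The clinical event verification described in this embodiment can be illustrated with a short sketch. It assumes the node is called only for records already flagged as deviating, that clinical events are available as an in-memory list, and that a value with no supporting event but within physiological limits is retained without a label (the text does not specify this case). All class names, function names, labels other than "drug effect", and limit values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ClinicalEvent:
    time: datetime
    kind: str          # "medication" or "surgery" (assumed categories)
    description: str


@dataclass
class Decision:
    action: str                  # "retain" or "mark_abnormal"
    label: Optional[str] = None  # pathological attribute label, if any


def verify_with_clinical_events(value: float,
                                measured_at: datetime,
                                events: List[ClinicalEvent],
                                limit_low: float,
                                limit_high: float,
                                window: timedelta = timedelta(minutes=5)) -> Decision:
    """Decide how to treat a deviating vital sign value."""
    # Look for an intervention inside the +/- window around the measurement.
    for event in events:
        if abs(event.time - measured_at) <= window:
            label = "drug effect" if event.kind == "medication" else "surgical intervention"
            return Decision("retain", label)
    # No supporting event: mark the value only if it exceeds physiological limits.
    if value < limit_low or value > limit_high:
        return Decision("mark_abnormal")
    return Decision("retain")


t0 = datetime(2025, 1, 1, 10, 0)
ephedrine = ClinicalEvent(t0 + timedelta(minutes=3), "medication",
                          "ephedrine intravenous bolus injection")
supported = verify_with_clinical_events(150, t0, [ephedrine], 30, 220)
unsupported = verify_with_clinical_events(260, t0, [], 30, 220)
```

Here `supported` is a retain decision labeled "drug effect", while `unsupported` is an abnormality marking, mirroring the two branches in the text.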

[0034] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, in the form of a computer program product.

[0035] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0036] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.

[0037] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0038] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A structured vital sign data processing method for clinical research, characterized in that it comprises the following steps: parsing the metadata description of a clinical research task and obtaining, through semantic extraction, research objective features including research design type, core observed variables, and analytical intent; performing pattern matching in a quality control requirement mapping knowledge base based on the research objective features to output a personalized quality control operation instruction set; transforming, based on data dependencies, the personalized quality control operation instruction set into a directed acyclic quality control workflow consisting of at least two quality control nodes with a logical execution order; performing quality control operations on the original vital signs dataset and capturing the processing decisions of each quality control node in real time to generate a quality control decision instruction sequence corresponding to the research task identifier; while keeping the physical state of the original vital signs dataset unchanged, dynamically mapping the quality control decision instruction sequence onto the original vital signs dataset to construct, in memory, a structured data view that meets the research needs; and obtaining research and analysis feedback data based on the structured data view, calculating a strategy deviation value, and adjusting the inference logic weights in the quality control requirement mapping knowledge base accordingly.

2. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the step of obtaining, through semantic extraction, research objective features including research design type, core observed variables, and analytical intent comprises: retrieving the electronic task book of the clinical research task, and segmenting the research background description with a pre-trained natural language processing model to identify medical terminology entities; determining the physiological definitions of the core observed variables based on the identified medical terminology entities; identifying the keywords of statistical analysis methods in the task book to determine whether the analytical intent focuses on capturing extreme pathological values or on fitting population trends; and encapsulating the medical terminology entities, the physiological definitions, the analytical intent, and the research design type defined by the task settings into a structured research objective feature vector.

3. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the step of outputting a personalized quality control operation instruction set comprises: feeding the research objective feature vector into the input of the quality control requirement mapping knowledge base, which stores the mapping logic between different research scenarios and quality control intensity; when the research objective is to study postoperative intensive care, automatically matching the high-sensitivity outlier handling logic, that is, for vital sign data that exceed the normal threshold, prioritizing contextual verification against the clinical event log rather than direct removal; and, based on the statistical requirements of the analytical intent, calculating and setting the missing-rate tolerance threshold for specific vital sign fields and the timestamp alignment rules for differences in collection frequency among multi-source heterogeneous devices, thereby synthesizing a complete personalized quality control operation instruction set.

4. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the step of transforming the personalized quality control operation instruction set into a directed acyclic quality control workflow consisting of at least two quality control nodes with a logical execution order comprises: analyzing the input and output parameters of each instruction in the personalized quality control operation instruction set to determine the predecessor and successor relationships between instructions; configuring a corresponding computation operator for each instruction and building quality control nodes; establishing directed connections from the data format conversion node to the terminology standardization node, and from the range rationality screening node to the clinical event-assisted verification node; and loading, for each quality control node, dynamic parameters derived from the research objective features, thereby forming a directed acyclic quality control workflow architecture that reflects specific research preferences and guides subsequent data processing.
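By way of illustration only, the dependency analysis in claim 4 can be sketched with Python's standard `graphlib` module. The sketch assumes each instruction declares the named data items it consumes and produces, and it adds a link from terminology standardization to range screening that the claim does not state. The data item names and the `workflow_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Each instruction is described by (inputs consumed, outputs produced).
Instruction = Tuple[Set[str], Set[str]]


def workflow_order(instructions: Dict[str, Instruction]) -> List[str]:
    """Derive predecessor relations from inputs/outputs and return an execution order."""
    # Map every produced data item to the instruction that produces it.
    producers: Dict[str, str] = {}
    for name, (_, outputs) in instructions.items():
        for item in outputs:
            producers[item] = name
    # A node's predecessors are the producers of the items it consumes.
    graph = {
        name: {producers[item] for item in inputs if item in producers}
        for name, (inputs, _) in instructions.items()
    }
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


instructions: Dict[str, Instruction] = {
    "clinical_event_verification": ({"range_flags"}, {"decisions"}),
    "range_screening": ({"standard_terms"}, {"range_flags"}),
    "terminology_standardization": ({"uniform_records"}, {"standard_terms"}),
    "format_conversion": ({"raw_records"}, {"uniform_records"}),
}
order = workflow_order(instructions)
```

Because the example forms a single chain, the resulting order starts with format conversion and ends with clinical event verification regardless of the order in which the instructions were declared.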

5. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the step of generating a quality control decision instruction sequence corresponding to the research task identifier comprises: reading the original vital signs dataset into a memory buffer and activating the quality control nodes one by one according to the topological order of the directed acyclic quality control workflow; when a node performs a marking, calibration, or exclusion operation on a data record, recording the operator type, the original value, the corrected value, and the logical basis that triggered the decision; and aggregating the decision information for individual records into streaming data, arranging it by timestamp order and record index number, and solidifying it into a traceable quality control decision instruction sequence, which serves as supplementary metadata to the original data and forms a unique index association with the research task identifier.

6. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the step of dynamically mapping the quality control decision instruction sequence onto the original vital signs dataset to construct, in memory, a structured data view that meets the research needs comprises: upon receiving a data query request for a specific research task, simultaneously loading a read-only reference to the original vital signs dataset and the corresponding quality control decision instruction sequence; during data projection, retrieving the filter and correction records in the quality control decision instruction sequence in real time; if the sequence contains an exclusion decision for a record, skipping that record in the result set by the data service module; and if a correction decision exists, outputting the corrected value in place of the original value.
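By way of illustration only, the non-destructive projection in claim 6 can be sketched as an overlay of a decision log on untouched raw records. The record layout, the action names ("exclude", "correct", "mark"), the `qc_label` field, and the `build_view` helper are all hypothetical; later decisions for the same record are assumed to override earlier ones.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def build_view(raw_records: Sequence[Record],
               decisions: Sequence[Dict[str, Any]]) -> List[Record]:
    """Project QC decisions onto raw records without modifying the raw data."""
    # Index decisions by record; a later decision replaces an earlier one.
    latest: Dict[int, Dict[str, Any]] = {}
    for decision in decisions:
        latest[decision["record_index"]] = decision

    view: List[Record] = []
    for index, record in enumerate(raw_records):
        decision = latest.get(index)
        if decision is not None and decision["action"] == "exclude":
            continue  # excluded records are skipped in the result set
        row = dict(record)  # copy, so the original record stays unchanged
        if decision is not None and decision["action"] == "correct":
            row["value"] = decision["corrected_value"]
        elif decision is not None and decision["action"] == "mark":
            row["qc_label"] = decision.get("label")
        view.append(row)
    return view


raw = (
    {"t": 0, "value": 36.8},
    {"t": 1, "value": 368.0},   # hypothetical decimal-point entry error
    {"t": 2, "value": 0.0},     # hypothetical sensor dropout
    {"t": 3, "value": 35.1},
)
decision_log = [
    {"record_index": 1, "action": "correct", "corrected_value": 36.8},
    {"record_index": 2, "action": "exclude"},
    {"record_index": 3, "action": "mark", "label": "postoperative hypothermia"},
]
view = build_view(raw, decision_log)
```

Discarding the decision log restores the original data exactly, which is the reversibility the method relies on.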

7. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the step of calculating the strategy deviation value and adjusting the inference logic weights in the quality control requirement mapping knowledge base accordingly comprises: receiving, through a feedback interface, quantitative evaluation metrics from researchers regarding the quality of the structured data view, the metrics including the outlier deletion rate and the accuracy of capturing key clinical events; evaluating the degree of fit between the current quality control strategy and the research expectations using a deviation calculation formula (not reproduced in this text) to obtain the strategy deviation value, wherein the formula uses a preset learning rate, the expected ideal data distribution characteristic values for this type of research, the actual statistical feature values of the current structured data view, and the number of observed indicators; and, according to the sign and magnitude of the strategy deviation value, adjusting the association weight coefficients of different feature dimensions in the quality control requirement mapping knowledge base through a backpropagation mechanism, so that the quality control strategy evolves adaptively.

8. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the tolerance threshold for the missing rate of a specific vital sign field is calculated and set by a formula (not reproduced in this text) whose terms are: the finally calculated tolerance threshold for the missing rate of the specific vital sign data field; the research importance weighting coefficient; the field baseline importance; the data continuity requirement coefficient; the continuity benchmark requirement value; the data source reliability penalty coefficient; and the data source reliability attenuation factor.

9. The structured vital sign data processing method for clinical research as described in claim 1, characterized in that the quality control nodes include a clinical event-assisted verification node, wherein: the node determines whether a numerical deviation in the original vital signs dataset is clinically genuine by retrieving medication records or surgical operation records that coincide with the collection time of the vital signs; if a corresponding clinical intervention record exists, the node outputs a retention instruction and adds a pathological attribute label to the quality control decision instruction sequence; and if no corresponding clinical event supports the value and the value exceeds the physiological limit threshold, the node outputs an abnormality marking instruction.