Agent-based medical data processing and analysis method, device, equipment and medium

By employing an agent-based medical data processing method, field-level validation, structured parsing, and machine learning training of medical data are achieved, generating visual charts that conform to clinical standards. This addresses the issues of high usage threshold, complex operation, and insufficient accuracy of existing tools, thereby improving the efficiency and reliability of data analysis.

CN122201826A · Pending · Publication Date: 2026-06-12 · ZHENGZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHENGZHOU UNIV
Filing Date
2026-03-30
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing medical data analysis tools have high barriers to entry, cumbersome operating procedures, insufficient professionalism and accuracy, and unfriendly interaction methods, making it difficult for non-technical personnel to independently complete data analysis, and the reliability of the analysis results is difficult to guarantee.

Method used

We employ an agent-based medical data processing approach, which generates clinically compliant visualization charts through field-level validation, structured parsing, association analysis, and machine learning training, integrating confidence scores and association analysis results.

Benefits of technology

It improves the reliability and efficiency of medical data analysis, lowers the technical threshold, and enables non-technical personnel to grasp key information through charts, providing data support for clinical diagnosis and medical decision-making.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122201826A_ABST

Abstract

The application discloses an agent-based medical data processing and analysis method, device, equipment and medium, relates to the technical field of computers, and comprises the following steps: performing field-level checking on medical detection data, and determining the medical detection data that passes the checking as checked medical data; performing structured analysis on the checked medical data, converting the obtained structured medical data into standardized medical data in a target file format; calling a target analysis template and an association rule mining algorithm to perform association analysis on the standardized medical data to obtain an association analysis result, training a target algorithm based on the standardized medical data, determining a model fitting degree based on the accuracy of a medical analysis prediction conclusion output by a target medical prediction model; and determining a confidence score based on the amount of checked data corresponding to the field-level checking and the model fitting degree, and generating a visual chart based on the association analysis result. The reliability and efficiency of medical data analysis are improved, and the method meets clinical norms.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, device, and medium for medical data processing and analysis based on intelligent agents.

Background Technology

[0002] In the process of digital transformation in the healthcare industry, the processing and analysis of medical data has become a core support for clinical diagnosis optimization, medical resource allocation, and disease trend prediction. With the massive accumulation of medical data such as electronic medical records, laboratory test data, and treatment records, the demand for efficient and professional data analysis tools is becoming increasingly urgent.

[0003] The current application of medical data analysis tools exhibits significant limitations: First, they have a high barrier to entry. These tools generally require users to possess professional programming skills, manually writing code to perform data extraction, cleaning, and modeling, or to master complex software operation procedures. This makes it difficult for many non-technical medical and management personnel in the healthcare industry to independently complete data analysis work. Second, the operation process is cumbersome. The functions of existing tools are relatively fragmented. Data cleaning requires specific software, statistical analysis relies on another system, and visualization requires switching to specialized charting tools. The complete analysis process requires repeated switching between multiple platforms, resulting in low efficiency. Third, they lack professionalism and accuracy. Existing tools lack specialized design for the characteristics of medical data and do not incorporate guidance from medical knowledge. The analysis process is prone to non-compliance with industry standards, making it difficult to guarantee the reliability of the analysis results. Fourth, the interaction methods are unfriendly. Existing tools do not support natural language interaction, requiring users to input requirements according to fixed formats or commands, further increasing the barrier to entry. These problems make it difficult to fully tap the value of massive amounts of medical data and to quickly provide effective support for medical decision-making.

[0004] As can be seen from the above, improving the reliability and efficiency of medical data analysis while adhering to clinical standards is an urgent problem to be solved.

Summary of the Invention

[0005] In view of this, the purpose of this invention is to provide a method, apparatus, device, and medium for medical data processing and analysis based on intelligent agents, which can improve the reliability and efficiency of medical data analysis and comply with clinical standards. The specific solution is as follows:

Firstly, this application provides a method for medical data processing and analysis based on intelligent agents, including:

Acquiring medical testing data from a medical data source, performing field-level validation on the medical testing data, and determining the medical testing data that passes validation as validated medical data. The field-level validation includes a first field validation, a second field validation, and a third field validation. The first field validation is a non-empty validation of the patient ID in the medical testing data, the second field validation is a range validation of the medical testing values in the medical testing data, and the third field validation is an enumeration validation of the medical departments in the medical testing data.

Performing structured parsing on the validated medical data according to the type of the medical data source to obtain structured medical data, converting the structured medical data into standardized medical data in the target file format, and caching the standardized medical data and the corresponding hash value.

Using a preset statistical analysis agent to invoke a target analysis template and an association rule mining algorithm to perform association analysis on the standardized medical data and obtain the corresponding association analysis results; using a preset machine learning agent to train the target algorithm on the standardized medical data to obtain a target medical prediction model; and determining the model fit based on the accuracy of the medical analysis prediction conclusions output by the target medical prediction model. The target analysis template is an analysis template that includes medical grouping rules, constructed based on the TableOne tool. The target algorithm is an SVM algorithm and/or a random forest algorithm.

Determining a confidence score based on the amount of data that passed field-level validation and the model fit, and generating corresponding visualization charts from the confidence score and the association analysis results.

[0006] Optionally, the step of acquiring medical testing data from a medical data source, performing field-level validation on the medical testing data, and determining the medical testing data that passes validation as validated medical data includes: acquiring medical testing data from a CSV file, an Excel file, or a MySQL database; if the patient ID field in the medical testing data is not empty and the patient ID conforms to the target format specification, the first field validation passes; if the medical testing value in the medical testing data falls within the corresponding target testing value range, the second field validation passes, where the target testing value range is determined by the medical department to which the testing value belongs; an enumeration value range is set from a preset list of medical departments, and if the medical department in the medical testing data falls within that enumeration value range, the third field validation passes; medical testing data that passes the first, second, and third field validations is determined as validated medical data.

[0007] Optionally, the step of performing structured parsing on the validated medical data according to the type of the medical data source to obtain structured medical data includes: if the medical data source is a CSV file or an Excel file, using regular expressions to determine the number of patients in the validated medical data and judging whether the number of patients is within the target range; if the number of patients is within the target range, extracting the target key fields from the validated medical data to obtain the first structured data; if the medical data source is a MySQL database, verifying the integrity of the foreign key constraints of the MySQL database, and, if the verification passes, executing the pre-compiled SQL statement corresponding to the target key fields with an SQL query tool to obtain the second structured data; and constructing the structured medical data from the first structured data and the second structured data.

[0008] Optionally, the step of converting the structured medical data into standardized medical data in a target file format and caching the standardized medical data and its corresponding hash value includes: converting the structured medical data into standardized medical data in a columnar storage file format, and creating indexes on target fields including patient ID and test timestamp; converting the structured medical data into binary form to obtain a data byte stream, and determining the validation log information corresponding to the structured medical data, where the validation log information includes the field-level validation time, the amount of data that passed field-level validation, and the reasons for field-level validation failures; and computing the hash value corresponding to the standardized medical data from the data byte stream and the validation log information using the SHA-256 algorithm, then caching the standardized medical data and the corresponding hash value with the Redis caching mechanism.

[0009] Optionally, the step of using a preset statistical analysis agent to invoke a target analysis template and an association rule mining algorithm to perform association analysis on the standardized medical data to obtain corresponding association analysis results includes: using the preset statistical analysis agent to invoke the target analysis template and perform comparative analysis on the standardized medical data to obtain analysis results; setting a minimum support threshold for the medical field in the association rule mining algorithm and verifying the rationality of the analysis results against that threshold, where different medical departments correspond to different minimum support thresholds; and, if the verification passes, determining the analysis results as the association analysis results.

[0010] Optionally, the training process of the target medical prediction model includes: constructing training samples from the standardized medical data; and training the SVM algorithm and/or random forest algorithm with the training samples, correcting the target parameters with the target loss function during training to obtain the target medical prediction model. The target loss function is determined from the cross-entropy loss function, the mean squared error loss function, and a constraint penalty term; the constraint penalty term is determined from a preset medical knowledge base.
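One way to read the composition of the target loss function is as a weighted sum of its three components. The additive form, the weights $\alpha$, $\beta$, $\lambda$, and the symbol $P_{\mathrm{KB}}$ are assumptions made for illustration; the text states only that the loss is determined from these three terms.

```latex
\mathcal{L}(\theta)
  = \alpha \, \mathcal{L}_{\mathrm{CE}}(\theta)
  + \beta \, \mathcal{L}_{\mathrm{MSE}}(\theta)
  + \lambda \, P_{\mathrm{KB}}(\theta),
\qquad \alpha, \beta, \lambda \ge 0
```

Here $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss, $\mathcal{L}_{\mathrm{MSE}}$ is the mean squared error loss, and $P_{\mathrm{KB}}$ stands for the constraint penalty term derived from the preset medical knowledge base.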

[0011] Optionally, the step of determining the confidence score based on the amount of data that passed field-level validation and the model fit, and generating corresponding visualization charts using the confidence score and the association analysis results, includes: determining the confidence score from the amount of data that passed field-level validation, the model fit, and their corresponding weights; and using the confidence score, the medical analysis prediction conclusion, and the association analysis results to generate visualization charts that conform to the target medical standards. The visualization charts include bar charts, pie charts, line charts, and box plots.
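The weighted confidence score described in this paragraph can be sketched as follows. This is a minimal illustration, not the claimed formula: the text does not say how the amount of validated data is scaled, so the sketch assumes it is normalized to a pass ratio, and the function name, the linear form, and the equal default weights are all hypothetical.

```python
def confidence_score(passed, total, model_fit, w_data=0.5, w_fit=0.5):
    """Combine the validation pass ratio and the model fit into one score.

    Assumptions (not stated in the text): the amount of validated data is
    normalized as passed / total, model_fit lies in [0, 1], and the two
    terms are combined linearly with weights w_data and w_fit.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    if not 0.0 <= model_fit <= 1.0:
        raise ValueError("model_fit must be in [0, 1]")
    pass_ratio = passed / total
    return w_data * pass_ratio + w_fit * model_fit


# Example: 90 of 100 records passed validation and the model fit is 0.8.
score = confidence_score(90, 100, 0.8)
```

With weights that sum to 1 the score stays in the range 0 to 1, which makes it straightforward to display alongside the charts.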

[0012] Secondly, this application provides a medical data processing and analysis device based on intelligent agents, comprising:

The data verification module, used to acquire medical testing data from a medical data source, perform field-level validation on the medical testing data, and determine the medical testing data that passes validation as validated medical data. The field-level validation includes a first field validation, a second field validation, and a third field validation. The first field validation is a non-empty validation of the patient ID in the medical testing data, the second field validation is a range validation of the medical testing values in the medical testing data, and the third field validation is an enumeration validation of the medical departments in the medical testing data.

The data caching module, used to perform structured parsing on the validated medical data according to the type of the medical data source to obtain structured medical data, convert the structured medical data into standardized medical data in the target file format, and cache the standardized medical data and the corresponding hash value.

The goodness-of-fit determination module, used to train the target algorithm on the standardized medical data with a preset machine learning agent to obtain a target medical prediction model, and to determine the model fit based on the accuracy of the medical analysis prediction conclusions output by the target medical prediction model. The target analysis template is an analysis template including medical grouping rules built on the TableOne tool; the target algorithm is an SVM algorithm and/or a random forest algorithm.

The chart generation module, used to determine the confidence score based on the amount of data that passed field-level validation and the model fit, and to generate corresponding visualization charts using the confidence score and the association analysis results.

[0013] Thirdly, this application provides an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the aforementioned agent-based medical data processing and analysis method.

[0014] Fourthly, this application provides a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned agent-based medical data processing and analysis method.

[0015] This application acquires medical testing data from a medical data source, performs field-level validation on the medical testing data, and determines the medical testing data that passes validation as validated medical data. The field-level validation includes a first field validation, a second field validation, and a third field validation. The first field validation is a non-empty check of the patient ID in the medical testing data; the second field validation is a range check of the medical testing values in the medical testing data; and the third field validation is an enumeration check of the medical departments in the medical testing data. According to the type of the medical data source, the validated medical data is structurally parsed to obtain structured medical data. The structured medical data is then converted into standardized medical data in a target file format, and the standardized medical data and its corresponding hash value are cached. A preset statistical analysis agent invokes the target analysis template and the association rule mining algorithm to perform association analysis on the standardized medical data and obtain the corresponding association analysis results. A preset machine learning agent trains the target algorithm on the standardized medical data to obtain the target medical prediction model, and the model fit is determined based on the accuracy of the medical analysis prediction conclusions output by that model. The target analysis template is an analysis template including medical grouping rules built on the TableOne tool; the target algorithm is an SVM algorithm and/or a random forest algorithm. The confidence score is determined based on the amount of data that passed field-level validation and the model fit, and the confidence score and the association analysis results are used to generate the corresponding visualization charts.

[0016] As can be seen from the above, this application performs triple field validation on medical testing data, filtering out invalid or abnormal data at the data source. By parsing different medical data sources, unstructured data is transformed into structured data with clear fields and complete associations, and unified into a target file format, improving data storage and retrieval efficiency. A preset statistical analysis agent calls TableOne templates to output association analysis results that conform to industry standards. A preset machine learning agent trains SVM/random forest algorithms to obtain a predictive model adapted to medical scenarios, and the accuracy of the model's predictions is quantified through the model fit. Finally, a confidence score is generated from the amount of data that passes field-level validation and the model fit. By presenting the confidence score and association analysis results together in visual charts, even non-technical personnel can grasp key information, which provides data support for clinical diagnosis and medical decision-making.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0018] Figure 1 is a flowchart of a medical data processing and analysis method based on intelligent agents disclosed in this application;
Figure 2 is a bar chart of the number of patients in each department provided by this application;
Figure 3 is a schematic structural diagram of a medical data processing and analysis device based on intelligent agents disclosed in this application;
Figure 4 is a structural diagram of an electronic device disclosed in this application.

Detailed Implementation

[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0020] Currently, medical data analysis tools require users to possess professional programming skills, manually writing code to perform data extraction, cleaning, and modeling, or mastering complex software operation procedures. This makes it difficult for many non-technical medical and management personnel in the healthcare industry to independently complete data analysis tasks. Furthermore, the complete analysis process requires repeated switching between multiple platforms, resulting in low efficiency. The lack of specialized design tailored to the characteristics of medical data and the absence of guidance from medical knowledge make the analysis process prone to non-compliance with industry standards, compromising the reliability of the results. Therefore, this application provides a medical data processing and analysis method based on intelligent agents. By integrating confidence scores and correlation analysis results through visual charts, even non-technical personnel can grasp key information through these charts, providing data support for clinical diagnosis and medical decision-making.

[0021] As shown in Figure 1, this invention discloses a medical data processing and analysis method based on intelligent agents, including:

Step S11: Acquire medical testing data from the medical data source, perform field-level validation on the medical testing data, and determine the medical testing data that passes validation as validated medical data. The field-level validation includes a first field validation, a second field validation, and a third field validation; the first field validation is a non-empty validation of the patient ID in the medical testing data, the second field validation is a range validation of the medical testing values in the medical testing data, and the third field validation is an enumeration validation of the medical departments in the medical testing data.

[0022] In this embodiment, a data adaptation layer enables flexible access to multiple types of medical data sources. Supported data sources include CSV files, Excel files, and MySQL databases. If the medical data source is a CSV or Excel file, the file parsing module parses its file structure and extracts the data content and metadata. If the medical data source is a MySQL database, a secure connection protocol is used to establish communication with the database, supporting batch data reading and incremental synchronization. Corresponding medical testing data is obtained from the three types of medical data sources. If the data source connection fails, a backup data source is activated and structured error information is returned so that the problem can be located quickly and subsequent processing can continue. The structured error information includes the error type and a data update timeliness identifier. The medical testing data can be, for example, the testing data from a physical examination.

Field-level validation is then performed on the medical testing data. First, the patient ID field is checked to ensure that no record has an empty patient ID. In one specific implementation, the patient ID must conform to the target format specification, which can be the hospital number + a six-digit serial number, such as HOS2025001. Next, it is determined whether the medical testing values are within the corresponding target testing value range. For example, the systolic blood pressure range for cardiology is 90-160 mmHg, and the blood oxygen saturation range for respiratory medicine is 95-100%. Data outside the range is considered abnormal and fails validation. Finally, it is determined whether the medical department in the medical testing data is within the enumeration value range, which includes common department names such as cardiology, respiratory medicine, and pediatrics.

[0023] In one specific implementation, the field-level validation rules are as follows:

    {
      "type": "object",
      "properties": {
        "Patient ID": {"type": "string", "minLength": 1, "description": "Non-empty validation; format is hospital number + 6-digit serial number (e.g., HOS2025001)"},
        "Test Value": {"type": "number", "minimum": "{dynamic threshold lower limit}", "maximum": "{dynamic threshold upper limit}", "description": "Load the corresponding threshold based on the Department field"},
        "Department": {"type": "string", "enum": ["Cardiology", "Respiratory Medicine", "Endocrinology", "Obstetrics and Gynecology", "Orthopedics", "Gastroenterology", "Pediatrics", "Neurology", "Emergency Medicine", "Dermatology", "Ophthalmology", "Otolaryngology", "Stomatology", "Geriatrics", "Intensive Care Unit"]}
      },
      "required": ["Patient ID", "Department"],
      "dependencies": {"Department": {"properties": {"Test Value": {"minimum": "{lower limit for the department}", "maximum": "{upper limit for the department}"}}}}
    }

The values in braces are placeholders that are replaced with the numeric limits for the relevant department when the rules are loaded.

Specifically, the process of acquiring medical testing data from a medical data source, performing field-level validation on the medical testing data, and determining the medical testing data that passes validation as validated medical data includes: acquiring medical testing data from a CSV file, an Excel file, or a MySQL database; if the patient ID field in the medical testing data is not empty and the patient ID conforms to the target format specification, the first field validation passes; if the medical testing value in the medical testing data is within the corresponding target testing value range, the second field validation passes, where the target testing value range is determined by the medical department to which the testing value belongs; an enumeration value range is set from a preset list of medical departments, and if the medical department in the medical testing data is within that enumeration value range, the third field validation passes; medical testing data that has passed the first, second, and third field validations is determined as validated medical data.
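The three field checks described above can be sketched in a few lines. This is an illustrative sketch rather than the disclosed implementation: the record keys, the function name, and the ID pattern (three letters followed by seven digits, chosen only because it matches the HOS2025001 example) are assumptions, and only the two ranges quoted in the text are filled in.

```python
import re

# Hypothetical ID pattern that matches the HOS2025001 example from the text.
PATIENT_ID_RE = re.compile(r"^[A-Z]{3}\d{7}$")

# Department-specific ranges; both entries are the examples given in the text.
RANGES = {
    "Cardiology": (90.0, 160.0),            # systolic blood pressure, mmHg
    "Respiratory Medicine": (95.0, 100.0),  # blood oxygen saturation, %
}

# Subset of the department enumeration, shortened for the sketch.
DEPARTMENTS = {"Cardiology", "Respiratory Medicine", "Pediatrics"}


def validate_record(record):
    """Return the list of failure reasons; an empty list means the record passed."""
    reasons = []

    # First field validation: patient ID must be non-empty and well formed.
    patient_id = record.get("patient_id") or ""
    if not patient_id.strip():
        reasons.append("patient_id is empty")
    elif not PATIENT_ID_RE.match(patient_id):
        reasons.append("patient_id format invalid")

    # Third field validation: department must be in the enumeration.
    department = record.get("department")
    if department not in DEPARTMENTS:
        reasons.append("department not in enumeration")

    # Second field validation: value must lie in the department's range.
    # A department with no configured range fails this check in the sketch.
    bounds = RANGES.get(department)
    value = record.get("test_value")
    if bounds is None or value is None or not bounds[0] <= value <= bounds[1]:
        reasons.append("test_value out of range")

    return reasons


ok = validate_record(
    {"patient_id": "HOS2025001", "department": "Cardiology", "test_value": 120}
)
```

Returning the failure reasons rather than a bare boolean keeps the information needed for the validation log, which records why a record failed.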

[0024] Step S12: Based on the different types of the medical data source, perform structured parsing on the verified medical data to obtain structured medical data, convert the structured medical data into standardized medical data in the target file format, and cache the standardized medical data and the corresponding hash value.

[0025] In this embodiment, structured medical data is determined according to the type of the medical data source. If the medical data source is a CSV file or an Excel file, a regular expression is used to determine the number of patients in the validated medical data. The regular expression can be:

    /number of patients[:=]\s*(\d+)|Total number of patients[:=]\s*(\d+)|Number of cases[:=]\s*(\d+)|sample number[:=]\s*(\d+)/

In one specific implementation, if the number of patients is between 1 and 100,000, key fields such as "number of patients" and "total number of patients" in the validated medical data are extracted to obtain the first structured data. Alternatively, data extraction tools can be used to extract key information such as the number of patients, department, and test indicators to obtain the first structured data. If the number of patients is not between 1 and 100,000, a corresponding warning message is returned and the authenticity of the data is verified.

If the medical data source is a MySQL database, the integrity of the foreign key constraints of the tables in the MySQL database is verified, for example the "Patient Basic Information Table" and the "Test Result Table", which are linked by "Patient ID". If the verification passes, a pre-compiled statement is used to query the second structured data corresponding to the target key fields. In one specific implementation, taking the query of blood pressure data for each department as an example, the corresponding pre-compiled statement is as follows:

    PREPARE stmt FROM 'SELECT `Patient ID`, `Department`, `Systolic Blood Pressure`, `Diastolic Blood Pressure`, `Test Time` FROM `Test Data Table` WHERE `Test Time` BETWEEN ? AND ?';
    SET @start_time = '2025-01-01 00:00:00', @end_time = '2025-03-31 23:59:59';
    EXECUTE stmt USING @start_time, @end_time;
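The patient-count extraction and range check for CSV and Excel sources can be sketched as below. The sketch assumes the whitespace token in the pattern is `\s*`, uses the 1 to 100,000 range from this embodiment as the default, and raises an exception where the text says a warning message is returned; the function name and that error-handling choice are hypothetical.

```python
import re

# Alternation over the four labels listed in the text; each branch captures
# the digits that follow the label and a ':' or '=' separator.
PATIENT_COUNT_RE = re.compile(
    r"number of patients[:=]\s*(\d+)"
    r"|Total number of patients[:=]\s*(\d+)"
    r"|Number of cases[:=]\s*(\d+)"
    r"|sample number[:=]\s*(\d+)"
)


def extract_patient_count(text, low=1, high=100_000):
    """Return the patient count found in text, or None if no label matches.

    Raises ValueError when a count is found but lies outside [low, high].
    """
    match = PATIENT_COUNT_RE.search(text)
    if match is None:
        return None
    # Exactly one capture group is populated, depending on which branch matched.
    count = int(next(g for g in match.groups() if g is not None))
    if not low <= count <= high:
        raise ValueError(f"patient count {count} outside [{low}, {high}]")
    return count


count = extract_patient_count("Total number of patients: 1200")
```

Because the target range can be adjusted, `low` and `high` are parameters rather than constants.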

[0026] Specifically, the step of performing structured parsing on the verified medical data based on different types of the medical data source to obtain structured medical data includes: if the medical data source is a CSV file or an Excel file, then using regular expressions to determine the number of patients in the verified medical data and judging whether the number of patients is within a target range; if the number of patients is within the target range, then extracting the target key fields from the verified medical data to obtain first structured data; if the medical data source is a MySQL database, then verifying the integrity of foreign key constraints in the MySQL database; if the verification passes, then using an SQL query tool to execute a pre-compiled SQL statement corresponding to the target key fields to obtain second structured data; and constructing structured medical data based on the first structured data and the second structured data. It is worth noting that the target range can be adjusted according to actual circumstances.

[0027] After the structured medical data is obtained, it is converted into Apache Parquet format to obtain standardized medical data, and the metadata corresponding to the standardized medical data is saved. This metadata includes the field types, data source, and validation records of the standardized medical data. The hash value corresponding to the standardized medical data is then determined using the SHA-256 algorithm or another cryptographic hash algorithm. The corresponding calculation formula is: Hash value = SHA256(Parquet data byte stream + verification log string), where the Parquet data byte stream is the byte stream obtained by converting the standardized medical data into binary form. After the hash value is obtained, the standardized medical data and the corresponding hash value are cached using the Redis caching mechanism. The default cache validity period is 24 hours, and the validity period can be adjusted through a configuration file. When the database data is updated, the cache is actively invalidated and the cached content is updated synchronously. Additionally, a cache key can be set so that cached data can be retrieved quickly. The format of the cache key can be "data source type_patient ID prefix_data timestamp", for example "MySQL_HOS2025_20250101120000", where the data source type is MySQL / CSV / Excel, the patient ID prefix is the first 5 characters of the patient ID, and the data timestamp is the time at which data validation completed, accurate to the second.
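The hash formula and the cache key format can be sketched with the Python standard library alone. The sketch makes several assumptions: the function names are hypothetical, the verification log is encoded as UTF-8 before concatenation, and the prefix length is a parameter because the text specifies the first 5 characters while its example key shows a 7-character prefix. Writing the result to Redis is left out.

```python
import hashlib
from datetime import datetime


def content_hash(parquet_bytes: bytes, verification_log: str) -> str:
    """SHA256(Parquet data byte stream + verification log string), hex encoded."""
    payload = parquet_bytes + verification_log.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cache_key(source_type: str, patient_id: str, verified_at: datetime,
              prefix_len: int = 5) -> str:
    """Build 'data source type_patient ID prefix_data timestamp'."""
    prefix = patient_id[:prefix_len]
    # Timestamp is formatted to the second, as the text requires.
    return f"{source_type}_{prefix}_{verified_at:%Y%m%d%H%M%S}"


digest = content_hash(b"parquet-bytes", "validated=100;failed=0")
key = cache_key("MySQL", "HOS2025001", datetime(2025, 1, 1, 12, 0, 0), prefix_len=7)
```

Including the verification log in the hashed payload means the hash changes when either the data or its validation record changes.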

[0028] Specifically, the step of converting the structured medical data into standardized medical data in a target file format and caching the standardized medical data and its corresponding hash value includes: converting the structured medical data into standardized medical data in a columnar storage file format and establishing corresponding indexes for target fields including patient ID and detection timestamp; converting the structured medical data into binary code format to obtain a data byte stream, and determining the verification log information corresponding to the structured medical data; the verification log information includes the field-level verification time corresponding to the structured medical data, the amount of data that passed the field-level verification, and the reason for the field-level verification failure; determining the hash value corresponding to the standardized medical data based on the data byte stream and the verification log information using the SHA-256 algorithm, and caching the standardized medical data and its corresponding hash value using the Redis caching mechanism.

[0029] Step S13: Using a preset statistical analysis agent, the standardized medical data is analyzed using a target analysis template and an association rule mining algorithm to obtain the corresponding association analysis results. A preset machine learning agent is used to train the target algorithm based on the standardized medical data to obtain a target medical prediction model. The model fit is determined based on the accuracy of the medical analysis prediction conclusions output by the target medical prediction model. The target analysis template is an analysis template containing medical grouping rules built using TableOne. The target algorithm is an SVM algorithm and / or a random forest algorithm.

[0030] In this embodiment, after caching the standardized medical data, a preset statistical analysis agent is used to call the target analysis template to perform comparative analysis on the standardized medical data. During the analysis, the Apriori algorithm (i.e., association rule mining algorithm) can be used to set a minimum support threshold for the medical field. The minimum support threshold can be dynamically adjusted according to the type of department. For example, the support threshold for surgical data and the support threshold for internal medicine chronic disease data can be adapted to their respective business scenarios to accurately discover potential associations between medical data. In one specific implementation, the obtained minimum support thresholds are: Cardiology (chronic diseases) 0.3, Respiratory Medicine (acute diseases) 0.25, Surgery (operations) 0.35, Pediatrics 0.28, and Geriatrics 0.32. The user terminal can fine-tune the thresholds within ±0.05 according to the characteristics of the department's business. After adjustment, the system automatically verifies the rationality of the analysis results.

[0031] Specifically, the step of using a preset statistical analysis agent to invoke a target analysis template and an association rule mining algorithm to perform association analysis on the standardized medical data to obtain corresponding association analysis results includes: using a preset statistical analysis agent to invoke a target analysis template to perform comparative analysis on the standardized medical data to obtain analysis results; using an association rule mining algorithm to set a minimum support threshold in the medical field, and verifying the rationality of the analysis results based on the minimum support threshold; different medical departments correspond to different minimum support thresholds; if the verification is successful, the analysis result is determined as the association analysis result.
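As an illustration of department-specific minimum support, the following sketch uses the threshold values listed in the embodiment and a simplified level-wise frequent-itemset search in the spirit of Apriori. The function names and the sample transactions are hypothetical, and the subset-pruning step of the full Apriori algorithm is omitted for brevity.

```python
from itertools import combinations

# Default minimum support thresholds taken from the embodiment.
DEFAULT_MIN_SUPPORT = {
    "Cardiology": 0.30,
    "Respiratory Medicine": 0.25,
    "Surgery": 0.35,
    "Pediatrics": 0.28,
    "Geriatrics": 0.32,
}
MAX_ADJUSTMENT = 0.05  # user fine-tuning is limited to +/-0.05


def min_support_for(department, adjustment=0.0):
    """Return the department threshold after an optional bounded adjustment."""
    if abs(adjustment) > MAX_ADJUSTMENT + 1e-12:
        raise ValueError("adjustment must stay within +/-0.05")
    return round(DEFAULT_MIN_SUPPORT[department] + adjustment, 2)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise search: keep itemsets whose support reaches min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    result = {}
    candidates = {frozenset([item]) for t in sets for item in t}
    size = 1
    while candidates and size <= max_size:
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in sets if cand <= t) / n
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        # Join step: merge frequent itemsets into candidates one item larger.
        size += 1
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
    return result


# Made-up cardiology records, each a set of observed conditions.
records = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
    {"hypertension"},
    {"diabetes"},
]
itemsets = frequent_itemsets(records, min_support_for("Cardiology"))
```

With the cardiology threshold of 0.3, the pair hypertension and diabetes (support 0.5) is retained, while obesity (support 0.25) is discarded.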

[0032] SVM algorithms, random forest algorithms, and the like are used to train the medical prediction model, with the standardized medical data as training samples, and a target loss function that incorporates clinical guideline constraints is embedded in the training process. The target loss function can be written as $L = L_{\text{base}} + \lambda \cdot P$, where $L_{\text{base}}$ is either the cross-entropy loss function or the mean squared error loss function (cross-entropy for classification tasks, mean squared error for regression tasks); $\lambda$ is the penalty coefficient, which defaults to 0.8 and can be adjusted according to the actual situation; and $P$ is the constraint penalty term, which takes the value 1 if the prediction result violates medical common sense and 0 otherwise. The target parameters are corrected using the target loss function. During the correction process, the prediction results are checked against a medical knowledge base. If a contradiction exists, for example a predicted incidence of hypertension that is higher in individuals under 30 years old than in those over 50, model training is paused. The weights of the contradictory features are then reduced; for example, the weight of the "age" feature is multiplied by 0.5 and the weights of other related features by 0.8. Next, the model parameters are reinitialized, and training is repeated with the same training samples until no contradictory prediction results are found within the target number of rounds. Each entry in the medical knowledge base includes a rule ID, a common-sense description, and a logical expression. For example, rule ID: R1, common-sense description: age is positively correlated with the incidence of hypertension, logical expression: for every 10-year increase in age, the incidence of hypertension increases by 8%-12%; rule ID: R2, common-sense description: the normal blood oxygen saturation range is 95%-100%, logical expression: blood oxygen saturation ∈ [95, 100]. Additionally, the Gini impurity used by the random forest algorithm can be used to determine feature importance scores. A feature whose importance score is lower than the target threshold is automatically filtered out.
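A minimal sketch of a loss with a knowledge-base penalty is given below. It assumes the additive form base loss plus penalty coefficient times penalty term, which is an assumption about how the three named ingredients combine; the function names and the numeric inputs are hypothetical. The contradiction check mirrors the hypertension example (rule R1), and the weight reduction uses the 0.5 and 0.8 factors from the description.

```python
import math

PENALTY_COEFFICIENT = 0.8  # default penalty coefficient from the description


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy, used here for classification tasks."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean squared error, used here for regression tasks."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def constraint_penalty(incidence_under_30, incidence_over_50):
    """1 if the prediction contradicts rule R1 (incidence rises with age), else 0."""
    return 1 if incidence_under_30 > incidence_over_50 else 0


def target_loss(base_loss, penalty, lam=PENALTY_COEFFICIENT):
    """Assumed additive combination: base loss + lam * penalty."""
    return base_loss + lam * penalty


def reduce_contradictory_weights(weights, primary, related,
                                 primary_factor=0.5, related_factor=0.8):
    """Scale down the contradictory feature and the features related to it."""
    adjusted = dict(weights)
    adjusted[primary] *= primary_factor
    for name in related:
        adjusted[name] *= related_factor
    return adjusted


# Made-up predictions that contradict rule R1.
base = mean_squared_error([1.0, 2.0], [1.0, 4.0])
penalty = constraint_penalty(incidence_under_30=0.30, incidence_over_50=0.20)
loss = target_loss(base, penalty)
weights = reduce_contradictory_weights(
    {"age": 1.0, "bmi": 1.0, "sex": 1.0}, primary="age", related=["bmi"]
)
```

In this sketch a contradiction raises the loss by the penalty coefficient and triggers the weight reduction; reinitializing and retraining the model would then follow as the description states.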

[0033] Specifically, the training process of the target medical prediction model includes: constructing training samples based on standardized medical data, using the training samples to train the SVM algorithm and / or random forest algorithm, and using the target loss function to correct the target parameters during the training process to obtain the target medical prediction model; wherein, the target loss function is a loss function determined based on the cross-entropy loss function, the mean squared error loss function, and the constraint penalty term; the constraint penalty term is a penalty term determined based on a preset medical knowledge base.

[0034] In this embodiment, after the target medical prediction model is obtained, the model fit is determined based on the accuracy of the medical analysis prediction conclusions in the prediction results output by the model. If the target medical prediction model is a classification model, the accuracy is the ratio of the number of correct predictions to the total number of predictions; if it is a regression model, the accuracy adopts the regression task's ... indicator, rounded to two decimal places.

[0035] Step S14: Determine a confidence score based on the amount of validated data corresponding to the field-level validation and the model fit, and generate corresponding visualization charts using the confidence score and the association analysis results.

[0036] In this embodiment, after the model fit is obtained, a confidence score is determined based on the amount of validated data and the model fit. The formula for the confidence score is: Confidence Score = (Data Integrity × 0.4 + Validation Pass Rate × 0.3 + Model Fit × 0.3) × 100, where Data Integrity = Validated Data Volume / Original Data Volume and Validation Pass Rate = Validated Data Volume / Original Data Volume. The confidence score can be rounded to one decimal place. A visualization chart conforming to the target medical guidelines is generated using the confidence score, the medical analysis prediction conclusion, and the association analysis results. Outliers in the visualization are identified from the interquartile range $IQR = Q_3 - Q_1$, where $Q_1$ is the 25th percentile and $Q_3$ is the 75th percentile; under the conventional box-plot rule, values below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$ are treated as outliers. In addition, for indicators such as blood pressure, blood glucose, and blood lipids that have clear industry standards, the industry standard thresholds are adopted directly. For example, abnormal systolic blood pressure values are >160 mmHg or <90 mmHg, and abnormal fasting blood glucose values are >7.0 mmol/L or <3.9 mmol/L. Figure 2 is a bar chart of the number of patients in each department provided in this embodiment. It shows that the cardiology department (4 patients) and the endocrinology department (4 patients) have the most patients, while the other departments each have 2-3 patients. Resource allocation recommendation: give priority to ensuring the supply of medical resources to the cardiology and endocrinology departments, while maintaining a balanced allocation to the other departments.
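The confidence score formula can be checked with a short worked example. The sketch below follows the weights and the definitions stated in the paragraph above; the function names and the sample counts are hypothetical, and `classification_accuracy` covers only the classification case of model fit.

```python
def classification_accuracy(y_true, y_pred):
    """Correct predictions divided by total predictions, rounded to two decimals."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return round(correct / len(y_true), 2)


def confidence_score(original_count, validated_count, model_fit,
                     w_integrity=0.4, w_pass_rate=0.3, w_fit=0.3):
    """(Data Integrity*0.4 + Validation Pass Rate*0.3 + Model Fit*0.3) * 100."""
    if original_count <= 0:
        raise ValueError("original_count must be positive")
    # Both ratios are defined as validated volume / original volume in the text.
    data_integrity = validated_count / original_count
    validation_pass_rate = validated_count / original_count
    score = (data_integrity * w_integrity
             + validation_pass_rate * w_pass_rate
             + model_fit * w_fit) * 100
    return round(score, 1)  # rounded to one decimal place


# Made-up example: 950 of 1000 records pass validation, model fit is 0.88.
fit = classification_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
score = confidence_score(original_count=1000, validated_count=950, model_fit=0.88)
```

For these inputs the terms are 0.38, 0.285, and 0.264, which sum to 0.929 and give a score of 92.9.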

[0037] Specifically, determining the confidence score based on the amount of validated data corresponding to the field-level validation and the model fit, and generating the corresponding visualization charts using the confidence score and the association analysis results, includes: determining the confidence score based on the amount of validated data corresponding to the field-level validation, the model fit, and the corresponding weights; and generating visualization charts conforming to the target medical standards using the confidence score, the medical analysis prediction conclusions, and the association analysis results. The visualization charts include bar charts, pie charts, line charts, and box plots. The box plots automatically label the outlier ranges of blood pressure data for each department, such as the outlier range of systolic blood pressure >160 mmHg or <90 mmHg in the cardiology department, helping users quickly identify abnormal data.
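The outlier labelling used for the box plots can be sketched as below. The fixed thresholds are the industry-standard values quoted in the description; the interquartile-range fallback uses the conventional 1.5 multiplier, which is an assumption rather than a value stated in the source. The indicator keys and function names are hypothetical.

```python
import statistics

# Industry-standard normal ranges quoted in the description: (low, high).
CLINICAL_THRESHOLDS = {
    "systolic_bp_mmHg": (90.0, 160.0),
    "fasting_glucose_mmol_L": (3.9, 7.0),
}


def iqr_bounds(values, k=1.5):
    """Return (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is the conventional assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(indicator, values):
    """Use the fixed clinical range when one exists, otherwise the IQR rule."""
    low, high = CLINICAL_THRESHOLDS.get(indicator) or iqr_bounds(values)
    return [v for v in values if v < low or v > high]


# Made-up readings for illustration.
bp_outliers = find_outliers("systolic_bp_mmHg", [85, 120, 135, 165])
hr_outliers = find_outliers("heart_rate_bpm", [1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Keeping the clinical thresholds in a lookup table means an indicator with a recognized standard never falls back to a purely statistical rule.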

[0038] As can be seen from the above, this application performs triple field validation on medical testing data, filtering out invalid or abnormal data from the data source. By parsing different medical data sources, unstructured data is transformed into structured data with clear fields and complete associations, and unified into a target file format, improving data storage and retrieval efficiency. A pre-set statistical analysis agent calls TableOne templates to output association analysis results that conform to industry standards. A pre-set machine learning agent is used to train SVM / random forest algorithms to obtain a predictive model adapted to medical scenarios. Simultaneously, the accuracy of the model's prediction results is quantified through model fit. Finally, a confidence score is generated based on the amount of data that passes field-level validation and the model fit. In this way, by integrating the confidence score and association analysis results through visual charts, even non-technical personnel can grasp key information through the charts, providing data support for clinical diagnosis and medical decision-making.

[0039] Accordingly, as shown in Figure 3, this application also provides an agent-based medical data processing and analysis device, including: a data verification module 11, used to acquire medical test data from a medical data source, perform field-level verification on the medical test data, and determine the medical test data that passes the verification as verified medical data, where the field-level verification includes a first field verification, a second field verification, and a third field verification; the first field verification is a non-empty verification of the patient number in the medical test data, the second field verification is a range verification of the medical test values in the medical test data, and the third field verification is an enumeration verification of the medical departments in the medical test data; a data caching module 12, used to perform structured parsing on the verified medical data based on the different types of the medical data source to obtain structured medical data, convert the structured medical data into standardized medical data in the target file format, and cache the standardized medical data and the corresponding hash value; a goodness-of-fit determination module 13, used to train the target algorithm using a preset machine learning agent and based on the standardized medical data to obtain a target medical prediction model, and to determine the goodness of fit of the model based on the accuracy of the medical analysis prediction conclusions output by the target medical prediction model, where the target analysis template is an analysis template including medical grouping rules constructed based on the TableOne tool, and the target algorithm is an SVM algorithm and/or a random forest algorithm; and a chart generation module 14, used to determine the confidence score based on the amount of verified data corresponding to the field-level verification and the model fit, and to generate a corresponding visualization chart using the confidence score and the association analysis results.

[0040] In some specific embodiments, the data verification module 11 may specifically include: The detection data acquisition unit is used to acquire medical detection data from CSV files, Excel files, or MySQL databases, respectively. The first field verification unit is used to indicate that the first field verification is passed if the field corresponding to the patient number in the medical test data is not empty and the format of the patient number meets the target format specification. The second field verification unit is used to indicate that the second field verification is passed if the medical test value in the medical test data is within the corresponding target test value range; the target test value range is the test value range determined based on the medical department corresponding to the medical test value. The third field verification unit is used to set an enumeration value range based on a preset list of departments in the medical industry. If the medical department in the medical test data is within the enumeration value range, it indicates that the third field verification has passed. The verified data determination unit is used to determine medical test data that have passed the verification of the first field, the second field, and the third field as verified medical data.
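The three field checks performed by these units can be sketched as a single record validator. Everything configurable here is a hypothetical placeholder: the patient number pattern, the department list, and the value ranges are not specified in the source and would come from the deployment's own specifications.

```python
import re

# Hypothetical configuration values used only for illustration.
PATIENT_ID_PATTERN = re.compile(r"[A-Z]{3}\d{7}")
DEPARTMENTS = {"Cardiology", "Endocrinology", "Surgery", "Pediatrics"}
VALUE_RANGES = {
    ("Cardiology", "systolic_bp"): (50.0, 260.0),
    ("Endocrinology", "fasting_glucose"): (1.0, 35.0),
}


def validate_record(record):
    """Return the failure reasons; an empty list means all three checks passed."""
    reasons = []

    # First field check: patient number is non-empty and matches the format.
    patient_id = (record.get("patient_id") or "").strip()
    if not patient_id:
        reasons.append("patient_id is empty")
    elif not PATIENT_ID_PATTERN.fullmatch(patient_id):
        reasons.append("patient_id does not match the target format")

    # Second field check: test value lies within the department-specific range.
    department = record.get("department")
    value_range = VALUE_RANGES.get((department, record.get("indicator")))
    value = record.get("value")
    if value_range is None or value is None or not value_range[0] <= value <= value_range[1]:
        reasons.append("test value is outside the target range")

    # Third field check: department belongs to the preset enumeration.
    if department not in DEPARTMENTS:
        reasons.append("department is not in the enumeration")
    return reasons


def split_validated(records):
    """Separate records that pass every check from those that fail, with reasons."""
    validated, failures = [], []
    for record in records:
        reasons = validate_record(record)
        if reasons:
            failures.append((record, reasons))
        else:
            validated.append(record)
    return validated, failures


good = {"patient_id": "HOS2025001", "department": "Cardiology",
        "indicator": "systolic_bp", "value": 120.0}
bad = {"patient_id": "", "department": "Dermatology",
       "indicator": "systolic_bp", "value": 120.0}
validated, failures = split_validated([good, bad])
```

Returning the failure reasons alongside each rejected record also supplies the material for the verification log, which records the reason for each field-level verification failure.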

[0041] In some specific embodiments, the data caching module 12 may specifically include: The patient number determination unit is used to determine the number of patients in the validated medical data using regular expressions if the medical data source is a CSV file or an Excel file, and to determine whether the number of patients is within the target number range. The field extraction unit is used to extract the target key fields from the verified medical data if the number of patients is within the target number range, so as to obtain the first structured data. The foreign key verification unit is used to verify the integrity of the foreign key constraints of the MySQL database if the medical data source is a MySQL database. If the verification is successful, the pre-compiled SQL statement corresponding to the target key field is executed using an SQL query tool to obtain the second structured data. A structured data construction unit is used to construct structured medical data based on the first structured data and the second structured data.
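A rough sketch of the two parsing paths follows. It rests on several assumptions: the patient ID pattern, the key field names, the table layout, and the patient-count limits are hypothetical, and the standard-library `sqlite3` module is used purely as a stand-in for a MySQL connection so that the example is self-contained.

```python
import csv
import io
import re
import sqlite3

PATIENT_ID_RE = re.compile(r"\bHOS\d{7}\b")  # hypothetical patient ID format
KEY_FIELDS = ("patient_id", "department", "test_time", "value")


def parse_csv(text, min_patients=1, max_patients=10000):
    """Count distinct patient IDs with a regular expression, then extract key fields."""
    patient_ids = set(PATIENT_ID_RE.findall(text))
    if not min_patients <= len(patient_ids) <= max_patients:
        return []  # patient count is outside the target range
    reader = csv.DictReader(io.StringIO(text))
    return [{field: row.get(field) for field in KEY_FIELDS} for row in reader]


def parse_database(conn, department):
    """Check foreign key integrity, then run a parameterized query for key fields."""
    # In SQLite, this pragma returns one row per foreign key violation.
    if conn.execute("PRAGMA foreign_key_check").fetchall():
        return []
    sql = ("SELECT patient_id, department, test_time, value "
           "FROM tests WHERE department = ?")
    rows = conn.execute(sql, (department,)).fetchall()
    return [dict(zip(KEY_FIELDS, row)) for row in rows]


def build_structured_data(first, second):
    """Combine the file-based and database-based structured data."""
    return list(first) + list(second)


csv_text = (
    "patient_id,department,test_time,value\n"
    "HOS2025001,Cardiology,2025-01-01 12:00:00,120\n"
    "HOS2025002,Surgery,2025-01-01 12:05:00,135\n"
)
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE patients(patient_id TEXT PRIMARY KEY);"
    "CREATE TABLE tests(patient_id TEXT REFERENCES patients(patient_id),"
    " department TEXT, test_time TEXT, value REAL);"
    "INSERT INTO patients VALUES('HOS2025003');"
    "INSERT INTO tests VALUES('HOS2025003','Cardiology','2025-01-01 12:10:00',118);"
)
structured = build_structured_data(parse_csv(csv_text),
                                   parse_database(conn, "Cardiology"))
```

Binding the department through a placeholder rather than string concatenation is what a pre-compiled statement provides in practice; with a real MySQL driver the placeholder syntax and the integrity check would differ.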

[0042] In some specific embodiments, the data caching module 12 may specifically include: The index building unit is used to convert the structured medical data into standardized medical data in a columnar storage file format, and to build corresponding indexes for target fields including patient number and test timestamp; The log information determination unit is used to convert the structured medical data into binary code format to obtain a data byte stream and determine the verification log information corresponding to the structured medical data; the verification log information includes the field-level verification time corresponding to the structured medical data, the amount of data that passed the field-level verification, and the reason for the field-level verification failure; The hash value caching unit is used to determine the hash value corresponding to the standardized medical data based on the data byte stream and the verification log information and using the SHA-256 algorithm, and to cache the standardized medical data and the corresponding hash value using the Redis caching mechanism.

[0043] In some specific embodiments, the fitting degree determination module 13 may specifically include: The data analysis unit is used to use a preset statistical analysis agent to call the target analysis template to perform comparative analysis on the standardized medical data in order to obtain analysis results; The analysis result verification unit is used to set a minimum support threshold in the medical field using an association rule mining algorithm, and to verify the rationality of the analysis results based on the minimum support threshold; different medical departments correspond to different minimum support thresholds; The association analysis result determination unit is used to determine the analysis result as the association analysis result if the verification is successful.

[0044] In some specific embodiments, the fitting degree determination module 13 may specifically include: The target parameter correction unit is used to construct training samples based on standardized medical data, use the training samples to train the SVM algorithm and / or the random forest algorithm, and use the target loss function to correct the target parameters during the training process to obtain the target medical prediction model.

[0045] In some specific embodiments, the chart generation module 14 may specifically include: The confidence score determination unit is used to determine the confidence score based on the amount of verification data corresponding to the field-level verification, the model fit, and the corresponding weights. The chart generation unit is used to generate visual charts that conform to the target medical standards using the confidence score, the medical analysis prediction conclusion, and the correlation analysis results; the visual charts include bar charts, pie charts, line charts, and box plots.

[0046] Furthermore, embodiments of this application also disclose an electronic device. Figure 4 is a structural diagram of an electronic device 20 according to an exemplary embodiment, and the content of the diagram should not be construed as limiting the scope of this application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 stores a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the agent-based medical data processing and analysis method disclosed in any of the foregoing embodiments. Furthermore, the electronic device 20 in this embodiment may specifically be an electronic computer.

[0047] In this embodiment, the power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, and is not specifically limited here; the input / output interface 25 is used to acquire external input data or output data to the outside world, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.

[0048] In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, random access memory, disk or optical disk, etc. The resources stored thereon can include operating system 221, computer program 222, etc., and the storage method can be temporary storage or permanent storage.

[0049] The operating system 221, which may be Windows Server, Netware, Unix, Linux, etc., is used to manage and control the various hardware devices on the electronic device 20 and the computer program 222. In addition to computer programs capable of performing the agent-based medical data processing and analysis method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs capable of performing other specific tasks.

[0050] Furthermore, this application also discloses a computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the aforementioned agent-based medical data processing and analysis method. Specific steps of this method can be found in the corresponding content disclosed in the foregoing embodiments, and will not be repeated here.

[0051] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0052] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0053] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0054] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0055] The technical solutions provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. An agent-based medical data processing and analysis method, characterized by comprising: acquiring medical test data from a medical data source, performing field-level validation on the medical test data, and determining the medical test data that passes the validation as validated medical data, wherein the field-level validation includes a first field validation, a second field validation, and a third field validation; the first field validation is a non-empty validation of the patient ID in the medical test data, the second field validation is a range validation of the medical test values in the medical test data, and the third field validation is an enumeration validation of the medical departments in the medical test data; performing structured parsing on the validated medical data based on the different types of the medical data source to obtain structured medical data, converting the structured medical data into standardized medical data in a target file format, and caching the standardized medical data and the corresponding hash value; using a preset statistical analysis agent to invoke a target analysis template and an association rule mining algorithm to perform association analysis on the standardized medical data to obtain corresponding association analysis results, using a preset machine learning agent to train a target algorithm based on the standardized medical data to obtain a target medical prediction model, and determining a model fit based on the accuracy of the medical analysis prediction conclusions output by the target medical prediction model, wherein the target analysis template is an analysis template that includes medical grouping rules and is constructed based on the TableOne tool, and the target algorithm is an SVM algorithm and/or a random forest algorithm; and determining a confidence score based on the amount of validated data corresponding to the field-level validation and the model fit, and generating corresponding visualization charts using the confidence score and the association analysis results.

2. The agent-based medical data processing and analysis method according to claim 1, characterized in that the step of acquiring medical test data from a medical data source, performing field-level validation on the medical test data, and determining the medical test data that passes the validation as validated medical data includes: acquiring medical test data from CSV files, Excel files, or MySQL databases, respectively; if the field corresponding to the patient number in the medical test data is not empty and the format of the patient number meets the target format specification, determining that the first field validation is passed; if the medical test value in the medical test data is within the corresponding target test value range, determining that the second field validation is passed, wherein the target test value range is the test value range determined based on the medical department corresponding to the medical test value; setting an enumeration value range based on a preset list of departments in the medical industry, and if the medical department in the medical test data is within the enumeration value range, determining that the third field validation is passed; and determining medical test data that passes the first field validation, the second field validation, and the third field validation as validated medical data.

3. The agent-based medical data processing and analysis method according to claim 2, characterized in that the step of performing structured parsing on the validated medical data based on the different types of the medical data source to obtain structured medical data includes: if the medical data source is a CSV file or an Excel file, using regular expressions to determine the number of patients in the validated medical data, and determining whether the number of patients is within the target range; if the number of patients is within the target range, extracting the target key fields from the validated medical data to obtain first structured data; if the medical data source is a MySQL database, verifying the integrity of the foreign key constraints of the MySQL database, and if the verification is successful, executing the pre-compiled SQL statement corresponding to the target key fields using an SQL query tool to obtain second structured data; and constructing structured medical data based on the first structured data and the second structured data.

4. The agent-based medical data processing and analysis method according to claim 1, characterized in that the step of converting the structured medical data into standardized medical data in a target file format and caching the standardized medical data and the corresponding hash value includes: converting the structured medical data into standardized medical data in a columnar storage file format, and creating corresponding indexes for target fields including patient ID and test timestamp; converting the structured medical data into binary format to obtain a data byte stream, and determining the verification log information corresponding to the structured medical data, wherein the verification log information includes the field-level verification time corresponding to the structured medical data, the amount of data that passed the field-level verification, and the reason for any field-level verification failure; and determining the hash value corresponding to the standardized medical data based on the data byte stream and the verification log information using the SHA-256 algorithm, and caching the standardized medical data and the corresponding hash value using the Redis caching mechanism.

5. The agent-based medical data processing and analysis method according to claim 1, characterized in that the step of using a preset statistical analysis agent to invoke a target analysis template and an association rule mining algorithm to perform association analysis on the standardized medical data to obtain the corresponding association analysis results includes: using the preset statistical analysis agent to invoke the target analysis template to perform comparative analysis on the standardized medical data to obtain analysis results; setting a minimum support threshold for the medical field using the association rule mining algorithm, and verifying the rationality of the analysis results based on the minimum support threshold, wherein different medical departments correspond to different minimum support thresholds; and if the verification is successful, determining the analysis results as the association analysis results.

6. The agent-based medical data processing and analysis method according to claim 1, characterized in that the training process of the target medical prediction model includes: constructing training samples based on the standardized medical data, training the SVM algorithm and/or the random forest algorithm using the training samples, and correcting the target parameters during the training process using a target loss function to obtain the target medical prediction model, wherein the target loss function is a loss function determined based on the cross-entropy loss function, the mean squared error loss function, and a constraint penalty term, and the constraint penalty term is a penalty term determined based on a preset medical knowledge base.

7. The agent-based medical data processing and analysis method according to any one of claims 1 to 6, characterized in that the step of determining a confidence score based on the amount of validated data corresponding to the field-level validation and the model fit, and generating corresponding visualization charts using the confidence score and the association analysis results, includes: determining the confidence score based on the amount of validated data corresponding to the field-level validation, the model fit, and the corresponding weights; and generating visualization charts that conform to the target medical standards using the confidence score, the medical analysis prediction conclusion, and the association analysis results, wherein the visualization charts include bar charts, pie charts, line charts, and box plots.

8. An agent-based medical data processing and analysis device, characterized by comprising: a data verification module, used to acquire medical test data from a medical data source, perform field-level verification on the medical test data, and determine the medical test data that passes the verification as verified medical data, wherein the field-level verification includes a first field verification, a second field verification, and a third field verification; the first field verification is a non-empty verification of the patient ID in the medical test data, the second field verification is a range verification of the medical test values in the medical test data, and the third field verification is an enumeration verification of the medical departments in the medical test data; a data caching module, used to perform structured parsing on the verified medical data based on the different types of the medical data source to obtain structured medical data, convert the structured medical data into standardized medical data in the target file format, and cache the standardized medical data and the corresponding hash value; a goodness-of-fit determination module, used to train the target algorithm using a preset machine learning agent and based on the standardized medical data to obtain a target medical prediction model, and to determine the goodness of fit of the model based on the accuracy of the medical analysis prediction conclusions output by the target medical prediction model, wherein the target analysis template is an analysis template including medical grouping rules built based on the TableOne tool, and the target algorithm is an SVM algorithm and/or a random forest algorithm; and a chart generation module, used to determine the confidence score based on the amount of verified data corresponding to the field-level verification and the model fit, and to generate corresponding visualization charts using the confidence score and the association analysis results.

9. An electronic device, characterized by comprising: a memory, used to store a computer program; and a processor, used to execute the computer program to implement the agent-based medical data processing and analysis method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it is used to store a computer program, wherein the computer program, when executed by a processor, implements the agent-based medical data processing and analysis method according to any one of claims 1 to 7.