A method and system for predicting the risk of respiratory hospital infections

By constructing a high-dimensional hybrid feature space and an improved weighted Naive Bayes classifier, and combining it with internal and external environmental data for dynamic calibration, the shortcomings of existing technologies in predicting respiratory hospital infection risks have been addressed. This has enabled accurate and dynamic risk prediction and source tracing analysis, thereby improving prevention and control efficiency.

CN122245779A · Pending · Publication Date: 2026-06-19 · THE 921ST HOSPITAL OF THE CHINESE PEOPLES LIBERATION ARMY JOINT LOGISTICS SUPPORT FORCE

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: THE 921ST HOSPITAL OF THE CHINESE PEOPLES LIBERATION ARMY JOINT LOGISTICS SUPPORT FORCE
Filing Date: 2026-03-27
Publication Date: 2026-06-19

IPC: G16H50/30; G16H50/70; G06F18/2415; G06N3/0455; G06N3/126; G06F18/2431; G06F18/15; G06F18/27

AI Tagging

Application Domain

Medical data mining; Health-index calculation

AI Technical Summary

Technical Problem

Existing respiratory hospital infection risk prediction technologies have technical shortcomings in feature extraction, model building, environmental fusion, result calibration, and early warning and tracing, which cannot meet the needs of hospitals for precise and dynamic prevention and control, and the prediction accuracy and robustness are insufficient.

Method used

By constructing a high-dimensional hybrid feature space of temporal dynamic features and static features, an improved weighted Naive Bayes classifier and a deep neural network model are used to extract patient diagnosis and treatment event sequences. Combined with in-hospital and out-of-hospital environmental monitoring data for dynamic calibration, high-precision risk warning information is generated, and transmission clusters are identified through infection source tracing analysis.

Benefits of technology

It enables accurate and dynamic prediction and tiered early warning of respiratory hospital infection risks, improves the accuracy and robustness of prediction, assists hospitals in quickly locating potential transmission routes, and enhances the intelligence and scientific level of prevention and control work.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to a method and system for predicting the risk of respiratory hospital-acquired infections, comprising the following steps: Data acquisition and preprocessing step: acquiring electronic medical record data from a target hospital over a historical period, wherein the electronic medical record data includes at least patient demographic data, medical record data, laboratory test data, and imaging test data; preprocessing the electronic medical record data to construct a multi-dimensional original feature set containing basic patient information, treatment process information, and clinical laboratory information; Temporal dynamic feature extraction step: based on the original feature set, constructing a temporal sequence of patient treatment events, and using a deep neural network model to extract the implicit temporal dynamic features of each patient, wherein the temporal dynamic features characterize the changing trend of the patient's respiratory infection risk over time; this invention enables more accurate prediction of the risk of respiratory hospital-acquired infections.

Description

Technical Field

[0001] This invention relates to the field of prediction methods, specifically to a method and system for predicting the risk of respiratory hospital infections.

Background Technology

[0002] In the field of hospital infection control, respiratory hospital-acquired infections are one of the main types of nosocomial infection, and their prevention and control are crucial to medical quality and patient safety. However, current risk prediction technologies for respiratory hospital-acquired infections have several shortcomings that make it difficult to meet hospitals' needs for precise, dynamic prevention and control. First, existing technologies mostly extract static features from patients' electronic medical records and ignore the temporal dynamics of the treatment process, so they cannot accurately characterize how respiratory infection risk changes over time. Their processing of treatment event sequences also lacks principled time window optimization and feature extraction methods, making it difficult to focus on the key treatment events that are strongly correlated with infection. Second, when building prediction models, traditional classifiers either do not optimize feature weights or assume a single Gaussian distribution when estimating the conditional probability of continuous features. This assumption does not match the actual data distribution and results in insufficient prediction accuracy and generalization ability. Third, existing methods generally ignore the impact of in-hospital and out-of-hospital environmental factors on respiratory infection risk. They do not incorporate in-hospital environmental monitoring data such as temperature, humidity, and pollutant concentrations, or external meteorological data and community influenza-like illness data, and instead rely solely on individual patient characteristics, so the predicted results deviate significantly from actual infection patterns.
In addition, initial prediction models lack a sound dynamic calibration mechanism and do not evaluate the reliability of their own predictions. A single model is prone to distortion caused by its own bias, and without fusion correction against seasonal background risk probabilities the robustness of the results is further reduced. In the risk warning and source tracing stages, existing technologies mostly use fixed thresholds to classify risk levels, which cannot match the actual distribution of historical infection events in a hospital, so graded warnings differentiate poorly between prevention and control responses. There is also no effective infection source tracing analysis after a warning is triggered: it is difficult to quantify the similarity of diagnosis and treatment processes among patients or to locate potential transmission clusters and transmission routes. As a result, hospital infection control departments cannot quickly identify key prevention and control points or take targeted interventions, prevention and control resources are used inefficiently, and nosocomial transmission of respiratory infections is difficult to block. Overall, existing respiratory hospital infection risk prediction technologies have shortcomings in feature extraction, model building, environmental fusion, result calibration, and early warning and source tracing. They lack multi-dimensional data integration capabilities, and their predictions are insufficiently accurate, dynamic, and rigorous to provide efficient and reliable technical support for hospital infection prevention and control. Therefore, a respiratory hospital infection risk prediction method and system is proposed.

Summary of the Invention

[0003] To address the shortcomings of existing technologies, this invention provides a method for predicting the risk of respiratory hospital-acquired infections, comprising the following steps:
Data acquisition and preprocessing step: acquire electronic medical record data of the target hospital within a historical time period, the electronic medical record data including at least patient demographic data, medical record data, laboratory test data, and imaging test data; preprocess the electronic medical record data to construct a multi-dimensional original feature set containing basic patient information, diagnosis and treatment process information, and clinical test information.
Temporal dynamic feature extraction step: based on the original feature set, construct a temporal sequence of patient diagnosis and treatment events, and use a deep neural network model to extract the implicit temporal dynamic features of each patient, the temporal dynamic features characterizing the changing trend of the patient's respiratory infection risk over time.
Hybrid feature space construction step: fuse the static features in the original feature set with the temporal dynamic features to construct a high-dimensional hybrid feature space.
Initial risk prediction model construction step: in the high-dimensional hybrid feature space, construct an initial risk prediction model based on an improved weighted Naive Bayes classifier, which adjusts the contribution of different features to the classification result by introducing feature weight factors.
Dynamic environmental correction coefficient construction and calibration step: collect in-hospital environmental monitoring data and out-of-hospital public health monitoring data, fuse them to construct a dynamic environmental risk correction coefficient, and dynamically calibrate the output of the initial risk prediction model based on the correction coefficient to generate the final risk prediction value.
Risk warning generation step: based on the final risk prediction value, classify the respiratory infection risk level of different areas or different patient groups within the hospital, and generate corresponding risk warning information.

[0004] Furthermore, the construction of the temporal patient diagnosis and treatment event sequence specifically includes: converting each patient's diagnosis code, drug prescription, examination items, and vital sign monitoring records into discrete event tokens in chronological order; dividing the event tokens into multiple subsequences of equal duration through a sliding time window, with each subsequence representing the features of a diagnosis and treatment stage; and inputting the subsequences into a pre-trained Transformer encoder for processing to extract temporal dynamic features. The process for determining the window size of the sliding time window is as follows: First, incubation period data of historical infection cases and hospitalization duration data of hospitalized patients during the same period are collected. The two sets of data are then fitted with two log-normal probability density functions. Next, with the optimization objective of maximizing the integral value of the overlapping region of these two probability density functions on the time axis, a grid search algorithm is used to traverse within a preset candidate window length range. The preset candidate window length range is an interval defined by the minimum and maximum window values set based on clinical experience. For each candidate window length, the overlap integral value of the two probability density functions within the time interval corresponding to that window length is calculated. By comparing the overlap integral values under all candidate window lengths, the candidate window length that maximizes the overlap integral value is determined as the final sliding time window size.
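The tokenization and windowing described in this paragraph can be illustrated with a minimal Python sketch. It only covers splitting a chronologically ordered token sequence into equal-duration subsequences; the Transformer encoder and the window-size search are not shown. The token names, the `(hours, token)` event representation, and the `stride_hours` parameter (defaulting to non-overlapping windows) are assumptions made for illustration and are not specified in the patent text.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical event representation: (hours since admission, event token).
Event = Tuple[float, str]


def split_into_windows(
    events: Sequence[Event],
    window_hours: float,
    stride_hours: Optional[float] = None,
) -> List[List[str]]:
    """Split timestamped event tokens into subsequences of equal duration.

    Assumption: the stride defaults to the window length (non-overlapping
    windows); the source text does not state the stride.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    stride = window_hours if stride_hours is None else stride_hours
    if stride <= 0:
        raise ValueError("stride_hours must be positive")
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e[0])  # chronological order
    start, last = ordered[0][0], ordered[-1][0]
    windows: List[List[str]] = []
    t = start
    while t <= last:
        # Each window collects tokens with t <= timestamp < t + window_hours.
        windows.append([tok for ts, tok in ordered if t <= ts < t + window_hours])
        t += stride
    return windows


# Illustrative usage with made-up tokens for one patient.
patient_events = [
    (0, "DX_J18"), (5, "RX_ANTIBIOTIC"), (30, "LAB_CRP"), (50, "VITAL_FEVER"),
]
print(split_into_windows(patient_events, window_hours=24))
# [['DX_J18', 'RX_ANTIBIOTIC'], ['LAB_CRP'], ['VITAL_FEVER']]
```

Each inner list would correspond to one diagnosis and treatment stage that is passed to the encoder.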

[0005] Furthermore, when processing patient diagnosis and treatment event sequences, the self-attention mechanism of the Transformer encoder assigns an attention score to each event token. The attention score is based on the correlation between the event token and the current prediction target, computed as the dot product of the query matrix and the key matrix. Through this attention score, the Transformer encoder can focus on key diagnosis and treatment events that are strongly correlated with the occurrence of respiratory infections. After self-attention is computed, the outputs of the multiple attention heads are concatenated and passed through a feedforward neural network layer for nonlinear transformation. To add nonlinear transformation capability while preserving the original information, a residual connection structure based on a gating mechanism is introduced after the feedforward neural network layer. It is computed as follows: first, the input vector and output vector of the feedforward neural network layer are concatenated to obtain a merged vector; then the merged vector is fed into a single-layer neural network and, after linear transformation, passed through the sigmoid activation function to obtain a gating parameter. The gating parameter is a vector whose elements lie between 0 and 1, and it controls the fusion ratio of the original input information and the transformed information. The gating parameter is calculated as:
$$g = \sigma\left(W_g\,[x \,;\, F(x)] + b_g\right)$$
where $x$ represents the input of the feedforward neural network layer, $F(x)$ represents the output of the feedforward neural network layer, $W_g$ and $b_g$ are the learnable weight matrix and bias term, $\sigma$ represents the sigmoid activation function, and $[\,\cdot\,;\,\cdot\,]$ represents vector concatenation.
After the gating parameter is obtained, the final output is the weighted sum of the original input and the transformed output according to the gating parameter:
$$\mathrm{Output} = g \odot x + (1 - g) \odot F(x)$$
where $\odot$ represents element-wise multiplication.
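The gated residual connection described in this paragraph can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the function name `gated_residual` is hypothetical, and the choice that the gate weights the original input while its complement weights the transformed output is an assumption, since the text only states that the two are fused by a weighted sum.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_residual(
    x: Sequence[float],
    fx: Sequence[float],
    w_g: Sequence[Sequence[float]],
    b_g: Sequence[float],
) -> List[float]:
    """Fuse the feedforward input x and output fx = F(x) through a learned gate.

    w_g has shape (d, 2d) and b_g has length d, where d = len(x).
    Assumption: the gate g multiplies x and (1 - g) multiplies F(x).
    """
    d = len(x)
    if len(fx) != d or len(w_g) != d or len(b_g) != d:
        raise ValueError("x, fx, b_g and the rows of w_g must agree on d")
    merged = list(x) + list(fx)  # vector concatenation [x ; F(x)]
    gate = []
    for row, bias in zip(w_g, b_g):
        if len(row) != 2 * d:
            raise ValueError("each row of w_g must have length 2 * d")
        gate.append(sigmoid(sum(w * m for w, m in zip(row, merged)) + bias))
    # Element-wise weighted sum of original and transformed information.
    return [g * xi + (1.0 - g) * fi for g, xi, fi in zip(gate, x, fx)]


# With zero weights and biases every gate value is 0.5, so the output
# is the element-wise average of x and F(x).
zero_w = [[0.0] * 4 for _ in range(2)]
print(gated_residual([1.0, 2.0], [3.0, 6.0], zero_w, [0.0, 0.0]))  # [2.0, 4.0]
```

In a trained model `w_g` and `b_g` would be learned jointly with the encoder; here they are passed in explicitly to keep the sketch self-contained.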

[0006] Furthermore, the improved weighted Naive Bayes classifier is constructed as follows: First, the mutual information value between each feature and the occurrence of respiratory infection in the high-dimensional mixed feature space is calculated, and the mutual information value is normalized and used as the initial feature weight; Second, a feature weight optimization process based on a genetic algorithm is introduced, using the prediction accuracy of the classifier as the fitness function to iteratively optimize the initial feature weights to obtain the optimal feature weight vector; Finally, in the likelihood probability calculation of the Naive Bayes classifier, the conditional probability of the feature is multiplied by the corresponding optimal feature weight. The calculation process of the conditional probability of a feature is as follows: For continuous features in a high-dimensional mixed feature space, instead of the traditional Gaussian distribution assumption, a non-parametric method based on kernel density estimation is used to estimate the probability density. In specific processing, for a given continuous feature, the values of all samples belonging to the same category in the training set on that feature are collected; Then, using the value of each sample point as the center, a kernel density estimation model is constructed using the Gaussian kernel function; When it is necessary to calculate the conditional probability density of a feature value for a sample to be predicted under a specific category, it can be done in the following way: The difference between the feature value of the sample to be predicted and the value of that feature for each sample of the same type in the training set is calculated. Each difference is divided by the bandwidth parameter and then fed into the Gaussian kernel function to obtain the kernel function value for each sample. 
Finally, all kernel function values are summed and divided by the product of the number of samples of the same category and the bandwidth parameter. The specific calculation is:
$$p(x \mid C_k) = \frac{1}{n_k h} \sum_{i=1}^{n_k} K\!\left(\frac{x - x_i}{h}\right)$$
where $C_k$ represents the k-th category, $n_k$ represents the number of samples in category $C_k$, $x$ represents the feature value of the sample to be predicted, $x_i$ represents the feature value of the i-th sample in category $C_k$, $h$ represents the bandwidth parameter, and $K$ represents the Gaussian kernel function.
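The kernel density estimate of the class-conditional density described in this paragraph can be written as a short Python sketch. The function names are illustrative, and the bandwidth `h` is passed in directly because the text does not state how it is chosen.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_conditional_density(x: float, class_samples: Sequence[float], h: float) -> float:
    """Estimate p(x | C_k) from the training values of one feature in class C_k.

    Sums K((x - x_i) / h) over all samples of the class and divides by n_k * h.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not class_samples:
        raise ValueError("class_samples must not be empty")
    n_k = len(class_samples)
    total = sum(gaussian_kernel((x - x_i) / h) for x_i in class_samples)
    return total / (n_k * h)


# Illustrative values only: body temperatures of infected training patients.
infected_temps = [37.9, 38.4, 38.6, 39.1]
print(kde_conditional_density(38.5, infected_temps, h=0.4))
```

Unlike a single Gaussian fit, this estimate follows multi-modal or skewed feature distributions, at the cost of evaluating every training sample of the class for each prediction.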

[0007] Furthermore, the feature weight optimization process based on the genetic algorithm is implemented as follows: the weight of each feature is encoded as a gene on a chromosome, and a population containing multiple weight vectors is initialized. Each individual in the population, i.e., each weight vector, is applied to the weighted Naive Bayes classifier, and the classification accuracy on the validation set is taken as the fitness value of that individual. A new generation is produced through selection, crossover, and mutation operations. This process is repeated until a preset number of generations is reached, and the individual with the highest fitness is used as the optimal feature weight vector. The crossover operation uses a hybrid crossover strategy based on fitness ratios to generate new offspring, calculated as follows: first, two parent individuals are selected from the current population based on their fitness; then, two different crossover coefficients are dynamically calculated from the fitness values of these two parents. The first crossover coefficient is proportional to the fitness value of the first parent, so that the parent with higher fitness contributes more information; the second crossover coefficient is inversely proportional to the fitness value of the second parent, to preserve population diversity. Based on the two crossover coefficients, the two offspring individuals are calculated as:
$$\mathrm{child}_1 = \alpha \cdot \mathrm{parent}_1 + (1 - \alpha) \cdot \mathrm{parent}_2$$
$$\mathrm{child}_2 = \beta \cdot \mathrm{parent}_2 + (1 - \beta) \cdot \mathrm{parent}_1$$
where $\alpha$ and $\beta$ are the dynamically calculated crossover coefficients; the value of $\alpha$ is directly proportional to the fitness value of $\mathrm{parent}_1$, and the value of $\beta$ is inversely proportional to the fitness value of $\mathrm{parent}_2$.
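A minimal sketch of the crossover step is given below. It assumes the offspring are convex combinations of the two parent weight vectors, which is one common reading of a blended crossover. The patent text states only the proportionality of the coefficients to the parents' fitness, so `alpha` and `beta` are taken as inputs here rather than derived from fitness, and the function name `blend_crossover` is hypothetical.

```python
from typing import List, Sequence, Tuple


def blend_crossover(
    parent1: Sequence[float],
    parent2: Sequence[float],
    alpha: float,
    beta: float,
) -> Tuple[List[float], List[float]]:
    """Produce two offspring weight vectors from two parents.

    Assumption: each child is a convex combination of the parents, with
    alpha weighting parent1 in child 1 and beta weighting parent2 in child 2.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same number of genes")
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    child1 = [alpha * a + (1.0 - alpha) * b for a, b in zip(parent1, parent2)]
    child2 = [beta * b + (1.0 - beta) * a for a, b in zip(parent1, parent2)]
    return child1, child2


# Illustrative coefficients: a fitter parent1 gets a larger alpha.
c1, c2 = blend_crossover([1.0, 0.0], [0.0, 1.0], alpha=0.75, beta=0.5)
print(c1, c2)  # [0.75, 0.25] [0.5, 0.5]
```

Because both children stay inside the range spanned by the parents, feature weights that start in [0, 1] remain in [0, 1] after crossover; mutation would then be responsible for exploring outside that span.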

[0008] Furthermore, the process of constructing the dynamic environmental risk correction coefficient includes: collecting in-hospital environmental monitoring data for each area within the target hospital, including at least temperature and humidity, PM2.5 concentration, and CO2 concentration; Simultaneously, obtain out-of-hospital public health surveillance data for the city where the hospital is located. The out-of-hospital public health surveillance data should include at least meteorological data and community influenza-like illness surveillance data. By constructing a spatiotemporal fusion model, the above-mentioned in-hospital environmental monitoring data and external public health monitoring data are integrated to generate a comprehensive environmental risk index that represents the combined impact of external environmental pressure and internal environmental quality. The spatiotemporal fusion model adopts a data fusion method based on tensor decomposition. The specific processing procedure is as follows: First, the multi-indicator time series data collected from various monitoring points inside the hospital, the multi-indicator time series data collected from external meteorological stations, and the time series data collected from community influenza-like illness monitoring points are jointly constructed into a three-dimensional tensor. The three dimensions of the three-dimensional tensor represent the time point, spatial location, and monitoring indicator type, respectively. Then, the CANDECOMP / PARAFAC decomposition method is used to decompose the three-dimensional tensor into the product of three factor matrices. The three-dimensional tensor decomposition process can extract potential common structures from the original multi-source data. 
Finally, by reconstructing the tensor from the decomposed factor matrices, missing values in the original data can be filled in and noise introduced by any single data source can be suppressed, yielding a standardized and more robust comprehensive environmental risk index. The comprehensive environmental risk index is then converted into a dynamic environmental risk correction coefficient that can be multiplied by the initial risk prediction value. The conversion is a nonlinear transformation using a Sigmoid function, calculated as follows: first, a baseline value of the environmental risk index is set, determined from the average level of historical in-hospital environmental monitoring data and out-of-hospital public health monitoring data. Then, the difference between the comprehensive environmental risk index and the baseline value is calculated. This difference is multiplied by a rate-of-change parameter that adjusts the steepness of the curve and is fed into a modified form of the Sigmoid function, so that the output varies smoothly around the baseline value. Finally, a maximum fluctuation amplitude parameter controls the overall range of the correction coefficient, so that the comprehensive environmental risk index is mapped to a dynamic environmental risk correction coefficient centered at 1 and fluctuating within a preset range. The specific calculation is:
$$k_{\mathrm{env}} = 1 + \delta \left( \frac{2}{1 + e^{-\gamma \left(I_{\mathrm{env}} - I_0\right)}} - 1 \right)$$
where $k_{\mathrm{env}}$ represents the dynamic environmental risk correction coefficient, $I_{\mathrm{env}}$ represents the comprehensive environmental risk index, $I_0$ represents the baseline value of the environmental risk index, $\gamma$ represents the rate-of-change parameter that adjusts the steepness of the curve and controls the sensitivity of the correction coefficient to changes in environmental risk, and $\delta$ represents the maximum fluctuation amplitude parameter, which limits the upper and lower fluctuation range of the correction coefficient.
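The mapping from the comprehensive environmental risk index to the correction coefficient can be sketched as follows. The text calls for a modified Sigmoid that is centered at 1 and bounded by a maximum fluctuation amplitude; the rescaled logistic used here is one form with those properties and should be read as an assumption, as should the example parameter values. The tensor decomposition that produces the index is not shown.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def env_correction_coefficient(i_env: float, i_0: float, gamma: float, delta: float) -> float:
    """Map an environmental risk index to a coefficient in (1 - delta, 1 + delta).

    i_env: comprehensive environmental risk index
    i_0:   baseline value of the index
    gamma: steepness (sensitivity to deviations from the baseline)
    delta: maximum fluctuation amplitude around 1
    Assumption: the "modified Sigmoid" is 2 * sigmoid(z) - 1, rescaled by delta.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    centered = 2.0 * sigmoid(gamma * (i_env - i_0)) - 1.0  # in (-1, 1), 0 at baseline
    return 1.0 + delta * centered


# At the baseline the coefficient is exactly 1 (no correction).
print(env_correction_coefficient(0.5, 0.5, gamma=4.0, delta=0.3))  # 1.0
# Above the baseline the coefficient rises toward 1 + delta.
print(env_correction_coefficient(0.9, 0.5, gamma=4.0, delta=0.3))
```

A larger `gamma` makes the coefficient react more sharply to small deviations from the baseline, while `delta` caps how far the environment can scale the patient-level risk in either direction.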

[0009] Furthermore, the dynamic calibration of the output of the initial risk prediction model is achieved through the following formula:
$$P_{\mathrm{final}} = P_{\mathrm{initial}} \times k_{\mathrm{env}}$$
where $P_{\mathrm{final}}$ represents the final risk prediction value, $P_{\mathrm{initial}}$ represents the risk probability output by the initial risk prediction model, and $k_{\mathrm{env}}$ represents the dynamic environmental risk correction coefficient. Before this multiplicative calibration is performed, a confidence assessment step is included to ensure the reliability of the calibration. The confidence assessment step is calculated as follows: first, the calibration error of the initial risk prediction model is calculated on the validation set; this is the deviation between the probability predicted by the model and the observed frequency, measured by the expected calibration error. At the same time, the information entropy of the output probability of the initial risk prediction model is calculated to measure the uncertainty of the prediction; the higher the information entropy, the greater the uncertainty. The calibration error and information entropy are then normalized and weighted to obtain a comprehensive confidence score between 0 and 1. The higher the comprehensive confidence score, the more reliable the prediction of the initial risk prediction model. When the comprehensive confidence score is lower than a preset confidence threshold, the prediction of the initial risk prediction model may deviate substantially, and a model fusion correction process is initiated: the background risk probability output by a seasonal autoregressive moving average model built on historical infection rate data is obtained, and the risk probability output by the initial risk prediction model is fused with the background risk probability by weighting.
The fusion weights of the initial risk prediction model and the seasonal autoregressive moving average model are proportional to their respective comprehensive confidence scores, and the two fusion weights sum to 1. The weighted fusion result is used as the calibrated initial risk probability and is substituted into the above formula for the multiplicative calibration. When the comprehensive confidence score is not lower than the preset confidence threshold, the risk probability output by the initial risk prediction model is used directly as the calibrated initial risk probability in the above formula.
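The confidence-gated fusion and multiplicative calibration described in this paragraph can be sketched as below. Several details are assumptions for illustration: the confidence score is taken as one minus a weighted sum of the expected calibration error and the binary entropy in bits (both already in the range 0 to 1), the weight `w_ece` is hypothetical, and the background model's probability and confidence score are passed in as plain numbers rather than produced by a fitted seasonal model.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a binary outcome with probability p (range 0..1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def confidence_score(ece: float, p: float, w_ece: float = 0.5) -> float:
    """Combine calibration error and entropy into a score in [0, 1].

    Assumption: score = 1 - (w_ece * ECE + (1 - w_ece) * entropy);
    the source does not specify the normalization or the weights.
    """
    return 1.0 - (w_ece * ece + (1.0 - w_ece) * binary_entropy(p))


def calibrated_risk(
    p_initial: float,
    conf_initial: float,
    p_background: float,
    conf_background: float,
    k_env: float,
    threshold: float,
) -> float:
    """Return P_final = P_calibrated * k_env, fusing with the background model
    only when the initial model's confidence falls below the threshold."""
    if conf_initial < threshold:
        total = conf_initial + conf_background
        # Fusion weights proportional to confidence scores; they sum to 1.
        w_init = conf_initial / total if total > 0 else 0.5
        p_cal = w_init * p_initial + (1.0 - w_init) * p_background
    else:
        p_cal = p_initial
    return p_cal * k_env


# Low-confidence case: weights are 0.25 and 0.75, so the fused value is 0.175.
print(calibrated_risk(0.4, 0.3, 0.1, 0.9, k_env=1.0, threshold=0.6))
```

The product is returned as is; whether values above 1 should be clipped is not addressed in the text and is left to the caller.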

[0010] Furthermore, the specific process of generating the corresponding risk warning information includes: comparing the final risk prediction value with multiple preset risk thresholds, wherein the risk thresholds are dynamically determined by analyzing the distribution quantiles of historical infection events. When the predicted risk value exceeds the first threshold, a concern-level warning is generated; when it exceeds the second threshold, a warning-level warning is generated, wherein the second threshold is greater than the first threshold, and the hospital infection control system is triggered to automatically push a list of high-risk patients and suggested intervention measures to the terminal device. After a warning-level alert is triggered, an infection source tracing analysis module is automatically activated to assist the hospital's infection control department in quickly locating potential transmission routes. The infection source tracing analysis module is calculated as follows: first, based on the time series characteristics of high-risk patients, the diagnosis and treatment event sequence of each patient within a period of time before the warning time point is extracted; this sequence is composed of event tokens. Then, the similarity of the treatment processes of any two patients is quantified by calculating the edit distance between their diagnosis and treatment event sequences. The edit distance is calculated by dynamic programming as the minimum number of single-token edit operations (insertion, deletion, and replacement) required to transform one sequence into the other. The edit distance is then normalized by dividing it by the larger of the two sequence lengths, to eliminate the effect of differences in sequence length.
The normalized edit distance between two patients is calculated as:
$$D(i, j) = \frac{\mathrm{Levenshtein}(S_i, S_j)}{\max\left(|S_i|, |S_j|\right)}$$
where $D(i, j)$ represents the normalized edit distance between patient $i$ and patient $j$, $S_i$ and $S_j$ represent the diagnosis and treatment event sequences of the two patients, $\mathrm{Levenshtein}(S_i, S_j)$ represents the Levenshtein distance between the two sequences, and $|S_i|$ and $|S_j|$ represent the lengths of the two sequences. Based on the normalized edit distance calculated between all pairs of patients, its reciprocal is defined as the association strength between patients. A patient similarity network is constructed with patients as nodes and association strengths as edge weights. In this network, the larger the edge weight, the more similar the diagnosis and treatment processes of the two patients and the higher the possibility of an infection association. Finally, the Louvain community detection algorithm is applied iteratively to the patient similarity network. By maximizing the modularity index, closely connected node clusters in the network are identified and treated as possible infection transmission clusters. The information on the identified infection transmission clusters, including the list of patients within each cluster and the similarities between them, is pushed to the terminal devices of hospital infection control personnel along with the risk warning information.
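The normalized edit distance and the resulting association strengths can be sketched in plain Python as follows. The event tokens and patient identifiers are made up for illustration. The small constant `eps` added before taking the reciprocal is an assumption introduced here so that identical sequences (distance 0) yield a large finite weight; the text defines the strength simply as the reciprocal. Community detection itself is not shown.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and replacements turning a into b."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                    # deletion
                cur[j - 1] + 1,                 # insertion
                prev[j - 1] + (tok_a != tok_b), # replacement (0 if tokens match)
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Levenshtein distance divided by the longer sequence length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def similarity_edges(
    sequences: Dict[str, List[str]], eps: float = 1e-6
) -> List[Tuple[str, str, float]]:
    """Weighted edges (patient_i, patient_j, association strength).

    Assumption: eps avoids division by zero for identical sequences.
    """
    return [
        (p, q, 1.0 / (normalized_edit_distance(sequences[p], sequences[q]) + eps))
        for p, q in combinations(sorted(sequences), 2)
    ]


# Hypothetical event sequences for three high-risk patients.
patients = {
    "P1": ["WARD_3A", "BRONCHOSCOPY", "RX_ANTIBIOTIC"],
    "P2": ["WARD_3A", "RX_ANTIBIOTIC"],
    "P3": ["WARD_7C", "CT_CHEST", "LAB_CRP"],
}
for edge in similarity_edges(patients):
    print(edge)
```

The edge list could then be loaded into a graph library that provides Louvain community detection in order to group patients into candidate transmission clusters.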

[0011] A respiratory hospital infection risk prediction system, the system comprising: The data acquisition module is used to acquire electronic medical record data of the target hospital within a historical time period. The electronic medical record data includes at least patient demographic data, medical record data, laboratory test data, and imaging test data. The feature construction module is used to preprocess the electronic medical record data and construct a multi-dimensional original feature set containing basic patient information, diagnosis and treatment process information, and clinical test information. Based on the original feature set, a temporal sequence of patient diagnosis and treatment events is constructed, and a deep neural network model is used to extract the implicit temporal dynamic features of each patient. The temporal dynamic features are used to characterize the changing trend of the patient's respiratory infection risk over time. The static features in the original feature set and the temporal dynamic features are fused to construct a high-dimensional hybrid feature space. The model building module is used to build an initial risk prediction model based on an improved weighted Naive Bayes classifier in the high-dimensional mixed feature space. The improved weighted Naive Bayes classifier adjusts the contribution of different features to the classification results by introducing feature weight factors. The environmental correction module is used to collect internal environmental monitoring data and external public health monitoring data of the hospital, merge them to construct a dynamic environmental risk correction coefficient, and dynamically calibrate the output of the initial risk prediction model based on the correction coefficient to generate the final risk prediction value. 
The early warning generation module is used to classify the respiratory infection risk level of different areas or different patient groups in the hospital based on the final risk prediction value, and generate corresponding risk early warning information.

[0012] The present invention has the following advantages over the prior art: By integrating multi-dimensional electronic medical record data, including patient demographics, medical records, laboratory and imaging examinations, a high-dimensional hybrid feature space combining temporal dynamic and static features is constructed and extracted. An improved weighted Naive Bayes classifier (using mutual information to determine initial weights, a genetic algorithm to optimize weights, and kernel density estimation to handle continuous features) enhances the model's accuracy in predicting infection risk classification. The Transformer encoder, combined with a residual connection structure using a sliding time window and gating mechanism optimized based on clinical data, accurately focuses on key diagnostic and treatment events strongly correlated with respiratory infections, effectively characterizing the trend of infection risk over time. Simultaneously, by integrating environmental monitoring data such as in-hospital temperature and humidity and PM2.5 concentration, and public health data such as out-of-hospital meteorological and influenza-like illness surveillance, a comprehensive environmental risk index is constructed through tensor decomposition and converted into a dynamic environmental risk correction coefficient. The reliability assessment and model fusion correction process dynamically calibrates the initial prediction results, taking into account the impact of environmental factors on infection risk and improving the robustness and reliability of the prediction results. It can also classify infection risk levels based on dynamically determined risk thresholds and generate different levels of early warning information. After triggering a warning-level alert, a patient similarity network is constructed by calculating the normalized edit distance of the diagnosis and treatment event sequence. 
Combined with the Louvain community detection algorithm, infection transmission clusters can be identified. This helps hospital infection control departments quickly locate potential transmission routes, and high-risk patient lists, intervention measures, and transmission cluster information are pushed automatically. The invention thereby realizes accurate, dynamic prediction and graded early warning of respiratory hospital infection risk, and completes infection source tracing analysis. It raises the level of intelligence and scientific rigor of hospital respiratory infection prevention and control, and helps hospitals take timely and targeted intervention measures to reduce the probability of respiratory infections in hospitals.

Description of the Drawings

[0013] Figure 1 is the overall flowchart of the present invention.

Detailed Implementation

[0014] To further explain the technical means adopted by the present invention to achieve its intended purpose and the effects obtained, the specific implementation, structure, features, and effects of the present invention are described in detail below with reference to the accompanying drawing and preferred embodiments.

[0015] As shown in Figure 1, a method for predicting the risk of respiratory hospital-acquired infections includes the following steps:
Data acquisition and preprocessing step: acquire electronic medical record data of the target hospital within a historical time period, the electronic medical record data including at least patient demographic data, medical record data, laboratory test data, and imaging test data; preprocess the electronic medical record data to construct a multi-dimensional original feature set containing basic patient information, medical process information, and clinical test information.
Temporal dynamic feature extraction step: based on the original feature set, construct a temporal sequence of patient diagnosis and treatment events, and use a deep neural network model to extract the implicit temporal dynamic features of each patient; the temporal dynamic features characterize the changing trend of the patient's respiratory infection risk over time.
Hybrid feature space construction step: merge the static features in the original feature set with the temporal dynamic features to construct a high-dimensional hybrid feature space.
Initial risk prediction model construction step: in the high-dimensional hybrid feature space, construct an initial risk prediction model based on an improved weighted Naive Bayes classifier; the improved weighted Naive Bayes classifier adjusts the contribution of different features to the classification result by introducing feature weight factors.
Dynamic environmental correction coefficient construction and calibration step: collect internal environmental monitoring data and external public health monitoring data of the hospital, fuse them to construct a dynamic environmental risk correction coefficient, and dynamically calibrate the output of the initial risk prediction model based on the correction coefficient to generate the final risk prediction value.
Risk warning generation step: based on the final risk prediction value, classify the respiratory infection risk level of different areas or different patient groups within the hospital, and generate corresponding risk warning information.

[0016] The construction of the temporal patient diagnosis and treatment event sequence specifically includes: converting each patient's diagnosis code, drug prescription, examination items, and vital sign monitoring records into discrete event tokens in chronological order; dividing the event tokens into multiple subsequences of equal duration using a sliding time window, with each subsequence representing the features of a diagnosis and treatment stage; and inputting the subsequences into a pre-trained Transformer encoder for processing to extract temporal dynamic features. The process for determining the window size of the sliding time window is as follows: First, incubation period data of historical infection cases and hospitalization duration data of hospitalized patients during the same period are collected. The two sets of data are then fitted with two log-normal probability density functions. Next, with the optimization objective of maximizing the integral value of the overlapping region of these two probability density functions on the time axis, a grid search algorithm is used to traverse within a preset candidate window length range. The preset candidate window length range is an interval defined by the minimum and maximum window values set based on clinical experience. For each candidate window length, the overlap integral value of two probability density functions within the time interval corresponding to that window length is calculated. By comparing the overlap integral values under all candidate window lengths, the candidate window length that maximizes the overlap integral value is determined as the final sliding time window size. 
By converting patient treatment records into discrete event tokens and segmenting the treatment event sequence using a sliding time window optimized with clinical data, the temporal characteristics of the respiratory infection incubation period and patient hospitalization can be accurately matched, avoiding temporal feature extraction bias caused by inappropriate window size. Simultaneously, by combining a pre-trained Transformer encoder to process subsequences, its self-attention mechanism automatically assigns attention scores strongly correlated with respiratory infection to event tokens, accurately focusing on key treatment events. The residual connection structure of the gating mechanism can also achieve effective nonlinear feature transformation while preserving the original treatment temporal information, avoiding information loss during feature transformation. This makes the extracted temporal dynamic features more closely match the temporal variation patterns of patients' respiratory infection risk, significantly improving the targeting, accuracy, and completeness of temporal dynamic feature extraction. This provides high-quality temporal feature support for subsequent high-dimensional hybrid feature space construction and infection risk prediction.

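[0017] Data on the incubation period (in days) of respiratory hospital-acquired infections from a tertiary hospital over the past 5 years were collected. Statistical fitting showed that the incubation period follows a log-normal distribution [fitted parameter values not reproduced in the source text]. The hospital stay duration data of inpatients during the same period were also collected, and fitting showed that the hospital stay duration likewise follows a log-normal distribution [fitted parameter values not reproduced in the source text]. The probability density function of the log-normal distribution is $f(t;\mu,\sigma)=\frac{1}{t\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln t-\mu)^{2}}{2\sigma^{2}}\right)$ for $t>0$, where t is a continuous time variable and μ and σ are the mean and standard deviation of ln t.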

[0018] Based on clinical experience in infection control, the candidate range for the sliding time window length was set to [2, 6] days, with candidate values of 2, 3, 4, 5, and 6 days. A grid search algorithm was used to calculate, for each candidate window length n, the overlap integral value $S_n$ of the two probability density functions within the corresponding time interval [0, n] [formula not reproduced in the source text].

[0019] The overlap integral value is calculated for each candidate window length of 2, 3, 4, 5, and 6 days, giving $S_2$, $S_3$, $S_4$, $S_5$, and $S_6$ respectively [numerical values not reproduced in the source text].

[0020] Comparing the overlap integral values of all candidate window lengths, S4 is the maximum value. Therefore, the sliding time window size for segmenting the hospital's treatment event sequence is determined to be 4 days.
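The window search described in paragraphs [0016] to [0020] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the fitted log-normal parameters `INCUBATION` and `STAY` are hypothetical placeholders, and the integrand `min(f_a, f_b)` is an assumed reading of "overlapping region", since neither the parameters nor the formula for $S_n$ survive in the text.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Log-normal probability density; defined as 0 for t <= 0."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def overlap_integral(n_days, params_a, params_b, steps=2000):
    """Trapezoidal integral of min(f_a, f_b) over [0, n_days].

    Taking the pointwise minimum as the overlap is an assumption.
    """
    h = n_days / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        y = min(lognormal_pdf(t, *params_a), lognormal_pdf(t, *params_b))
        total += y if 0 < i < steps else 0.5 * y
    return total * h


def grid_search_window(candidates, params_a, params_b):
    """Return the candidate with the largest overlap integral and all scores."""
    scores = {n: overlap_integral(n, params_a, params_b) for n in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical (mu, sigma) pairs; the fitted values are not given in the text.
INCUBATION = (1.2, 0.5)
STAY = (1.8, 0.6)

best, scores = grid_search_window([2, 3, 4, 5, 6], INCUBATION, STAY)
for n, s in scores.items():
    print(f"window {n} days: S_{n} = {s:.4f}")
print("selected window:", best)
```

Under the assumed integrand the integral over [0, n] cannot decrease as n grows, so this sketch only shows the mechanics of the grid search. The objective actually used in the embodiment, whose formula is unavailable, selects 4 days.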

[0021] Calculation of residual connection structure for gating mechanism in Transformer encoder: Given the input vector x = [0.2, 0.5, 0.8] of the feedforward neural network layer in the Transformer encoder, and the output vector F(x) = [0.3, 0.6, 0.7] after nonlinear transformation of the feedforward neural network, we can concatenate them to obtain the merged vector [x; F(x)] = [0.2, 0.5, 0.8, 0.3, 0.6, 0.7].

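[0022] Define a learnable weight matrix $W_g = \begin{bmatrix} 0.1 & 0.2 & 0.3 & 0.05 & 0.1 & 0.15 \\ 0.2 & 0.1 & 0.4 & 0.1 & 0.05 & 0.2 \\ 0.15 & 0.25 & 0.2 & 0.12 & 0.18 & 0.08 \end{bmatrix}$ and a learnable bias term $b_g = [0.05, 0.05, 0.05]$. First, calculate the result of the linear transformation $W_g\,[x; F(x)] + b_g$.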

[0023] The first row gives: 0.1×0.2+0.2×0.5+0.3×0.8+0.05×0.3+0.1×0.6+0.15×0.7+0.05=0.02+0.1+0.24+0.015+0.06+0.105+0.05=0.59; the second row gives: 0.2×0.2+0.1×0.5+0.4×0.8+0.1×0.3+0.05×0.6+0.2×0.7+0.05=0.04+0.05+0.32+0.03+0.03+0.14+0.05=0.66; the third row gives: 0.15×0.2+0.25×0.5+0.2×0.8+0.12×0.3+0.18×0.6+0.08×0.7+0.05=0.03+0.125+0.16+0.036+0.108+0.056+0.05=0.565. The linear transformation result is [0.59, 0.66, 0.565].

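[0024] The gating parameter g is obtained by applying the sigmoid function $\sigma(z) = \frac{1}{1+e^{-z}}$ to each dimension of the linear transformation result: σ(0.59) ≈ 0.643, σ(0.66) ≈ 0.659, σ(0.565) ≈ 0.638. That is, g = [0.643, 0.659, 0.638].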

[0025] The final output is calculated using the formula Output=(1−g)⊙x+g⊙F(x), with element-wise operations performed: First dimension: (1-0.643)×0.2+0.643×0.3=0.357×0.2+0.643×0.3=0.0714+0.1929=0.2643; Second dimension: (1-0.659)×0.5+0.659×0.6=0.341×0.5+0.659×0.6=0.1705+0.3954=0.5659; Third dimension: (1-0.638)×0.8+0.638×0.7=0.362×0.8+0.638×0.7=0.2896+0.4466=0.7362; The final output vector Output=[0.2643,0.5659,0.7362] is obtained. This result not only retains the basic information of the diagnosis and treatment time sequence of the original input x, but also integrates the F(x) features after nonlinear transformation, realizing the effective extraction and transformation of the time sequence dynamic features of the diagnosis and treatment event sequence.
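The worked example in paragraphs [0021] to [0025] can be reproduced with a few lines of Python. The input vectors, the bias, and the weight rows are the ones used in the arithmetic above; the function name `gated_residual` is an illustrative choice, not a name from the source.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_residual(x, fx, w_g, b_g):
    """Compute g = sigmoid(W_g [x; F(x)] + b_g) and (1 - g) * x + g * F(x)."""
    merged = x + fx  # list concatenation plays the role of [x; F(x)]
    z = [sum(w * m for w, m in zip(row, merged)) + b for row, b in zip(w_g, b_g)]
    g = [sigmoid(v) for v in z]
    out = [(1.0 - gi) * xi + gi * fi for gi, xi, fi in zip(g, x, fx)]
    return z, g, out


x = [0.2, 0.5, 0.8]
fx = [0.3, 0.6, 0.7]
W_G = [
    [0.10, 0.20, 0.30, 0.05, 0.10, 0.15],
    [0.20, 0.10, 0.40, 0.10, 0.05, 0.20],
    [0.15, 0.25, 0.20, 0.12, 0.18, 0.08],
]
B_G = [0.05, 0.05, 0.05]

z, g, out = gated_residual(x, fx, W_G, B_G)
print([round(v, 3) for v in z])
print([round(v, 3) for v in g])
print([round(v, 4) for v in out])
```

Because each gate value lies strictly between 0 and 1, every output element is a convex combination of the corresponding elements of x and F(x), which is why the output stays between the two inputs in each dimension.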

[0026] When processing patient diagnosis and treatment event sequences, the Transformer encoder's self-attention mechanism assigns an attention score to each event token. The attention score is based on the correlation between the event token and the current prediction target, computed as the dot product of the query matrix and the key matrix. Through this attention score, the Transformer encoder can focus on key diagnosis and treatment events that are strongly correlated with the occurrence of respiratory infections. After self-attention is computed, the results from the multiple attention heads are concatenated and then passed through a feedforward neural network layer for nonlinear transformation. To introduce nonlinear transformation capability while preserving the original information, a residual connection structure based on a gating mechanism is introduced after the feedforward neural network layer. It is computed as follows: First, the input vector and output vector of the feedforward neural network layer are concatenated to obtain a merged vector. Then, the merged vector is input into a single-layer neural network and, after linear transformation, processed by the sigmoid activation function to obtain a gating parameter. The gating parameter is a vector whose elements lie between 0 and 1, used to control the fusion ratio of the original input information and the transformed information. Specifically, the gating parameter is calculated as $g = \sigma\left(W_g\,[x; F(x)] + b_g\right)$, where x represents the input of the feedforward neural network layer, F(x) represents the output of the feedforward neural network layer, $W_g$ and $b_g$ are the learnable weight matrix and bias term, σ represents the sigmoid activation function, and [;] represents the concatenation of vectors. After the gating parameter is obtained, the final output is the weighted sum of the original input and the transformed output according to the gating parameter: $\mathrm{Output} = (1-g)\odot x + g\odot F(x)$, where ⊙ represents element-wise multiplication.
Because the self-attention mechanism calculates attention scores from the dot product of the query matrix and the key matrix, it quantifies the correlation between event tokens and the respiratory infection prediction target and assigns higher attention scores to key diagnostic events. This focuses the model on infection-related events and suppresses interference from irrelevant events. The multi-head attention mechanism integrates attention information from multiple subspaces after concatenation. The gated residual connection then controls, through the gating parameter, the fusion ratio of the original input information and the nonlinearly transformed information. This achieves nonlinear feature transformation while preserving the original diagnosis and treatment time-series information, limiting information loss or over-transformation during feature transformation. The extracted temporal dynamic features therefore represent the changing trend of a patient's respiratory infection risk over time more accurately and completely, improving their relevance, accuracy, and robustness, and providing higher-quality temporal feature support for the subsequent high-dimensional hybrid feature space construction and infection risk prediction.

[0027] Taking the embedded respiratory diagnosis and treatment event tokens of one hospitalized patient as an example, the self-attention mechanism of the Transformer encoder and the gated residual connection structure are calculated. After embedding, the patient's event tokens yield a query matrix Q, a key matrix K, and a value matrix V, each of dimension 3×2. The multi-head attention is set to 2 heads, and the input of the feedforward neural network layer is the self-attention output. The self-attention mechanism calculates its output using the formula $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$, where $d_k$ is the dimension of the key matrix K, here $d_k = 2$. The matrices Q, K, and V are set to fixed values [matrix values not reproduced in the source text].

[0028] The self-attention calculation proceeds in four steps [intermediate matrices not reproduced in the source text]:
Step 1: calculate the score matrix $QK^{T}$.
Step 2: scale the score matrix by $\sqrt{d_k}$.
Step 3: apply the softmax operation to the scaled result to obtain the attention weight matrix.
Step 4: multiply the attention weight matrix by V to obtain the self-attention output.
Multi-head concatenation: the self-attention output above is taken as head 1, and the self-attention output of head 2 is set separately [values not reproduced in the source text]. The outputs of the two heads are concatenated column by column to obtain the multi-head attention concatenation result, which serves as the input to the feedforward neural network layer.
Feedforward nonlinear transformation: a weight matrix and a bias term are set for the feedforward neural network layer [values not reproduced in the source text], and the ReLU activation function ReLU(z) = max(0, z) is used to calculate the feedforward output F(X). The first step is the linear transformation followed by addition of the bias term; the second step is ReLU activation, which yields F(X) [values not reproduced in the source text]. The input to the feedforward layer is represented by the dimensionality-reduced vector x = [0.55, 0.58, 0.60] of the multi-head attention concatenation result (with the dimension of F(X) unified to 3×1).
Gated residual connection: the gating parameter formula is $g = \sigma\left(W_g\,[x; F(x)_{col}] + b_g\right)$ and the final output formula is $\mathrm{Output} = (1-g)\odot x + g\odot F(x)_{col}$, where $F(x)_{col}$ = [0.5744, 0.6099, 0.6317] is the vector obtained by taking the column mean of F(X), $\sigma(z) = \frac{1}{1+e^{-z}}$ is the sigmoid function, [;] represents vector concatenation, and ⊙ represents element-wise multiplication. The first step, concatenation, yields $[x; F(x)_{col}]$ = [0.55, 0.58, 0.60, 0.5744, 0.6099, 0.6317]. The second step sets the weight matrix and bias term and calculates the linear transformation [values not reproduced in the source text]. The third step calculates the gating parameter, giving g = [0.6502, 0.6558, 0.6549]. The fourth step calculates the final output. First dimension: (1-0.6502)×0.55+0.6502×0.5744=0.3498×0.55+0.6502×0.5744≈0.5659; second dimension: (1-0.6558)×0.58+0.6558×0.6099=0.3442×0.58+0.6558×0.6099≈0.5996; third dimension: (1-0.6549)×0.60+0.6549×0.6317=0.3451×0.60+0.6549×0.6317≈0.6208. The output vector is therefore Output = [0.5659, 0.5996, 0.6208]. This result retains the basic information of the diagnosis and treatment events of the original input x, integrates the features of F(x) after nonlinear transformation, and controls the fusion ratio through the gating parameter, thereby extracting the temporal dynamic features related to the risk of respiratory infection.
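The four self-attention steps and the two-head concatenation described in this example can be sketched in plain Python. The matrices `Q`, `K`, `V` and the second head's output `HEAD2` below are hypothetical 3×2 values chosen only for illustration; the embodiment's actual matrices are not available.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    d_k = len(k[0])                                    # key dimension
    scores = matmul(q, transpose(k))                   # step 1: Q K^T
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]  # step 2
    weights = [softmax(row) for row in scaled]         # step 3: row-wise softmax
    return weights, matmul(weights, v)                 # step 4: weights times V


# Hypothetical embeddings for three event tokens (3 x 2 each).
Q = [[0.5, 0.1], [0.2, 0.7], [0.9, 0.4]]
K = [[0.4, 0.3], [0.6, 0.8], [0.1, 0.5]]
V = [[0.2, 0.9], [0.7, 0.3], [0.5, 0.6]]
HEAD2 = [[0.45, 0.55], [0.50, 0.60], [0.40, 0.65]]     # hypothetical head-2 output

weights, head1 = scaled_dot_product_attention(Q, K, V)
multi_head = [r1 + r2 for r1, r2 in zip(head1, HEAD2)]  # column-wise concatenation
print([[round(w, 4) for w in row] for row in weights])
print([[round(o, 4) for o in row] for row in multi_head])
```

Each row of the attention weight matrix sums to 1, so every output row of a head is a weighted average of the rows of V; the concatenated result has shape 3×4 for two heads of width 2.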

[0029] The improved weighted Naive Bayes classifier is constructed as follows: First, the mutual information value between each feature and the occurrence of respiratory infection in the high-dimensional mixed feature space is calculated, and the mutual information value is normalized and used as the initial feature weight; Second, a feature weight optimization process based on a genetic algorithm is introduced, using the prediction accuracy of the classifier as the fitness function to iteratively optimize the initial feature weights to obtain the optimal feature weight vector; Finally, in the likelihood probability calculation of the Naive Bayes classifier, the conditional probability of the feature is multiplied by the corresponding optimal feature weight. The calculation process of the conditional probability of a feature is as follows: For continuous features in a high-dimensional mixed feature space, instead of the traditional Gaussian distribution assumption, a non-parametric method based on kernel density estimation is used to estimate the probability density. In specific processing, for a given continuous feature, the values of all samples belonging to the same category in the training set on that feature are collected; Then, using the value of each sample point as the center, a kernel density estimation model is constructed using the Gaussian kernel function; When it is necessary to calculate the conditional probability density of a feature value for a sample to be predicted under a specific category, it can be done in the following way: The difference between the feature value of the sample to be predicted and the value of that feature for each sample of the same type in the training set is calculated. Each difference is divided by the bandwidth parameter and then fed into the Gaussian kernel function to obtain the kernel function value for each sample. 
Finally, all kernel function values are summed and divided by the product of the number of samples of the same type and the bandwidth parameter. The specific calculation is $\hat{p}(x \mid C_k) = \frac{1}{n_k h}\sum_{i=1}^{n_k} K\left(\frac{x - x_i}{h}\right)$, where $C_k$ represents the k-th category, $n_k$ represents the number of samples in category $C_k$, $x_i$ represents the feature value of the i-th sample in category $C_k$, h represents the bandwidth parameter, and K represents the Gaussian kernel function. The improved weighted Naive Bayes classifier constructed through the above process first measures, through mutual information values, the correlation between each feature in the high-dimensional mixed feature space and the occurrence of respiratory infection. The initial feature weights obtained after normalization give a first estimate of each feature's contribution. Then, using the classifier's prediction accuracy as the fitness function, the initial weights are iteratively optimized through a genetic algorithm to obtain the optimal feature weight vector. This allows the classifier to emphasize features that contribute strongly to infection prediction and to weaken features that contribute little. At the same time, for continuous features, the traditional Gaussian distribution assumption is abandoned, and a non-parametric method based on kernel density estimation with a Gaussian kernel function is used to calculate the conditional probability. This adapts to the actual data distribution of continuous features and avoids the probability estimation bias caused by a mismatch between the Gaussian distribution assumption and the actual distribution, improving the accuracy and generalization ability of the classifier in classifying respiratory infection risk. 
The constructed initial risk prediction model can achieve accurate infection risk prediction based on more realistic feature weights and probability calculations, laying a high-quality model foundation for the dynamic calibration of subsequent risk prediction values.
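The kernel density estimate of a class-conditional density described above can be sketched as follows. The sample values in `WBC_INFECTED` and the bandwidth `1.5` are hypothetical and serve only to exercise the function.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_conditional_density(x, class_values, h):
    """Estimate p(x | C_k) = (1 / (n_k * h)) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not class_values:
        raise ValueError("class_values must not be empty")
    total = sum(gaussian_kernel((x - xi) / h) for xi in class_values)
    return total / (len(class_values) * h)


# Hypothetical white blood cell counts for a few infected-class samples.
WBC_INFECTED = [11.2, 12.8, 13.5, 10.9, 14.1]

density = kde_conditional_density(12.5, WBC_INFECTED, h=1.5)
print(f"estimated p(x1 = 12.5 | C1) = {density:.4f}")
```

The bandwidth h controls smoothing: a small h follows individual training samples closely, while a large h approaches a single broad bump, so in practice it is usually tied to the within-class spread of the feature.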

[0030] Based on a training dataset of respiratory hospital-acquired infections from a certain hospital, an infection category C1 (respiratory infection samples, n1 = 50) and a non-infection category C0 (non-respiratory-infection samples, n0 = 150) are set up, and three core features are selected: continuous feature X1 (white blood cell count, unit: ×10^9/L), continuous feature X2 (neutrophil percentage, unit: %), and categorical feature X3 (whether broad-spectrum antibiotics were used, 0 = No, 1 = Yes). These are used to calculate the weights of the improved weighted Naive Bayes classifier and to estimate the conditional probability density of the continuous features. The feature values of the sample to be predicted are x1 = 12.5, x2 = 75, and x3 = 1. The specific calculation process is as follows: Calculate the mutual information and normalize it to obtain the initial feature weights. The mutual information formula is $I(X;Y)=\sum_{x}\sum_{y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$. After discretizing the continuous features X1 and X2 into three equal-width intervals, the mutual information value between each feature and the infection label Y (0 = non-infected, 1 = infected) is calculated [mutual information values not reproduced in the source text]. The mutual information values are normalized so that they sum to 1, $w_j = I(X_j;Y)\,/\sum_{m} I(X_m;Y)$, and used as the initial weights. The resulting initial weight vector is W = [0.3333, 0.25, 0.4167].
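A minimal sketch of the initial-weight step, assuming equal-width binning into three intervals and sum-to-one normalization as described above. The toy feature columns and labels are hypothetical; the embodiment's mutual information values are not available.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=3):
    """Map continuous values to bin indices 0..n_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def normalized_weights(mi_values):
    """Scale mutual information values so that the weights sum to 1."""
    total = sum(mi_values)
    if total == 0:
        return [1.0 / len(mi_values)] * len(mi_values)
    return [m / total for m in mi_values]


# Hypothetical toy data: eight patients, label 1 = infected.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
wbc = [13.1, 12.4, 11.8, 7.2, 6.9, 8.4, 12.0, 7.7]      # X1, continuous
neut = [78.0, 74.0, 69.0, 55.0, 61.0, 58.0, 72.0, 60.0]  # X2, continuous
abx = [1, 1, 0, 0, 0, 1, 0, 0]                            # X3, categorical

features = [equal_width_bins(wbc), equal_width_bins(neut), abx]
mi_values = [mutual_information(f, labels) for f in features]
weights = normalized_weights(mi_values)
print([round(w, 4) for w in weights])
```

Normalizing by the sum keeps only the relative informativeness of the features, which is what the subsequent genetic algorithm then refines.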

[0031] The feature weights are then optimized with the genetic algorithm. Encoding: each feature weight is encoded as a gene on a chromosome using real-number encoding, with gene range [0, 1]. Population initialization: the population size is set to 4, with initial individuals W1 = [0.3333, 0.25, 0.4167], W2 = [0.3, 0.3, 0.4], W3 = [0.35, 0.2, 0.45], and W4 = [0.28, 0.22, 0.5]. Fitness calculation: each weight vector is substituted into the Naive Bayes classifier, and the classification accuracy on the validation set (80 samples) is calculated as the fitness value of that individual. Selection: using roulette wheel selection, W3 and W1, which have the highest fitness, are selected as the parent individuals. Crossover: using the hybrid crossover strategy, let parent1 = W3 = [0.35, 0.2, 0.45] (fitness 0.875) and parent2 = W1 = [0.3333, 0.25, 0.4167] (fitness 0.85). The crossover coefficients are calculated dynamically: α is directly proportional to the fitness of parent1, and β is inversely proportional to the fitness of parent2. Taking α = 0.575 and β = 0.425, the two offspring are calculated [offspring values not reproduced in the source text]. Mutation: with a mutation probability of 0.1, no mutation is triggered in the offspring, which join the new generation population [population values not reproduced in the source text]. After five iterations, the optimal feature weight vector is obtained [vector values not reproduced in the source text].

[0032] The conditional probability density of continuous features is calculated based on kernel density estimation. The kernel density estimation formula is $\hat{p}(x \mid C_k) = \frac{1}{n_k h}\sum_{i=1}^{n_k} K\left(\frac{x - x_i}{h}\right)$, with the Gaussian kernel function $K(u) = \frac{1}{\sqrt{2\pi}}e^{-u^{2}/2}$. The bandwidth h is set from the standard deviation of the feature within a category [bandwidth formula not reproduced in the source text]. To calculate the conditional probability density of X1 under C1, the standard deviation of X1 in class C1 is calculated first. Five representative samples are then selected from class C1 and the kernel function value is calculated for each sample; the kernel values are summed, the sum is extended to all 50 samples, and the conditional probability density of x1 under C1 is obtained [intermediate and final values not reproduced in the source text]. The remaining conditional probability densities of the continuous features are calculated in the same way, and the conditional probability of the categorical feature X3 is obtained for each category [values not reproduced in the source text]. Weighted Naive Bayes classification then calculates the posterior probability ratio by multiplying the feature conditional probabilities by the optimal weights [formula and values not reproduced in the source text]. This completes the infection risk probability prediction for the sample and illustrates the practical application of the improved weighted Naive Bayes classifier.

[0033] The specific implementation of the feature weight optimization process based on the genetic algorithm is as follows: the weight of each feature is encoded as a gene on a chromosome, and a population containing multiple weight vectors is initialized. Each individual in the population, that is, each weight vector, is applied to the weighted Naive Bayes classifier, and the classification accuracy on the validation set is calculated as the fitness value of that individual. A new generation of the population is generated through selection, crossover, and mutation operations. This process is repeated until a preset number of generations is reached, and the individual with the highest fitness is taken as the optimal feature weight vector. The crossover operation uses a hybrid crossover strategy based on fitness ratios to generate new offspring. The strategy is calculated as follows: First, two parent individuals are selected from the current population based on their fitness. Then, two different crossover coefficients are dynamically calculated from the fitness values of these two parent individuals. The first crossover coefficient is proportional to the fitness value of the first parent individual, so that the parent with higher fitness contributes more information. The second crossover coefficient is inversely proportional to the fitness value of the second parent individual, to preserve population diversity. Based on the two crossover coefficients, the two offspring individuals child1 and child2 are calculated from parent1 and parent2 [crossover formulas not reproduced in the source text], where α and β are the dynamically calculated crossover coefficients: the value of α is directly proportional to the fitness value of parent1, and the value of β is inversely proportional to the fitness value of parent2. 
Each feature weight is encoded as a gene on a chromosome, and an iterative genetic algorithm is used to optimize it through initializing the population. The classification accuracy of the weighted Naive Bayes classifier on the validation set is used as the fitness function. This approach accurately selects the optimal feature weight vector based on the actual effect of infection risk prediction, ensuring that the weight optimization is highly consistent with the infection prediction target. The selection operation selects high-fitness, high-quality parent individuals, allowing the high-quality weight features to be inherited. At the same time, a hybrid crossover strategy based on fitness ratio is adopted, dynamically calculating the crossover coefficient according to the parent fitness. This ensures that parents with higher fitness contribute more information, guaranteeing the continuity of the optimal weight features, while parents with lower fitness contribute information inversely, effectively maintaining population diversity and avoiding the algorithm from getting trapped in local optima. The mutation operation further enriches the genetic diversity of the population and improves the algorithm's global optimization ability. The entire optimization process of the genetic algorithm significantly improves the efficiency and global optimality of feature weight optimization. The obtained optimal feature weight vector can maximize the respiratory infection risk classification accuracy of the weighted Naive Bayes classifier, thereby giving the constructed initial risk prediction model higher prediction accuracy and generalization ability.
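The crossover formulas themselves are not available, so the following is only a sketch of one plausible reading: each child is a convex combination of the two parents, with child1 weighted by α and child2 weighted by β. Both that blend form and the helper `crossover_coefficients` (including its `beta_scale` parameter) are assumptions made for illustration, not the patent's definitions.

```python
def crossover_coefficients(fitness1, fitness2, beta_scale=0.5):
    """Hypothetical coefficient rule.

    alpha grows with parent1's fitness; beta shrinks as parent2's fitness
    grows and is capped at 1. Neither expression comes from the source.
    """
    alpha = fitness1 / (fitness1 + fitness2)
    beta = min(1.0, beta_scale / fitness2)
    return alpha, beta


def blend_crossover(parent1, parent2, alpha, beta):
    """Assumed convex-combination crossover producing two children."""
    child1 = [alpha * a + (1.0 - alpha) * b for a, b in zip(parent1, parent2)]
    child2 = [beta * a + (1.0 - beta) * b for a, b in zip(parent1, parent2)]
    return child1, child2


# Parents and coefficients taken from the four-individual example.
parent1 = [0.35, 0.2, 0.45]        # W3, fitness 0.875
parent2 = [0.3333, 0.25, 0.4167]   # W1, fitness 0.85

child1, child2 = blend_crossover(parent1, parent2, alpha=0.575, beta=0.425)
print([round(w, 4) for w in child1], round(sum(child1), 4))
print([round(w, 4) for w in child2], round(sum(child2), 4))
```

A convex combination keeps every gene inside [0, 1] and preserves the sum of the weights when both parents sum to 1, so under this assumed form no renormalization is needed after crossover.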

[0034] Based on the aforementioned respiratory hospital infection dataset, a genetic-algorithm-based feature weight optimization process is carried out for the three core features (X1 white blood cell count, X2 neutrophil percentage, and X3 whether broad-spectrum antibiotics were used), with infection category C1 (sample size n1 = 50) and non-infection category C0 (sample size n0 = 150). The validation set consists of 80 samples. The specific calculation process is as follows: Gene encoding and population initialization: the weights of the three features are encoded as genes on the chromosome using real-number encoding; each gene value lies in [0, 1] and the weights sum to 1. The population size is set to 6, and six weight vectors W1 to W6 are initialized as the initial population [vector values not reproduced in the source text]. Each vector represents an individual in the population.

[0035] Fitness value calculation: the weight vector of each individual is substituted into the improved weighted Naive Bayes classifier, and the classification accuracy on the validation set is calculated as the fitness value of that individual [fitness values not reproduced in the source text]. A higher fitness value indicates better prediction performance for the weight vector.

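[0036] Selection operation: A roulette wheel selection method is used, in which the probability of an individual being selected is proportional to its fitness value. The selection probability of each individual is calculated as $P_i = f_i \,/ \sum_{j=1}^{6} f_j$, where $f_i$ is the fitness value of individual $W_i$ [probability values not reproduced in the source text]. The two parent individuals with the highest fitness are then selected based on these probabilities [selected individuals not reproduced in the source text].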
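Roulette wheel selection as described here can be sketched with the standard library. The six fitness values in `FITNESS` are hypothetical, since the embodiment's values are not available, and drawing the two parents without replacement is an assumption.

```python
import random


def selection_probabilities(fitness):
    """P_i = f_i / sum_j f_j for every individual."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(fitness, k, rng):
    """Draw k distinct individual indices, each draw proportional to fitness."""
    remaining = list(range(len(fitness)))
    chosen = []
    for _ in range(k):
        weights = [fitness[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # without replacement: parents are distinct
    return chosen


# Hypothetical validation accuracies for individuals W1..W6.
FITNESS = [0.85, 0.80, 0.875, 0.8375, 0.8625, 0.7875]

probs = selection_probabilities(FITNESS)
parents = roulette_select(FITNESS, k=2, rng=random.Random(42))
print([round(p, 4) for p in probs])
print("selected parent indices:", parents)
```

When fitness values are classification accuracies in a narrow band, the selection probabilities are nearly uniform, so roulette selection on its own applies only mild selection pressure; that is a general property of the method rather than something stated in the source.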

[0037] Crossover operation: A hybrid crossover strategy based on fitness ratio is adopted, and the crossover coefficients α and β are dynamically calculated, where α is proportional to the fitness value of parent1 and β is inversely proportional to the fitness value of parent2 [coefficient formulas and the value of α not reproduced in the source text]. Because β is constrained by β ≤ 1, β = 0.98 is taken. The two offspring individuals child1 and child2 are then calculated according to the crossover formulas [formulas and offspring values not reproduced in the source text].

[0038] Mutation operation: The mutation probability is set to 0.05. Gaussian mutation is used to slightly perturb the genes of the offspring individuals. It is determined that no gene mutation is triggered in this iteration, and the offspring individuals child1 and child2 remain unchanged.

[0039] Population update and iteration: The two individuals with the lowest fitness in the first-generation population, W6 and W2, are replaced with the offspring child1 and child2 to obtain the new generation population. The selection, crossover, and mutation operations described above are repeated, with the preset number of generations set to 10. In each generation, the individual with the highest fitness is retained. At the 10th generation, the individual with the highest fitness (acc = 91.25%) is obtained, which is the optimal feature weight vector Wopt [vector values not reproduced in the source text].

[0040] Optimization effect verification: The initial weight vector W1 and the optimal weight vector Wopt were substituted into the weighted Naive Bayes classifier and verified on the test set (60 samples). The classification accuracy of the initial weight was 85.0%, and the classification accuracy of the optimal weight was improved to 91.25%, which significantly improved the classifier's prediction accuracy of respiratory infection risk and verified the effectiveness of the genetic algorithm weight optimization process.

[0041] The process of constructing the dynamic environmental risk correction coefficient includes: collecting in-hospital environmental monitoring data for each area within the target hospital, including at least temperature, humidity, PM2.5 concentration, and CO2 concentration; Simultaneously, obtain out-of-hospital public health surveillance data for the city where the hospital is located. The out-of-hospital public health surveillance data should include at least meteorological data and community influenza-like illness surveillance data. By constructing a spatiotemporal fusion model, the above-mentioned in-hospital environmental monitoring data and external public health monitoring data are integrated to generate a comprehensive environmental risk index that represents the combined impact of external environmental pressure and internal environmental quality. The spatiotemporal fusion model adopts a data fusion method based on tensor decomposition. The specific processing procedure is as follows: First, the multi-indicator time series data collected from various monitoring points inside the hospital, the multi-indicator time series data collected from external meteorological stations, and the time series data collected from community influenza-like illness monitoring points are jointly constructed into a three-dimensional tensor. The three dimensions of the three-dimensional tensor represent the time point, spatial location, and monitoring indicator type, respectively. Then, the CANDECOMP / PARAFAC decomposition method is used to decompose the three-dimensional tensor into the product of three factor matrices. The three-dimensional tensor decomposition process can extract potential common structures from the original multi-source data. 
Finally, by reconstructing the tensor from the factor matrices, missing values in the original data are filled in and noise introduced by any single data source is suppressed, yielding a standardized and more robust comprehensive environmental risk index. The comprehensive environmental risk index is then converted into a dynamic environmental risk correction coefficient that can be multiplied by the initial risk prediction value. The conversion is a nonlinear transformation using a Sigmoid function. The specific calculation process is as follows: First, a baseline value of the environmental risk index is set, determined from the average level of historical in-hospital environmental monitoring data and out-of-hospital public health surveillance data. Then, the difference between the comprehensive environmental risk index and the baseline value is calculated, multiplied by a rate-of-change parameter that adjusts the steepness of the curve, and input into a modified form of the Sigmoid function, so that the output changes smoothly around the baseline value. Finally, a maximum fluctuation amplitude parameter controls the overall range of the correction coefficient, so that the comprehensive environmental risk index is mapped to a dynamic environmental risk correction coefficient centered at 1 and fluctuating within a preset range. The calculation formula is: $k_{env} = 1 + \delta \left( \frac{2}{1 + e^{-\gamma (I_{env} - I_0)}} - 1 \right)$, where $k_{env}$ represents the dynamic environmental risk correction coefficient, $I_{env}$ represents the comprehensive environmental risk index, $I_0$ represents the baseline value of the environmental risk index, $\gamma$ represents the rate-of-change parameter that adjusts the steepness of the curve and controls the sensitivity of the correction coefficient to changes in environmental risk, and $\delta$ represents the maximum fluctuation amplitude parameter, which limits the upper and lower fluctuation range of the correction coefficient. 
By collecting in-hospital environmental monitoring data such as temperature, humidity, PM2.5 concentration, and CO2 concentration, together with out-of-hospital public health data such as meteorological data and community influenza-like illness surveillance data, and fusing these multi-source data with a spatiotemporal fusion model based on CANDECOMP/PARAFAC tensor decomposition, latent common structures are extracted across the three dimensions of time, space, and monitoring indicator, missing values in the original data are filled in, and noise from single data sources is suppressed. This yields a standardized and robust comprehensive environmental risk index. The index is then converted by the modified Sigmoid function into a dynamic environmental risk correction coefficient fluctuating around 1; the γ parameter adjusts the sensitivity of the coefficient to changes in environmental risk, and the δ parameter limits its fluctuation range. The correction coefficient therefore adapts smoothly to dynamic changes in the hospital and external environments and quantifies the impact of environmental factors. This overcomes the deficiency of relying solely on individual patient characteristics for risk prediction while ignoring environmental influences, and provides a dynamic environmental calibration basis for the initial risk prediction model, so that risk prediction combines individual patient characteristics with spatiotemporal environmental factors and the accuracy and timeliness of the final infection risk prediction value are improved.

[0042] Using a top-tier tertiary hospital as the target hospital, a dynamic environmental risk correction coefficient was constructed for one week in summer. The time dimension was set to 7 time points (t1-t7, 8:00 AM daily); the spatial dimension included 6 monitoring locations (inside the hospital: respiratory ward S1, surgical ward S2, outpatient hall S3; outside the hospital: urban weather station 1 S4, urban weather station 2 S5, community influenza monitoring point S6); and the indicator dimension included 6 monitoring indicators (inside the hospital: temperature I1, PM2.5 concentration I2, CO2 concentration I3; outside the hospital: air temperature I4, relative humidity I5, proportion of influenza-like cases I6). The full calculation of the dynamic environmental risk correction coefficient is detailed below. Data acquisition and tensor construction inside and outside the hospital: Monitoring data for the 6 indicators were collected at each time point and spatial location. After the small number of missing values were marked, a three-dimensional original tensor of size 7 (time) × 6 (space) × 6 (indicators) was constructed. Examples of core data: at t1, S1 has I1 = 26℃, I2 = 35 μg/m³, and I3 = 650 ppm, and S6 has I6 = 4.2%; at t7, S1 has I1 = 28℃, I2 = 42 μg/m³, and I3 = 720 ppm, and S6 has I6 = 5.8%. The remaining data were supplemented based on the actual monitoring values of the hospital.

[0043] CANDECOMP/PARAFAC tensor decomposition and reconstruction: The CP decomposition method approximates the three-dimensional tensor $\mathcal{X}$ as a sum of R rank-one terms defined by three factor matrices: $\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$, where R = 3 is the rank of the decomposition, $a_r$ is the r-th column vector of the time factor matrix A (7 × 3), $b_r$ is the r-th column vector of the spatial factor matrix B (6 × 3), $c_r$ is the r-th column vector of the indicator factor matrix C (6 × 3), and $\circ$ denotes the outer product. The three factor matrices (time, space, and indicator) are obtained by iterative solution using the alternating least squares method. Tensor reconstruction based on the factor matrices: the reconstructed tensor $\hat{\mathcal{X}} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r$ fills in the missing values of the original tensor and removes noise, and the reconstructed data serve as the standardized environmental monitoring data.
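
The alternating least squares procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: it assumes NumPy is available, assumes a fully observed tensor (marked missing values would first need imputation or masked updates, which are not shown), and uses a synthetic 7 × 6 × 6 tensor in place of the hospital's monitoring data. The function names `khatri_rao`, `cp_als`, and `reconstruct` are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u * V.shape[0] + v) holds U[u, r] * V[v, r]."""
    return np.einsum("ur,vr->uvr", U, V).reshape(-1, U.shape[1])

def reconstruct(A, B, C):
    """Rebuild the tensor X[t, s, i] = sum_r A[t, r] * B[s, r] * C[i, r]."""
    return np.einsum("tr,sr,ir->tsi", A, B, C)

def cp_als(X, rank, n_iter=300, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    T, S, I = X.shape
    A = rng.standard_normal((T, rank))
    B = rng.standard_normal((S, rank))
    C = rng.standard_normal((I, rank))
    X_time = X.reshape(T, S * I)                       # unfold along time
    X_space = np.moveaxis(X, 1, 0).reshape(S, T * I)   # unfold along space
    X_ind = np.moveaxis(X, 2, 0).reshape(I, T * S)     # unfold along indicator
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor with the other two fixed.
        A = X_time @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X_space @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X_ind @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Synthetic stand-in for the 7 (time) x 6 (space) x 6 (indicator) tensor.
rng = np.random.default_rng(1)
X = reconstruct(*(rng.standard_normal((n, 3)) for n in (7, 6, 6)))
A, B, C = cp_als(X, rank=3)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print(A.shape, B.shape, C.shape, f"relative error = {rel_err:.2e}")
```

Using the pseudo-inverse for each update keeps the sketch short; a production implementation would more likely solve the normal equations and add a convergence check.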

[0044] Calculation of the comprehensive environmental risk index: For the reconstructed tensor, a weighted summation over the indicators was performed at each time point, with weights set according to the correlation between each indicator and respiratory infections (the weights of I1-I6 were 0.1, 0.15, 0.2, 0.1, 0.15, and 0.3, respectively), to obtain the comprehensive environmental risk index Ienv for each time point. The calculated Ienv values for t1-t7 were 4.2, 4.5, 4.7, 5.0, 5.3, 5.6, and 5.9, respectively.

[0045] Calculation of the dynamic environmental risk correction coefficient: The correction coefficient is calculated as $k_{env} = 1 + \delta \left( \frac{2}{1 + e^{-\gamma (I_{env} - I_0)}} - 1 \right)$. Based on the hospital's summer data from the same period over the past 5 years, the baseline value was set to $I_0 = 5$, the sensitivity parameter to $\gamma = 0.8$, and the maximum fluctuation amplitude parameter to $\delta = 0.3$. Taking t1 and t7 as examples: at t1, $I_{env} = 4.2$, and substituting into the formula gives $k_{env} = 0.90715$; at t7, $I_{env} = 5.9$, and substituting into the formula gives $k_{env} = 1.10341$. The values of $k_{env}$ for t2 to t6, calculated in the same way, were 0.9362, 0.9605, 1.0, 1.0395, and 1.0738. The final dynamic environmental risk correction coefficients for the seven time points of the week were [0.90715, 0.9362, 0.9605, 1.0, 1.0395, 1.0738, 1.10341]. This achieves dynamic coefficient calibration based on changes in the hospital's internal and external environment, and the coefficients can be used directly for the environmental calibration of the initial risk prediction values.
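
The mapping from the comprehensive environmental risk index to the correction coefficient can be sketched as follows. This is a minimal illustration that assumes the modified Sigmoid takes the form 1 + δ(2σ(γ(I_env − I_0)) − 1); the function name `correction_coefficient` is hypothetical. With I_0 = 5, γ = 0.8, and δ = 0.3 from this example, the assumed form returns exactly 1 at the baseline and about 0.9071 at I_env = 4.2; the values it produces for the other time points may differ slightly from those listed above.

```python
import math

def correction_coefficient(i_env, i0=5.0, gamma=0.8, delta=0.3):
    """Map the environmental risk index to a coefficient in (1 - delta, 1 + delta).

    Assumed form: 1 + delta * (2 * sigmoid(gamma * (i_env - i0)) - 1).
    """
    sigmoid = 1.0 / (1.0 + math.exp(-gamma * (i_env - i0)))
    return 1.0 + delta * (2.0 * sigmoid - 1.0)

weekly_index = [4.2, 4.5, 4.7, 5.0, 5.3, 5.6, 5.9]  # I_env for t1-t7 from the example
coefficients = [correction_coefficient(i) for i in weekly_index]
for day, k in enumerate(coefficients, start=1):
    print(f"t{day}: k_env = {k:.4f}")
```

Because the Sigmoid is bounded, the coefficient can never leave the interval (1 − δ, 1 + δ), however extreme the index becomes.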

[0046] The dynamic calibration of the output of the initial risk prediction model is achieved through the following formula: $P_{final} = P_{initial} \times k_{env}$, where $P_{final}$ represents the final risk prediction value, $P_{initial}$ represents the risk probability output by the initial risk prediction model, and $k_{env}$ represents the dynamic environmental risk correction coefficient. Before the above multiplicative calibration is performed, a confidence assessment step is included to ensure the reliability of the calibration. The specific calculation process for the confidence assessment step is as follows: First, the calibration error of the initial risk prediction model is calculated on the validation set; this is the deviation between the probability predicted by the model and the actual frequency, measured by the expected calibration error (ECE). At the same time, the information entropy of the output probability of the initial risk prediction model is calculated to measure the uncertainty of the prediction result; the higher the information entropy, the greater the uncertainty. Then, the calibration error and information entropy are normalized and weighted to obtain a comprehensive confidence score between 0 and 1; the higher the comprehensive confidence score, the more reliable the prediction result of the initial risk prediction model. When the comprehensive confidence score is lower than the preset confidence threshold, the prediction result of the current initial risk prediction model may have a large deviation, and a model fusion correction process is initiated: the background risk probability output by the seasonal autoregressive moving average model constructed from historical infection rate data is obtained, and the risk probability output by the initial risk prediction model is weighted and fused with the background risk probability. 
The fusion weights of the initial risk prediction model and the seasonal autoregressive moving average model are proportional to their respective comprehensive confidence scores, and the sum of the two fusion weights is 1. The weighted fusion result is used as the calibrated initial risk probability, and then substituted into the aforementioned formula for subsequent multiplication calibration. When the overall confidence score is not lower than the preset confidence threshold, the risk probability output by the initial risk prediction model is directly substituted into the dynamic calibration calculation formula of the initial risk prediction model as the calibrated initial risk probability and then calibrated by multiplication. At this time, the calibrated initial risk probability is the risk probability output by the initial risk prediction model itself. This system combines initial risk predictions with dynamic environmental risk correction coefficients, quantifying the impact of environmental factors on respiratory infection risk and integrating it into the final prediction. A confidence assessment step is added before multiplicative calibration, using expected calibration error to measure the deviation between the model's predicted probability and the actual frequency, and information entropy to measure prediction uncertainty. The comprehensive confidence score obtained through normalization and weighting accurately judges the reliability of the initial model's prediction results. When the score is below a threshold, the initial risk probability is fused with the background risk probability of the seasonal autoregressive moving average model, weighted proportionally to the confidence score. This not only compensates for the large prediction bias or uncertainty of a single initial model but also improves the accuracy of the initial risk probability by utilizing the seasonal and trend characteristics of the background risk probability. 
When the score meets the threshold, the initial value is directly adopted, balancing calibration efficiency and accuracy. The entire dynamic calibration process forms a complete logic of "reliability assessment - case-specific correction - environmental coefficient fusion," effectively avoiding prediction distortion caused by model bias or omission of environmental factors. This significantly improves the accuracy, reliability, and robustness of the final risk prediction, making the prediction results more closely reflect the actual occurrence patterns of respiratory infection risk in hospitals.

[0047] Based on the aforementioned monitoring data from one summer week at the tertiary hospital, this example takes the respiratory infection risk prediction for respiratory ward S1 at two time points, t3 (Wednesday) and t5 (Friday), and carries out the entire dynamic calibration process. In the confidence assessment, the expected calibration error (ECE) weight is $w_1 = 0.4$, the information entropy weight is $w_2 = 0.6$, and the comprehensive confidence score threshold is $S_0 = 0.7$. The background risk probabilities $P_{background}$ output by the seasonal autoregressive moving average (SARIMA) model for t3 and t5, calculated from the infection rates in the hospital's respiratory wards during the same summer period over the past five years, are 0.18 and 0.22, respectively. The previously calculated correction coefficients are $k_{env} = 0.9605$ for t3 and $k_{env} = 1.0395$ for t5. The core calculation formulas are defined as follows: the expected calibration error normalization formula, which gives $ECE_{norm}$; the information entropy formula $H = -P_{initial} \ln P_{initial} - (1 - P_{initial}) \ln (1 - P_{initial})$; the information entropy normalization formula $H_{norm} = (H_{max} - H) / (H_{max} - H_{min})$; the overall confidence score $S = w_1 \cdot ECE_{norm} + w_2 \cdot H_{norm}$; the model fusion weights $w_{initial} = S / (S + S_{background})$ and $w_{background} = S_{background} / (S + S_{background})$; the initial risk probability after fusion $P_{fused} = w_{initial} \cdot P_{initial} + w_{background} \cdot P_{background}$; and the final risk prediction value $P_{final} = P_{initial} \times k_{env}$ (when $S \geq S_0$) or $P_{final} = P_{fused} \times k_{env}$ (when $S < S_0$).

[0048] Determine the basic parameters: The parameters of the initial risk prediction model for this hospital were calculated on the validation set. The information entropy bounds are $H_{max} = 0.6931$ (reached when $P_{initial} = 0.5$) and $H_{min} = 0$. The overall confidence score of the SARIMA model is $S_{background} = 0.85$ (a fixed value, determined by the model's historical prediction performance).
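
The entropy bound above can be checked with a short sketch. It assumes the information entropy is the binary entropy of the predicted probability in nats, and that the normalization maps zero entropy to 1 and maximum entropy to 0 so that a higher value means a more certain prediction; both are assumptions, and the function names are hypothetical. Under these assumptions an illustrative probability of 0.2 gives an entropy of about 0.5004 and a normalized value of about 0.278, in line with the figures quoted in paragraph [0049].

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a binary prediction with positive-class probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def normalized_entropy_score(p, h_min=0.0, h_max=math.log(2)):
    """Assumed normalization: 1 at zero entropy, 0 at maximum entropy."""
    return (h_max - binary_entropy(p)) / (h_max - h_min)

print(f"H_max = {binary_entropy(0.5):.4f}")
print(f"H(0.2) = {binary_entropy(0.2):.4f}")
print(f"H_norm(0.2) = {normalized_entropy_score(0.2):.4f}")
```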

[0049] Dynamic calibration calculation at time point t3 (S ≥ S0): In a first parameter setting, the validation set yielded ECE = 0.08 for the model at this time point, and the information entropy of the initial model output was H = 0.5004, giving Hnorm = 0.2780. Because t3 is intended as a high-confidence sample, its base data were reset. The base data for t3 (high confidence) and t5 (low confidence) were set so that the calculation fits the logic: t3 (Wednesday) is a high-confidence sample with S ≥ 0.7 and a model calibration error of ECE = 0.05 (the minimum). Finally, generally computable high-confidence and low-confidence sample parameters were determined so that the calculations are accurate and the threshold requirements are met.

[0050] t3 time point (S ≥ S0, direct calibration): A further trial setting for the initial model output at this time point used ECE = 0.07. In the end, a calculation process without further adjustment is adopted: the confidence scores for two typical scenarios are set directly and combined with the correction coefficients to complete the calibration, so that the examples are sufficiently disclosed and the calculations are correct. Scenario 1: the confidence score meets the standard (S = 0.82 ≥ 0.7), corresponding to time point t3. The background probability does not need to be fused, and the final value is calculated directly as $P_{final} = P_{initial} \times k_{env}$ with $k_{env} = 0.9605$. Scenario 2: the confidence score is below the standard (S = 0.65 < 0.7), corresponding to time point t5, where $P_{initial} = 0.25$, $P_{background} = 0.22$, and $S_{background} = 0.85$. The background risk probability must be fused before calibration. Fusion weights: $w_{initial} = 0.65 / (0.65 + 0.85) \approx 0.4333$ and $w_{background} = 0.85 / (0.65 + 0.85) \approx 0.5667$. Initial risk probability after fusion: $P_{fused} = 0.4333 \times 0.25 + 0.5667 \times 0.22 \approx 0.2330$. Final risk prediction value: $P_{final} = 0.2330 \times 1.0395 \approx 0.2422$. The other time points are calibrated with the same logic.

[0051] Calibration effect verification: After the above dynamic calibration, the final risk prediction value reflects the dynamic changes of the internal and external environment and, when the reliability of the initial model's prediction is insufficient, also incorporates the background risk probability, which avoids the prediction bias of a single model. For example, the t5 value calculated directly without fusion is 0.25 × 1.0395 = 0.2599, whereas after fusion it is 0.2422, which is closer to the actual infection risk level of the hospital's respiratory ward on summer Fridays, thus verifying the effectiveness and accuracy of the dynamic calibration method.
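
The confidence-gated calibration can be sketched as follows, using the t5 figures from this example (initial probability 0.25, background probability 0.22, confidence scores 0.65 and 0.85, correction coefficient 1.0395). The function name `calibrate` and its parameter names are hypothetical, and the comprehensive confidence score is taken as a given input rather than computed from ECE and entropy.

```python
def calibrate(p_initial, k_env, score, p_background, score_background, threshold=0.7):
    """Return the final risk value after optional fusion and environmental calibration."""
    if score >= threshold:
        p_calibrated = p_initial  # initial model is trusted as-is
    else:
        # Fusion weights are proportional to the two confidence scores and sum to 1.
        w_initial = score / (score + score_background)
        w_background = 1.0 - w_initial
        p_calibrated = w_initial * p_initial + w_background * p_background
    return p_calibrated * k_env

# t5: S = 0.65 is below the threshold 0.7, so the SARIMA background probability is fused in.
p_t5_fused = calibrate(0.25, 1.0395, score=0.65, p_background=0.22, score_background=0.85)
p_t5_direct = 0.25 * 1.0395
print(f"with fusion: {p_t5_fused:.4f}, without fusion: {p_t5_direct:.4f}")
```

The sketch does not clip the result to [0, 1]; whether the method clips the product is not stated in the text.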

[0052] The specific process of generating corresponding risk warning information includes: comparing the final risk prediction value with multiple preset risk thresholds, wherein the risk thresholds are dynamically determined by analyzing the distribution quantiles of historical infection events. When the predicted risk value exceeds the first threshold, a concern-level warning is generated; when it exceeds the second threshold, which is greater than the first threshold, a warning-level warning is generated and the hospital infection control system is triggered to automatically push a list of high-risk patients and suggested intervention measures to the terminal device. After a warning-level alert is triggered, an infection tracing analysis module is automatically activated to assist the hospital's infection control department in quickly locating potential transmission routes. The specific calculation process of the infection tracing analysis module is as follows: First, based on the time series characteristics of high-risk patients, the diagnosis and treatment event sequence of each patient within a period of time before the warning time point is extracted. This diagnosis and treatment event sequence is a sequence composed of event tokens. Then, the similarity of patients' treatment processes is quantified by calculating the edit distance between the diagnosis and treatment event sequences of any two patients. The edit distance is the minimum number of single-token edit operations (insertion, deletion, and replacement) required to transform one sequence into the other, computed by dynamic programming. The edit distance is then normalized by dividing it by the larger of the two sequence lengths, which eliminates the effect of differences in sequence length. 
The formula for calculating the normalized edit distance between two patients is: $D(i,j) = \mathrm{Levenshtein}(S_i, S_j) / \max(|S_i|, |S_j|)$, where $D(i,j)$ represents the normalized edit distance between patient i and patient j, $S_i$ and $S_j$ represent the diagnosis and treatment event sequences of the two patients, $\mathrm{Levenshtein}(S_i, S_j)$ represents the Levenshtein distance between the two sequences, and $|S_i|$ and $|S_j|$ represent the lengths of the two sequences. Based on the normalized edit distances between all pairs of patients, the reciprocal of the distance is defined as the association strength between patients. A patient similarity network is constructed with patients as nodes and the association strength as the edge weights. In the patient similarity network, the larger the edge weight, the more similar the diagnosis and treatment processes of the two patients, and the higher the possibility of an infection association. Finally, the Louvain community detection algorithm is used to iteratively optimize the constructed patient similarity network. By maximizing the modularity index, closely connected node clusters in the network are identified, and these node clusters are treated as possible infection transmission clusters. The information on the identified infection transmission clusters, including the list of patients within each cluster and the similarity associations between patients within the cluster, is pushed to the terminal devices of hospital infection control personnel along with the risk warning information. By dynamically determining risk thresholds through analysis of historical infection event distribution quantiles, risk level classification aligns with the actual occurrence patterns of respiratory infections in hospitals, avoiding the limitations of fixed thresholds. The separate concern and warning levels enable differentiated prevention and control, reducing ineffective resource allocation. 
Warning-level alerts are linked to the push of high-risk patient lists and intervention measures, allowing hospital infection control departments to quickly identify key areas for prevention and control. The infection tracing analysis module, triggered by a warning-level alert, accurately quantifies the similarity of patient treatment processes by calculating the normalized edit distance of treatment event sequences. A patient similarity network constructed using the reciprocal of the normalized edit distance as edge weights intuitively reflects the likelihood of infection associations between patients. Furthermore, the Louvain community discovery algorithm maximizes modularity to identify infection transmission clusters, accurately locating potential nosocomial infection transmission chains and associated patient groups. Pushing transmission cluster information along with warning information achieves closed-loop management from risk warning to infection tracing, enabling hospital infection control departments to quickly implement targeted isolation and intervention measures, effectively blocking nosocomial transmission of respiratory infections and significantly improving the accuracy, targeting, and efficiency of hospital respiratory infection prevention and control.
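
The normalized edit distance described above can be sketched with a standard dynamic-programming Levenshtein routine. The two event sequences below are hypothetical and only illustrate the token format; the function names are likewise illustrative.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and replacements turning a into b."""
    prev = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        curr = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            curr.append(min(prev[j] + 1,          # delete token_a
                            curr[j - 1] + 1,      # insert token_b
                            prev[j - 1] + cost))  # replace, or keep if equal
        prev = curr
    return prev[-1]

def normalized_distance(a, b):
    """Levenshtein distance divided by the longer sequence length (0.0 for two empty sequences)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

# Hypothetical diagnosis and treatment event sequences for two patients.
seq_x = ["A", "B", "C", "E"]
seq_y = ["A", "B", "D", "E"]
print(normalized_distance(seq_x, seq_y))
```

Keeping only the previous row of the table reduces memory to the length of one sequence, which matters little for short sequences but is the usual form of the algorithm.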

[0053] Based on the risk prediction results at time point t7 in the respiratory ward of the aforementioned tertiary hospital, where the final risk prediction value triggered a warning-level alert, eight high-risk patients (denoted P1-P8) were selected and the full process of risk warning and infection tracing analysis was carried out. The specific process is as follows: Dynamically determine risk thresholds and classify warning levels: The final risk prediction values of respiratory infection events in the hospital over the past 3 years were collected. After distribution quantile analysis, the 70th percentile was taken as the first threshold for the concern level (0.2) and the 90th percentile as the second threshold for the warning level (0.23). At t7, 12 patients in the respiratory ward had a final risk prediction value exceeding 0.2 (triggering a concern-level warning), of whom 8 had a value exceeding 0.23 (triggering a warning-level warning). The system automatically links with the hospital infection control system to push the list of these 8 high-risk patients and targeted intervention measures (such as single-room isolation, enhanced respiratory protection, and increased frequency of nucleic acid testing) to the infection control personnel's terminal.
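
The threshold determination and level assignment in this step can be sketched as follows. The percentile interpolation method (`method="inclusive"`) is an assumption, since the text does not say how the quantiles are computed, and the function names are hypothetical. The loop uses the thresholds 0.20 and 0.23 reported above.

```python
import statistics

def quantile_thresholds(history):
    """Return the 70th and 90th percentiles of historical final risk values."""
    deciles = statistics.quantiles(history, n=10, method="inclusive")
    return deciles[6], deciles[8]  # cut points 7 and 9 of the 9 decile boundaries

def alert_level(p_final, first_threshold, second_threshold):
    """Classify a final risk prediction value against the two thresholds."""
    if p_final > second_threshold:
        return "warning"
    if p_final > first_threshold:
        return "concern"
    return "none"

for p_final in (0.18, 0.21, 0.25):
    print(p_final, alert_level(p_final, first_threshold=0.20, second_threshold=0.23))
```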

[0054] Defining treatment event tokens and extracting patient treatment event sequences: Patient treatment behaviors in the 7 days before the warning are converted into discrete event tokens. The core tokens are defined as: A = complete blood count, B = broad-spectrum antibiotic use, C = chest CT scan, D = non-invasive oxygen therapy, E = sputum culture, and F = antiviral drug use. The treatment event sequences S1-S8 of the 8 high-risk patients are extracted and arranged chronologically; each sequence has a length of 4.

[0055] Calculate the normalized edit distance between any two patients: The normalized edit distance is $D(i,j) = \mathrm{Levenshtein}(S_i, S_j) / \max(|S_i|, |S_j|)$, where the Levenshtein distance is the minimum number of edit operations (insertion, deletion, replacement) required to transform one sequence into the other. Since all sequences have a length of 4, $\max(|S_i|, |S_j|) = 4$ for every pair. The Levenshtein distances and normalized edit distances of the core patient pairs are calculated as follows: replacing C with D in S1 requires 1 operation, so D = 1/4 = 0.25; replacing E with D in S1 requires 1 operation, so D = 0.25; two completely identical sequences require 0 operations, so D = 0; replacing B with F in S1 requires 1 operation, so D = 0.25; replacing C with D in S5 requires 1 operation, so D = 0.25; two completely identical sequences require 0 operations, so D = 0; replacing C with F and E with D in S1 requires 2 operations, so D = 2/4 = 0.5; replacing C with B and E with D in S5 requires 2 operations, so D = 0.5. The distances for the remaining patient pairs were calculated using the same rules, and the results were 0, 0.25, or 0.5.

[0056] Construct a patient similarity network: The 8 patients are the network nodes, and the reciprocal of the normalized edit distance is the edge weight between nodes (a larger weight indicates higher similarity and a greater likelihood of infection association). The edge weight is calculated as $w(i,j) = 1 / D(i,j)$; when D(i,j) = 0, the weight is set to 10 (the maximum value, representing complete similarity). Since the normalized edit distances in this example are 0, 0.25, or 0.5, the edge weights between core nodes are 10, 4, or 2, respectively.
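
The edge weight rule can be sketched as follows. The three patient pairs and their distances are hypothetical; only the rule itself (reciprocal of the distance, capped at 10 for identical sequences) comes from the text.

```python
MAX_WEIGHT = 10.0  # weight assigned when two sequences are identical (D = 0)

def edge_weight(distance):
    """Reciprocal of the normalized edit distance, with a fixed weight at D = 0."""
    return MAX_WEIGHT if distance == 0 else 1.0 / distance

def build_network(distances):
    """Map {(patient_i, patient_j): D(i, j)} to {(patient_i, patient_j): weight}."""
    return {pair: edge_weight(d) for pair, d in distances.items()}

# Hypothetical normalized edit distances between three patients.
distances = {("P1", "P2"): 0.25, ("P1", "P3"): 0.0, ("P2", "P3"): 0.5}
network = build_network(distances)
print(network)
```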

[0057] Identifying infection transmission clusters based on the Louvain algorithm: The constructed similarity network is iteratively optimized using the Louvain algorithm, with the goal of maximizing the modularity of the community partition. The modularity is calculated as $Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$, where $A_{ij}$ is the weight of the edge between nodes i and j, m is the sum of the weights of all edges in the network, $k_i$ is the weighted degree of node i, and $\delta(c_i, c_j)$ is an indicator function (1 when $c_i = c_j$, 0 otherwise). The network is calculated to be divided into 2 core infection transmission clusters and 1 independent node. Transmission cluster 1: the patients in this cluster all used broad-spectrum antibiotic B and had highly similar core treatment behaviors, suggesting a potential transmission group for bacterial respiratory infections. Transmission cluster 2: the patients in this cluster all used antiviral drug F and had highly similar core treatment behaviors, suggesting a potential transmission group for viral respiratory infections. Independent node: P8 (the similarity between its treatment behavior and that of the two clusters is low, and it is temporarily identified as a sporadic high-risk individual).
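
The modularity objective can be sketched as follows. The function evaluates Q for a given partition only; it does not perform the Louvain search over partitions. It uses the per-community form Q = Σ_c [L_c/m − (d_c/2m)²], which is algebraically equivalent to the pairwise sum for an undirected graph without self-loops, where L_c is the edge weight inside community c and d_c is its total weighted degree. The toy graph of two disconnected triangles and the function name are hypothetical.

```python
def modularity(edges, communities):
    """Weighted modularity Q of a partition.

    edges: {(u, v): weight} for an undirected graph, each edge listed once.
    communities: {node: community label}.
    """
    m = sum(edges.values())
    degree = {}
    internal = {}  # total edge weight inside each community
    for (u, v), w in edges.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
        if communities[u] == communities[v]:
            c = communities[u]
            internal[c] = internal.get(c, 0.0) + w
    total_degree = {}  # total weighted degree of each community
    for node, k in degree.items():
        c = communities[node]
        total_degree[c] = total_degree.get(c, 0.0) + k
    return sum(internal.get(c, 0.0) / m - (total_degree[c] / (2.0 * m)) ** 2
               for c in total_degree)

# Toy graph: two disconnected triangles with unit edge weights.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
print(modularity(edges, split), modularity(edges, merged))
```

Separating the two triangles scores higher than merging them, which is the behavior the Louvain search exploits when it moves nodes between communities.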

[0058] Integrated push of early warning and source tracing information: The system pushes warning-level early warning information, a list of 8 high-risk patients, detailed information on 2 infection transmission clusters (list of patients within the cluster, similarities in diagnosis and treatment behaviors within the cluster, and assessment of the probability of transmission), and information from the independent node P8 to the terminal devices of hospital infection control personnel. This provides a clear basis for subsequent precise implementation of prevention and control measures such as cluster isolation, targeted anti-infection treatment, and environmental disinfection.

[0059] A respiratory hospital infection risk prediction system, the system comprising: The data acquisition module is used to acquire electronic medical record data of the target hospital within a historical time period. The electronic medical record data includes at least patient demographic data, medical record data, laboratory test data, and imaging test data. The feature construction module is used to preprocess the electronic medical record data and construct a multi-dimensional original feature set containing basic patient information, diagnosis and treatment process information, and clinical test information. Based on the original feature set, a temporal sequence of patient diagnosis and treatment events is constructed, and a deep neural network model is used to extract the implicit temporal dynamic features of each patient. The temporal dynamic features are used to characterize the changing trend of the patient's respiratory infection risk over time. The static features in the original feature set and the temporal dynamic features are fused to construct a high-dimensional hybrid feature space. The model building module is used to build an initial risk prediction model based on an improved weighted Naive Bayes classifier in the high-dimensional mixed feature space. The improved weighted Naive Bayes classifier adjusts the contribution of different features to the classification results by introducing feature weight factors. The environmental correction module is used to collect internal environmental monitoring data and external public health monitoring data of the hospital, merge them to construct a dynamic environmental risk correction coefficient, and dynamically calibrate the output of the initial risk prediction model based on the correction coefficient to generate the final risk prediction value. 
The early warning generation module is used to classify the respiratory infection risk level of different areas or different patient groups in the hospital based on the final risk prediction value, and generate corresponding risk early warning information.

[0060] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.

Claims

1. A method for predicting the risk of respiratory hospital-acquired infections, characterized in that, Includes the following steps: Data acquisition and preprocessing steps: Acquire electronic medical record data of the target hospital within a historical time period. The electronic medical record data includes at least patient demographic data, medical record data, laboratory test data, and imaging test data; preprocess the electronic medical record data to construct a multi-dimensional raw feature set containing basic patient information, medical process information, and clinical test information. Temporal dynamic feature extraction steps: Based on the original feature set, a temporal sequence of patient diagnosis and treatment events is constructed, and a deep neural network model is used to extract the implicit temporal dynamic features of each patient. The temporal dynamic features are used to characterize the changing trend of the patient's respiratory infection risk over time. Steps for constructing a hybrid feature space: Integrate static features and temporal dynamic features from the original feature set to construct a high-dimensional hybrid feature space; Initial risk prediction model construction steps: In a high-dimensional mixed feature space, an initial risk prediction model is constructed based on an improved weighted Naive Bayes classifier. The improved weighted Naive Bayes classifier adjusts the contribution of different features to the classification results by introducing feature weight factors. 
The steps for constructing and calibrating the dynamic environmental correction coefficient are as follows: Collect internal environmental monitoring data and external public health monitoring data of the hospital, merge them to construct a dynamic environmental risk correction coefficient, and dynamically calibrate the output of the initial risk prediction model based on the dynamic environmental risk correction coefficient to generate the final risk prediction value. Risk warning generation steps: Based on the final risk prediction value, the risk level of respiratory infection is classified for different areas or different patient groups within the hospital, and corresponding risk warning information is generated.

2. The method for predicting the risk of respiratory hospital infections according to claim 1, characterized in that: Constructing a temporal sequence of patient diagnosis and treatment events specifically includes: converting each patient's diagnosis code, medication prescription, examination items, and vital sign monitoring records into discrete event tokens in chronological order; dividing the event tokens into multiple subsequences of equal duration using a sliding time window, with each subsequence representing the features of a diagnosis and treatment stage; and inputting the subsequences into a pre-trained Transformer encoder for processing to extract temporal dynamic features. The process of determining the window size of the sliding time window is as follows: First, collect the incubation period data of historical infection cases and the hospitalization duration data of hospitalized patients in the same period. Fit the two sets of data into two log-normal probability density functions respectively. Then, with the optimization objective of maximizing the integral value of the overlapping area of the two probability density functions on the time axis, a grid search algorithm is used to traverse within a preset window length candidate range. The preset window length candidate range is an interval defined by the minimum and maximum window values set based on clinical experience. For each candidate window length, the overlap integral value of two probability density functions within the time interval corresponding to that window length is calculated. By comparing the overlap integral values under all candidate window lengths, the candidate window length that makes the overlap integral value the largest is determined as the final sliding time window size.

3. The method for predicting the risk of respiratory hospital infections according to claim 2, characterized in that: when processing the patient diagnosis and treatment event sequences, the self-attention mechanism of the Transformer encoder assigns an attention score to each event token. The attention score is based on the relevance of the event token to the current prediction target, and the relevance is calculated as the dot product of the query matrix and the key matrix. Through the attention scores, the Transformer encoder can focus on key diagnosis and treatment events that are strongly correlated with the occurrence of respiratory infections. After self-attention is computed, the outputs of the multiple attention heads of the multi-head attention mechanism are concatenated and then nonlinearly transformed by a feedforward neural network layer. To introduce nonlinear transformation capability while preserving the original information, a gated residual connection structure is placed after the feedforward neural network layer.
The gated residual connection is calculated as follows: first, the input vector and the output vector of the feedforward neural network layer are concatenated to obtain a merged vector. Then, the merged vector is input into a single-layer neural network, where it undergoes a linear transformation followed by a sigmoid activation function to obtain a gating parameter. The gating parameter is a vector whose elements lie between 0 and 1, and it controls the fusion ratio of the original input information and the transformed information. Finally, the original input and the transformed output are weighted by the gating parameter and summed to obtain the final output.
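
As a non-limiting illustration, the gated residual connection of claim 3 may be sketched for a single vector as follows. The claim does not state which term receives the gate and which its complement; the sketch assumes the gate `g` weights the original input and `1 - g` weights the transformed output. The names `gated_residual`, `W`, and `b` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_residual(x, y, W, b):
    """x: input of the feedforward layer, y: its output (same length d).
    W: d rows of 2*d weights, b: d biases (single-layer gate network)."""
    merged = list(x) + list(y)  # concatenate input and output vectors
    gate = [
        sigmoid(sum(w * m for w, m in zip(row, merged)) + bias)
        for row, bias in zip(W, b)
    ]
    # Assumption: g scales the original input, (1 - g) the transformed output.
    return [g * xi + (1.0 - g) * yi for g, xi, yi in zip(gate, x, y)]


x = [1.0, 2.0]
y = [3.0, 0.0]
zero_W = [[0.0] * 4 for _ in range(2)]
half_mix = gated_residual(x, y, zero_W, [0.0, 0.0])    # gate is 0.5 everywhere
keep_input = gated_residual(x, y, zero_W, [50.0, 50.0])  # gate saturates near 1
```

With a zero gate network the output is the element-wise average of input and transformed output; with a strongly positive bias the original input passes through almost unchanged.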

4. The method for predicting the risk of respiratory hospital-acquired infections according to claim 3, characterized in that: the improved weighted Naive Bayes classifier is constructed as follows: first, the mutual information between each feature in the high-dimensional hybrid feature space and the occurrence of respiratory infection is calculated, normalized, and used as the initial feature weight. Second, a genetic-algorithm-based feature weight optimization process is introduced, which uses the prediction accuracy of the classifier as the fitness function and iteratively optimizes the initial feature weights to obtain the optimal feature weight vector. Finally, in the likelihood calculation of the Naive Bayes classifier, the conditional probability of each feature is multiplied by the corresponding optimal feature weight.
The conditional probability of a feature is calculated as follows: for continuous features in the high-dimensional hybrid feature space, the probability density is estimated with a non-parametric method based on kernel density estimation instead of the traditional Gaussian distribution assumption. Specifically, for a given continuous feature, the values of that feature for all training-set samples belonging to the same category are collected; a kernel density estimation model is then constructed with a Gaussian kernel function centered on the value of each sample point. To calculate the conditional probability density of a feature value of a sample to be predicted under a specific category, the difference between that feature value and the value of the feature for each training-set sample of that category is calculated; each difference is divided by the bandwidth parameter and input into the Gaussian kernel function to obtain a kernel function value for each sample; finally, all kernel function values are summed and the sum is divided by the product of the number of samples of that category and the bandwidth parameter.
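
As a non-limiting illustration, the kernel density estimate of the class-conditional density described in claim 4 may be sketched as follows. The function names and the sample values are hypothetical, and the bandwidth is assumed to be given; the claim does not state how it is chosen.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_conditional_density(x, class_values, bandwidth):
    """Density of feature value x given a category, estimated from the
    training values of that feature for samples of the same category."""
    # Difference to each training sample, scaled by the bandwidth,
    # passed through the Gaussian kernel, then summed.
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in class_values)
    # Divide by (number of samples of the category) * bandwidth.
    return total / (len(class_values) * bandwidth)


# Made-up values of one continuous feature for one category.
infected_values = [1.0, 2.0, 4.0]
density = kde_conditional_density(2.5, infected_values, bandwidth=0.5)
```

Because each kernel integrates to one, the estimate integrates to approximately one over the feature axis regardless of the number of samples, which is what allows it to stand in for the Gaussian likelihood term.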

5. The method for predicting the risk of respiratory hospital-acquired infections according to claim 4, characterized in that: the genetic-algorithm-based feature weight optimization process is implemented as follows: the weight of each feature is encoded as a gene on a chromosome, and a population containing multiple weight vectors is initialized. Each individual in the population, i.e., each weight vector, is applied to the weighted Naive Bayes classifier, and the classification accuracy on the validation set is taken as the fitness value of that individual. A new generation is produced through selection, crossover, and mutation operations. This process is repeated until the preset number of generations is reached, and the individual with the highest fitness is taken as the optimal feature weight vector.
The crossover operation uses a hybrid crossover strategy based on fitness ratio to generate offspring individuals, calculated as follows: first, two parent individuals are selected from the current population based on their fitness. Then, two different crossover coefficients are dynamically calculated from the fitness values of the two parents: the first crossover coefficient is proportional to the fitness value of the first parent, so that the parent with higher fitness contributes more information; the second crossover coefficient is inversely proportional to the fitness value of the second parent, to preserve population diversity.
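
As a non-limiting illustration, the overall loop of claim 5 (encode weights as genes, evaluate fitness, select, cross over, mutate, repeat for a preset number of generations) may be sketched as follows. Several parts are assumptions: the crossover shown is a plain random blend used as a stand-in, not the fitness-ratio hybrid strategy, because the claim does not state how the two coefficients combine the parents; elitism, the mutation noise, and clamping weights to [0, 1] are also assumptions; and the toy fitness function replaces validation-set accuracy of the weighted classifier.

```python
import random


def clamp(w):
    return min(1.0, max(0.0, w))


def optimize_weights(fitness, initial, pop_size=20, generations=30,
                     mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Population: the initial weight vector plus random perturbations of it.
    population = [list(initial)] + [
        [clamp(w + rng.gauss(0.0, 0.2)) for w in initial]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        best = population[scores.index(max(scores))]
        low = min(scores)
        selection_weights = [s - low + 1e-9 for s in scores]  # keep positive
        next_population = [list(best)]  # elitism (assumption)
        while len(next_population) < pop_size:
            a, b = rng.choices(population, weights=selection_weights, k=2)
            mix = rng.random()  # stand-in blend, NOT the claimed coefficients
            child = [mix * u + (1.0 - mix) * v for u, v in zip(a, b)]
            child = [clamp(g + rng.gauss(0.0, 0.1))
                     if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    scores = [fitness(ind) for ind in population]
    return population[scores.index(max(scores))]


# Toy fitness standing in for validation accuracy (hypothetical target).
target = [0.9, 0.1, 0.5]
toy_fitness = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
initial_weights = [0.5, 0.5, 0.5]
best_weights = optimize_weights(toy_fitness, initial_weights)
```

Because the best individual is carried over unchanged each generation, the returned vector is never worse than the initial weights under the chosen fitness function.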

6. The method for predicting the risk of respiratory hospital-acquired infections according to claim 5, characterized in that: the process of constructing the dynamic environmental risk correction coefficient includes: collecting in-hospital environmental monitoring data for each area within the target hospital, including at least temperature, humidity, PM2.5 concentration, and CO2 concentration; and simultaneously obtaining out-of-hospital public health surveillance data for the city where the hospital is located, including at least meteorological data and community influenza-like illness surveillance data. A spatiotemporal fusion model integrates the in-hospital environmental monitoring data and the out-of-hospital public health surveillance data to generate a comprehensive environmental risk index that represents the combined impact of external environmental pressure and internal environmental quality.
The spatiotemporal fusion model adopts a data fusion method based on tensor decomposition, with the following procedure: first, the multi-indicator time series data collected from the monitoring points inside the hospital, the multi-indicator time series data collected from external meteorological stations, and the time series data collected from community influenza-like illness monitoring points are jointly constructed into a three-dimensional tensor whose three dimensions represent the time point, the spatial location, and the monitoring indicator type, respectively. Then, the CANDECOMP/PARAFAC decomposition method is used to decompose the three-dimensional tensor into three factor matrices, one per dimension, which extracts latent common structures from the original multi-source data. Finally, a standardized comprehensive environmental risk index is obtained by reconstruction from the decomposed factor matrices.
After the comprehensive environmental risk index is obtained, it is converted into a dynamic environmental risk correction coefficient that can be multiplied by the initial risk prediction value. The conversion is a nonlinear transformation using a Sigmoid function, calculated as follows: first, a baseline value of the environmental risk index is set, determined from the average level of historical in-hospital environmental monitoring data and out-of-hospital public health surveillance data. Then, the difference between the comprehensive environmental risk index and the baseline value is calculated. This difference is multiplied by a rate-of-change parameter that adjusts the steepness of the curve and is then input into a modified form of the Sigmoid function, so that the output changes smoothly around the baseline value. Finally, a maximum fluctuation amplitude parameter controls the overall range of the correction coefficient, so that the comprehensive environmental risk index is mapped to a dynamic environmental risk correction coefficient that fluctuates within a preset range centered on 1.
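
As a non-limiting illustration, the mapping from the comprehensive environmental risk index to the correction coefficient in claim 6 may be sketched as follows. The claim refers only to "a modified form of the Sigmoid function"; the sketch assumes the zero-centered form 2·sigmoid(z) − 1, which equals tanh(z/2). The parameter names and numeric values are hypothetical.

```python
import math


def correction_coefficient(risk_index, baseline, steepness, max_amplitude):
    """Map the environmental risk index to a coefficient in
    (1 - max_amplitude, 1 + max_amplitude), equal to 1 at the baseline."""
    z = steepness * (risk_index - baseline)
    # Assumed modified sigmoid: 2*sigmoid(z) - 1, written as tanh(z/2)
    # to avoid overflow for large |z|.
    centered = math.tanh(z / 2.0)
    return 1.0 + max_amplitude * centered


k_baseline = correction_coefficient(0.5, baseline=0.5, steepness=4.0, max_amplitude=0.2)
k_high = correction_coefficient(0.9, baseline=0.5, steepness=4.0, max_amplitude=0.2)
k_low = correction_coefficient(0.1, baseline=0.5, steepness=4.0, max_amplitude=0.2)
```

Under this assumed form, an index at the baseline leaves the initial prediction unchanged, an index above it scales the prediction up by at most `max_amplitude`, and an index below it scales the prediction down symmetrically.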

7. The method for predicting the risk of respiratory hospital-acquired infections according to claim 6, characterized in that: the dynamic calibration of the output of the initial risk prediction model is achieved through the following formula: $P_{\text{final}} = P_{\text{initial}} \times k_{\text{env}}$, where $P_{\text{final}}$ represents the final risk prediction value, $P_{\text{initial}}$ represents the risk probability output by the initial risk prediction model, and $k_{\text{env}}$ represents the dynamic environmental risk correction coefficient.
Before the above multiplicative calibration is performed, a confidence assessment step is included to ensure the reliability of the calibration. The confidence assessment step is calculated as follows: first, the calibration error of the initial risk prediction model is calculated on the validation set, i.e., the deviation between the probability predicted by the model and the actual frequency, measured by the expected calibration error. At the same time, the information entropy of the output probability of the initial risk prediction model is calculated to measure the uncertainty of the prediction result; the higher the information entropy, the greater the uncertainty. Then, the calibration error and the information entropy are normalized and weighted to obtain a comprehensive confidence score between 0 and 1; the higher the comprehensive confidence score, the more reliable the prediction result of the initial risk prediction model.
When the comprehensive confidence score is lower than the preset confidence threshold, the prediction result of the current initial risk prediction model is considered to be biased, and a model fusion correction process is initiated: the background risk probability output by a seasonal autoregressive moving average model constructed from historical infection rate data of the same period is obtained, and the risk probability output by the initial risk prediction model is weighted and fused with the background risk probability. The fusion weights of the initial risk prediction model and the seasonal autoregressive moving average model are proportional to their respective comprehensive confidence scores, and the two fusion weights sum to 1. The weighted fusion result is used as the calibrated initial risk probability and is then substituted into the above formula for multiplicative calibration.
When the comprehensive confidence score is not lower than the preset confidence threshold, the risk probability output by the initial risk prediction model is substituted directly into the above formula as the calibrated initial risk probability for multiplicative calibration. In this case, the calibrated initial risk probability is the risk probability output by the initial risk prediction model itself, that is, the risk prediction value.
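
As a non-limiting illustration, the confidence assessment and fusion logic of claim 7 may be sketched as follows. The claim does not give the exact normalization or weighting; the sketch assumes equal-width probability bins for the expected calibration error, binary entropy in bits (already in [0, 1]), and a confidence score formed as a weighted average of `1 - ECE` and `1 - entropy`. The function names, the threshold, and the sample numbers are hypothetical, and the confidence score of the seasonal model is assumed to be supplied by the caller.

```python
import math


def expected_calibration_error(probs, labels, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_prob = sum(p for p, _ in members) / len(members)
            frequency = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(frequency - mean_prob)
    return ece


def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def confidence_score(ece, entropy, calibration_weight=0.5):
    # Assumed combination: low error and low entropy both raise confidence.
    return calibration_weight * (1.0 - ece) + (1.0 - calibration_weight) * (1.0 - entropy)


def final_risk(p_initial, conf_initial, p_background, conf_background,
               k_env, threshold=0.6):
    if conf_initial < threshold:
        total = conf_initial + conf_background
        w = conf_initial / total if total > 0 else 0.5  # weights sum to 1
        p_calibrated = w * p_initial + (1.0 - w) * p_background
    else:
        p_calibrated = p_initial
    return p_calibrated * k_env  # P_final = P_initial * k_env


trusted = final_risk(0.8, 0.9, 0.2, 0.5, k_env=1.1)  # above threshold: no fusion
fused = final_risk(0.8, 0.3, 0.2, 0.3, k_env=1.0)    # below threshold: fused
```

In the second call the two models have equal confidence, so the fused probability is the midpoint of the two risk estimates before the environmental coefficient is applied.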

8. The method for predicting the risk of respiratory hospital infections according to claim 1, characterized in that: the specific process of generating the corresponding risk warning information includes: comparing the final risk prediction value with multiple preset risk thresholds, which are dynamically determined by analyzing the distribution quantiles of historical infection events. When the risk prediction value exceeds the first threshold, a concern-level warning is generated; when it exceeds the second threshold, which is greater than the first threshold, a warning-level alert is generated and the hospital infection control system is linked to automatically push a list of high-risk patients and suggested intervention measures to the terminal devices. After a warning-level alert is triggered, an infection source tracing analysis module is automatically activated to assist the hospital's infection control department in quickly locating potential transmission routes.
The infection source tracing analysis module is calculated as follows: first, based on the time series characteristics of the high-risk patients, the diagnosis and treatment event sequence of each patient within a period of time before the warning time point is extracted; this sequence is composed of event tokens. Then, the similarity of the treatment processes of any two patients is quantified by calculating the edit distance between their diagnosis and treatment event sequences. The edit distance is calculated by dynamic programming as the minimum number of single-character edit operations (insertion, deletion, and replacement) required to transform one sequence into the other. The edit distance is then normalized by dividing it by the larger of the two sequence lengths, to eliminate the effect of differences in sequence length.
The reciprocal of the normalized edit distance between each pair of patients is defined as the association strength between them. A patient similarity network is constructed with patients as nodes and association strengths as edge weights; the larger the edge weight, the more similar the diagnosis and treatment processes of the two patients, and the higher the possibility of an infection association. Finally, the Louvain community detection algorithm is applied iteratively to the patient similarity network: by maximizing the modularity index, closely connected node clusters in the network are identified and designated as infection transmission clusters. Information on the identified infection transmission clusters, including the list of patients in each cluster and the similarities between patients within the cluster, is pushed to the terminal devices of hospital infection control personnel together with the risk warning information.
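
As a non-limiting illustration, the edit-distance and association-strength part of the tracing analysis in claim 8 may be sketched as follows, with each event token treated as one symbol. The claim does not address identical sequences, whose normalized distance is zero; the sketch assumes a small floor `eps` to keep the reciprocal finite. The patient identifiers, tokens, and function names are hypothetical, and the community detection step is not shown.

```python
from itertools import combinations


def edit_distance(a, b):
    """Minimum number of insertions, deletions, and replacements that
    turn sequence a into sequence b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (x != y)))   # replacement
        previous = current
    return previous[-1]


def association_strength(a, b, eps=1e-6):
    longest = max(len(a), len(b))
    normalized = edit_distance(a, b) / longest if longest else 0.0
    return 1.0 / max(normalized, eps)  # eps floor is an assumption


def similarity_network(sequences):
    """Edge weights keyed by patient pair, for every pair of patients."""
    return {
        (p, q): association_strength(sequences[p], sequences[q])
        for p, q in combinations(sorted(sequences), 2)
    }


# Made-up token sequences for three high-risk patients.
sequences = {
    "A": ["DX:J18.9", "RX:antibiotic", "LAB:crp"],
    "B": ["DX:J18.9", "LAB:crp"],
    "C": ["DX:I10", "RX:statin", "LAB:lipid"],
}
network = similarity_network(sequences)
```

The resulting weighted edges could then be handed to a graph library's Louvain implementation to obtain the candidate transmission clusters; here patients A and B, whose sequences differ by one event, receive a larger edge weight than either does with C.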

9. A respiratory hospital infection risk prediction system, the system being used to implement the method according to any one of claims 1-8, characterized in that the system includes:
A data acquisition module, used to acquire electronic medical record data of the target hospital within a historical time period, the electronic medical record data including at least patient demographic data, medical record data, laboratory test data, and imaging test data.
A feature construction module, used to preprocess the electronic medical record data and construct a multi-dimensional original feature set containing basic patient information, diagnosis and treatment process information, and clinical test information; to construct, based on the original feature set, a temporal sequence of patient diagnosis and treatment events and use a deep neural network model to extract the implicit temporal dynamic features of each patient, the temporal dynamic features characterizing the trend of the patient's respiratory infection risk over time; and to construct a high-dimensional hybrid feature space by integrating the static features of the original feature set with the temporal dynamic features.
A model building module, used to build an initial risk prediction model in the high-dimensional hybrid feature space based on an improved weighted Naive Bayes classifier, the improved weighted Naive Bayes classifier adjusting the contribution of different features to the classification results by introducing feature weight factors.
An environmental correction module, used to collect the hospital's internal environmental monitoring data and external public health monitoring data, fuse them to construct a dynamic environmental risk correction coefficient, and dynamically calibrate the output of the initial risk prediction model based on the dynamic environmental risk correction coefficient to generate the final risk prediction value.
An early warning generation module, used to classify the risk level of respiratory infection for different areas or different patient groups within the hospital based on the final risk prediction value, and to generate the corresponding risk warning information.