DRG grouping method, device and medium based on semantic information fusion
By employing a DRG grouping method based on semantic information fusion, and utilizing a deep neural network model and cross-modal attention mechanism, the accuracy and efficiency issues of DRG grouping in existing technologies are resolved, achieving efficient and accurate grouping results and rational use of resources.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- RENYUN HLDG CO LTD
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing DRG grouping technology struggles to achieve comprehensive rule coverage when dealing with complex cases, leading to biased grouping results, low computational efficiency, inability to quickly adapt to iterative grouping standards in different regions, and a lack of recognition of deep semantic connections between clinical diagnosis and surgical procedures, resulting in grouping errors and wasted resources.
The DRG grouping method based on semantic information fusion is adopted. Structured and unstructured data are retrieved through the data interface, and cleaning and encoding mapping are performed. Feature operations are performed using a deep neural network model, and semantic associations are captured by combining a cross-modal attention mechanism to generate a global semantic representation vector. The automatic or manual review instructions are determined by a confidence threshold.
It achieves efficient and accurate DRG grouping, improves grouping accuracy, reduces processing time, adapts to different regional standard iterations, ensures data security and efficient integration, provides visual feedback, and improves grouping efficiency and the rational use of medical resources.
[Figure: abstract drawing of CN122245584A]
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent medical data processing technology, specifically to a DRG grouping method, device, and medium based on semantic information fusion.
Background Technology
[0002] With the deepening of healthcare system reforms globally, the scientific and refined management of medical insurance payment methods has become a core issue in the construction of healthcare systems in various countries. Among these, Diagnosis Related Groups (DRGs), recognized as an advanced management tool, place cases with similar clinical characteristics and resource consumption into the same group by comprehensively considering factors such as disease severity, prognosis, treatment methods, and resource consumption, thus enabling bundled payment of medical expenses. This mechanism helps standardize medical service behavior and control unreasonable increases in medical costs, and it is an important tool for evaluating the performance of medical institutions and improving the quality of medical services. Against this backdrop, how to leverage massive amounts of electronic medical record data to achieve rapid, accurate, and objective DRG grouping has become a crucial aspect of current healthcare informatization.
[0003] In the wave of modern smart hospital construction, the generation of medical data is increasing exponentially, and the data formats are becoming increasingly complex. Complete case data includes structured data such as the patient's age, gender, costs, and length of hospital stay, as well as a large amount of unstructured text data, such as admission records, progress notes, surgical records, and discharge summaries. Unstructured text contains doctors' deep understanding of the patient's condition, differential diagnostic logic, and specific details of surgical procedures, playing a decisive role in determining the primary and secondary relationships of diseases and the severity of complications. However, the current mainstream DRG grouping technology is still at the stage of traditional logical judgment. Faced with increasingly complex clinical realities and the need for massive data processing, its limitations are becoming increasingly apparent, specifically in the following aspects:
[0004] Existing rule-based grouping methods rely heavily on manually preset matching logic, which makes it difficult to achieve comprehensive rule coverage when dealing with complex cases. This can lead to biased grouping results and also makes the maintenance of the rule base extremely labor-intensive and difficult to ensure logical completeness.
[0005] Existing technologies use a serial mechanism to process single cases, resulting in long processing times and low computational efficiency. This cannot meet the actual needs of medical institutions for rapid batch review of massive amounts of medical record data, and it is difficult to support the high standards of data processing timeliness required by modern medical insurance settlement business.
[0006] The differences in grouping standards across regions and their dynamic updates mean that the fixed rule logic faces extremely high adjustment and reconstruction costs when standards change. Furthermore, existing algorithms lack sufficient flexibility and transferability, making it difficult to quickly adapt to the system upgrade needs brought about by the iteration of grouping standards in different regions.
[0007] Existing methods are not good at accurately capturing the deep semantic connections between clinical diagnosis and surgical procedures. This lack of recognition of the inherent logic of medical texts directly leads to errors in grouping results, affecting the accuracy and rationality of medical insurance fund allocation, and causing waste or misallocation of medical resources.
[0008] The internal network architecture of medical institutions is unique and closed, and intelligent grouping applications usually have high requirements for computing resources, which makes the existing single-machine deployment mode unsuitable and hinders the implementation and integration of high-performance artificial intelligence grouping solutions in the actual network environment of hospitals.
Summary of the Invention
[0009] To at least partially address the problems existing in the prior art, the present invention provides a DRG grouping method, device, and medium based on semantic information fusion.
[0010] In a first aspect, the DRG grouping method based on semantic information fusion provided by the present invention includes:
[0011] Step 1: Retrieve the original data of the target case through the data interface. The original data includes structured clinical feature data and unstructured medical record text data.
[0012] Step 2: Perform cleaning operations on the original data to remove outliers, map the cleaned structured clinical feature data to standard medical insurance diagnosis codes and surgical codes to generate a standard encoding vector, and simultaneously perform word segmentation on the cleaned unstructured medical record text data to generate a text sequence.
[0013] Step 3: Call the preset major diagnostic category preliminary grouping rule library to perform feature matching on the standard encoding vector, and determine whether the target case matches the preliminary grouping conditions based on the matching results;
[0014] Step 4: When the target case does not match the preliminary grouping conditions, the standard encoding vector is used as the query vector to perform cross-modal attention weighting calculation on the text sequence. The calculated weighted text features are concatenated with the standard encoding vector to generate the global semantic representation vector of the target case.
[0015] Step 5: Input the global semantic representation vector into the deep neural network grouping model to perform feature operation, and output the predicted DRG group probability distribution and corresponding confidence score through the output layer of the deep neural network grouping model;
[0016] Step 6: Compare the confidence score with the preset high confidence threshold, and generate an automatic approval instruction or a manual review instruction based on the comparison result.
[0017] Step 7: In response to the automatic approval instruction or the manual review instruction, display the determined DRG grouping recommendation result through the terminal interface.
[0018] Preferably, the deep neural network grouping model adopts the Transformer architecture, and the deep neural network grouping model specifically includes:
[0019] A multi-layer encoder structure is used to receive the global semantic representation vector;
[0020] A multi-head self-attention mechanism module, located inside the multi-layer encoder structure, is used to capture long-distance contextual dependencies in the medical record text data;
[0021] A fully connected classification layer, connected to the output of the multi-layer encoder structure, is used to map the extracted high-dimensional features to the probability distribution of DRG groups.
[0022] Preferably, step one further includes:
[0023] Sub-step 1.1: Establish a communication connection with the hospital information system through the data interface, and use the target case set filtering function $F_{sel}$ to identify the target cases to be processed. The expression of the target case set filtering function $F_{sel}$ is:
[0024] $$F_{sel} = \{\, c_k \mid T_{start} \le t_k \le T_{end} \ \wedge\ s_k = 1 \,\}$$,
[0025] where $c_k$ is the k-th candidate case in the hospital information system, $t_k$ is the admission time of the k-th candidate case, $T_{start}$ is the preset start time of the data acquisition time window, $T_{end}$ is the preset end time of the data acquisition time window, and $s_k$ is the discharge settlement status identifier of the k-th candidate case, where 1 indicates a settled status.
[0026] Sub-step 1.2: Extract the structured clinical feature data of the selected target cases and construct a structured feature set $X_s$. The expression of the structured feature set $X_s$ is:
[0027] $$X_s = \{\, a,\ g,\ d_{main},\ D_{sec},\ P_{op},\ l_{stay},\ c_{total} \,\}$$,
[0028] where $a$ is the numerical value of the patient's age, $g$ is the patient's gender code, $d_{main}$ is the standard ICD code of the primary diagnosis, $D_{sec}$ is the set of standard ICD codes of the secondary diagnoses, $P_{op}$ is the set of standard ICD codes of the surgical procedures, $l_{stay}$ is the length of hospital stay in days, and $c_{total}$ is the total cost of hospitalization.
[0029] Sub-step 1.3: Extract the unstructured medical record text data of the target case and construct a full text sequence $T_{full}$. The expression of the full text sequence $T_{full}$ is:
[0030] $$T_{full} = T_{adm} \oplus T_{prog} \oplus T_{op} \oplus T_{dis}$$,
[0031] where $\oplus$ is the string concatenation operator, $T_{adm}$ is the admission record text paragraph, $T_{prog}$ is the progress note text paragraph, $T_{op}$ is the surgical record text paragraph, and $T_{dis}$ is the discharge summary text paragraph;
[0032] Sub-step 1.4: Use the data integrity verification function $F_{int}$ to perform joint validity verification on the structured feature set and the full text sequence. The expression of the data integrity verification function $F_{int}$ is:
[0033] $$F_{int} = \alpha \cdot \frac{1}{N}\sum_{i=1}^{N} \mathbb{I}(x_i \neq \varnothing) + \beta \cdot u(L_{text} - L_{min})$$,
[0034] where $\alpha$ is the structured data integrity weighting coefficient, $\beta$ is the weighting coefficient for the validity of unstructured data, $N$ is the total number of fields in the structured feature set, $x_i$ is the i-th field value in the structured feature set, $\mathbb{I}(\cdot)$ is the non-empty indicator function, $L_{text}$ is the length of the full text sequence in characters, $L_{min}$ is the preset minimum text length threshold, and $u(\cdot)$ is a step function;
[0035] When the result of the data integrity verification function $F_{int}$ is greater than the preset threshold $\theta_{int}$, the original data of the target case is output.
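Sub-steps 1.1 to 1.4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Case` record, the function names, the weights 0.6 and 0.4, the inclusive time window, and the 50-character minimum text length are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Case:
    admitted: datetime   # admission time of the candidate case
    settled: int         # discharge settlement status, 1 = settled
    fields: dict         # structured fields; None or "" counts as missing
    text: str            # concatenated medical record text


def select_targets(cases, start, end):
    """Keep settled cases whose admission time lies inside the window (assumed inclusive)."""
    return [c for c in cases if start <= c.admitted <= end and c.settled == 1]


def integrity_score(case, alpha=0.6, beta=0.4, min_len=50):
    """Weighted score: share of non-empty fields plus a text-length step term.

    alpha, beta and min_len are illustrative values, not taken from the source.
    """
    values = list(case.fields.values())
    filled = sum(v not in (None, "") for v in values) / len(values) if values else 0.0
    long_enough = 1.0 if len(case.text) >= min_len else 0.0
    return alpha * filled + beta * long_enough
```

A case would be passed on to Step 2 only when `integrity_score` exceeds the chosen threshold; cases with many empty fields or very short text are held back.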
[0036] Preferably, step two further includes:
[0037] Sub-step 2.1: Perform outlier validation on the structured clinical feature data to generate cleaned structured data. The outlier validation operation is executed based on a numerical validity determination function. The expression of the numerical validity determination function is:
[0038] ,
[0039] where the symbols denote, in order: the structured clinical feature data; M, the total number of numeric fields in the structured clinical feature data; the actual observed value of the j-th numeric field; the statistical mean of the j-th numeric field in the historical database; the standard deviation of the j-th numeric field in the historical database; and the indicator function. When the numerical validity determination function equals 1, the structured clinical feature data is determined to be cleaned structured data;
[0040] Sub-step 2.2: Based on the cleaned structured data, use the standard mapping transformation function $\Phi$ to construct the standard encoding vector $V_{code}$. The expression of the standard mapping transformation function $\Phi$ is:
[0041] $$V_{code} = \Phi\big(c_{main}, \{c_{sec}^{(r)}\}, \{c_{op}^{(k)}\}\big) = E(c_{main}) \oplus \Big(\bigoplus_{r=1}^{R} E\big(c_{sec}^{(r)}\big)\Big) \oplus \Big(\bigoplus_{k=1}^{K} E\big(c_{op}^{(k)}\big)\Big)$$,
[0042] where $c_{main}$ is the standard medical insurance diagnosis code of the primary diagnosis, $c_{sec}^{(r)}$ is the standard medical insurance diagnosis code of the r-th secondary diagnosis, $c_{op}^{(k)}$ is the standard medical insurance surgical code of the k-th surgery, $E(\cdot)$ is an embedding function that maps discrete codes to fixed-dimensional dense vectors, $\oplus$ is the vector concatenation operator, $R$ is the total number of secondary diagnoses, and $K$ is the total number of surgical procedures;
[0043] Sub-step 2.3: Perform word segmentation and stop word filtering on the unstructured medical record text data to generate a text sequence $S$. The generation of the text sequence $S$ follows the text filtering set formula:
[0044] $$S = \{\, w_t \mid w_t \in \mathrm{Seg}(T_{raw}),\ w_t \notin \Omega_{stop},\ 1 \le t \le T \,\}$$,
[0045] where $T_{raw}$ is the unstructured medical record text data, $\mathrm{Seg}(\cdot)$ is the Chinese word segmentation function, $w_t$ is the t-th word obtained after word segmentation, $\Omega_{stop}$ is the preset medical stop word library, and $T$ is the total number of word units after word segmentation.
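The cleaning operations of Step 2 can be sketched as follows. The source text does not state the tolerance used for outlier detection, so the factor of three standard deviations below is an assumption, as are the function names. The text is assumed to have been segmented into words already by an external tokenizer.

```python
def is_valid(record, stats, k=3.0):
    """Return 1 if every numeric field lies within k standard deviations
    of its historical mean, else 0. k = 3 is an assumed tolerance."""
    for name, (mean, std) in stats.items():
        if abs(record[name] - mean) > k * std:
            return 0
    return 1


def filter_tokens(tokens, stop_words):
    """Drop words found in the medical stop word library, keeping order."""
    return [t for t in tokens if t not in stop_words]
```

For example, with historical statistics `{"cost": (12000.0, 8000.0)}` a hospitalization cost of 90000 would be flagged as an outlier, while 15000 would pass.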
[0046] Preferably, step three further includes:
[0047] Sub-step 3.1: Based on the structured feature set and the standard encoding vector, use a multi-dimensional feature matching function group to calculate the neonatal characteristic indicator, the infection characteristic indicator, and the trauma characteristic indicator;
[0048] The expression of the multi-dimensional feature matching function group is:
[0049] ,
[0050] where the symbols denote, in order: the numerical value of the patient's age; the standard ICD code of the primary diagnosis; the set of standard ICD codes of the secondary diagnoses; the set of standard ICD codes of the surgical procedures; the preset threshold for determining the age of newborns; the preset set of diagnostic codes for human immunodeficiency virus; the preset set of codes for severe trauma surgeries; the function that calculates the cardinality of a set; the preset threshold for the number of surgeries; and the indicator function;
[0051] Sub-step 3.2: Use the logical disjunction aggregation function to perform a fusion calculation on the neonatal characteristic indicator, the infection characteristic indicator, and the trauma characteristic indicator, generating the preliminary grouping hit identifier $Y_{hit}$. The expression of the logical disjunction aggregation function is:
[0052] $$Y_{hit} = u\big(I_{neo} + I_{inf} + I_{tra}\big)$$,
[0053] where $I_{neo}$, $I_{inf}$ and $I_{tra}$ are the neonatal, infection and trauma characteristic indicators, and $u(\cdot)$ is a unit step function that outputs the value 1 when the input variable is greater than 0 and outputs the value 0 otherwise. The value 1 represents a hit and the value 0 represents a miss.
[0054] Sub-step 3.3: Based on the preliminary grouping hit identifier, perform a conditional branch judgment operation. When the preliminary grouping hit identifier is equal to 1, determine that the target case hits the preliminary grouping conditions, and directly output the determined DRG group code according to the type of the hit characteristic indicator; when the preliminary grouping hit identifier is equal to 0, determine that the target case does not hit the preliminary grouping conditions, and trigger the cross-modal attention weighted calculation step.
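The rule-based triage of Step 3 can be sketched as three indicators combined by a logical OR. The exact matching conditions are not recoverable from the source text, so everything below is an assumption made for illustration: the 28-day neonatal threshold, the use of ICD-10 code `B20` as the only HIV code, the placeholder trauma procedure code, and the rule that either a primary or a secondary diagnosis can trigger the infection indicator.

```python
def pre_group(age_days, primary_dx, secondary_dx, procedures,
              neonate_max_days=28,
              hiv_codes=frozenset({"B20"}),
              trauma_codes=frozenset({"TRAUMA-OP-1"}),  # hypothetical placeholder code
              min_trauma_ops=1):
    """Return (hit, indicators) for the preliminary grouping rules."""
    neo = int(age_days <= neonate_max_days)
    inf = int(primary_dx in hiv_codes or bool(set(secondary_dx) & hiv_codes))
    tra = int(len(set(procedures) & trauma_codes) >= min_trauma_ops)
    # Unit step over the sum of indicators: 1 if any rule fired, else 0.
    hit = int(neo + inf + tra > 0)
    return hit, {"neonate": neo, "infection": inf, "trauma": tra}
```

A hit of 1 would route the case directly to a rule-determined DRG group; a hit of 0 would send it on to the attention-based path of Step 4.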
[0055] Preferably, step four further includes:
[0056] Sub-step 4.1: Based on the standard encoding vector and the text sequence, use the linear projection transformation group to construct the query matrix $Q$, the key matrix $K$ and the value matrix $V$. The expression of the linear projection transformation group is:
[0057] $$Q = V_{code} W_Q,\qquad K = E_w(S)\, W_K,\qquad V = E_w(S)\, W_V$$,
[0058] where $V_{code}$ is the standard encoding vector, $S$ is the text sequence, $E_w(\cdot)$ is the word embedding vectorization function, $W_Q$ is the weight matrix that maps the structured encoding to the semantic query space, $W_K$ is the weight matrix that maps text features to the semantic index space, and $W_V$ is the weight matrix that maps text features to the semantic content space;
[0059] Sub-step 4.2: Perform semantic relevance calculation on the query matrix, the key matrix and the value matrix through the scaled dot-product attention function to generate the context semantic vector $H_{ctx}$. The expression of the scaled dot-product attention function is:
[0060] $$H_{ctx} = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V$$,
[0061] where $Q$ and $V$ are the query matrix and the value matrix, $K^{\top}$ is the transpose of the key matrix, $d_k$ is the feature dimension of the key matrix, $\sqrt{d_k}$ is the scaling factor, $\mathrm{softmax}(\cdot)$ is the normalized exponential function, and $M$ is a mask matrix used to mask padding characters. The context semantic vector $H_{ctx}$ represents the weighted text features under the attention of the standard encoding vector;
[0062] Sub-step 4.3: Use the feature fusion function to perform a deep fusion operation on the context semantic vector and the standard encoding vector, generating the global semantic representation vector $G$. The expression of the feature fusion function is:
[0063] $$G = \mathrm{ReLU}\big(W_f\,[\,V_{code} \oplus H_{ctx}\,] + b_f\big)$$,
[0064] where $\oplus$ is the vector concatenation operator, $[\,V_{code} \oplus H_{ctx}\,]$ is the concatenation of the standard encoding vector $V_{code}$ and the context semantic vector $H_{ctx}$ along the feature dimension, $W_f$ is the feature fusion weight matrix, $b_f$ is the bias vector, and $\mathrm{ReLU}(\cdot)$ is the rectified linear activation function.
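The cross-modal attention and fusion of Step 4 can be sketched in plain Python for a single query vector. This is a simplified illustration under stated assumptions, not the patented model: the weight matrices are passed in as nested lists, padding is masked by assigning a very large negative score, and all function names are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable normalized exponential."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(code_vec, token_vecs, Wq, Wk, Wv, pad_mask=None):
    """Attend over text tokens using the structured code vector as the query."""
    q = matvec(Wq, code_vec)
    keys = [matvec(Wk, t) for t in token_vecs]
    values = [matvec(Wv, t) for t in token_vecs]
    scale = math.sqrt(len(q))  # square root of the key dimension
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    if pad_mask is not None:
        # False marks a padding position, which receives (almost) zero weight.
        scores = [s if keep else -1e9 for s, keep in zip(scores, pad_mask)]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


def fuse(code_vec, context, Wf, bf):
    """Concatenate code vector and context, apply a linear layer, then ReLU."""
    joined = list(code_vec) + list(context)
    return [max(0.0, h + b) for h, b in zip(matvec(Wf, joined), bf)]
```

Because the query comes from the coded diagnoses and procedures while the keys and values come from the record text, tokens that align with the codes receive larger weights in the resulting context vector.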
[0065] Preferably, step five further includes:
[0066] Sub-step 5.1: Input the global semantic representation vector into the deep neural network grouping model and use the deep feature mapping function to perform hierarchical feature extraction and transformation, generating an unnormalized classification log-odds vector $z$. The expression of the deep feature mapping function is:
[0067] $$z = W_o \cdot f_L\big(f_{L-1}(\cdots f_1(G) \cdots)\big) + b_o$$,
[0068] where $G$ is the global semantic representation vector, $f_l(\cdot)$ is the nonlinear transformation function of the l-th layer in the deep neural network grouping model, $L$ is the total number of layers, $W_o$ is the weight matrix of the output classification layer, and $b_o$ is the bias vector of the output classification layer;
[0069] Sub-step 5.2: Use the probability distribution normalization function to perform exponential normalization on the classification log-odds vector, generating the DRG group probability distribution. The expression of the probability distribution normalization function is:
[0070] $$p_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$$,
[0071] where $p_i$ is the predicted probability value of the i-th DRG group in the DRG group probability distribution, $z_i$ is the value of the i-th component of the classification log-odds vector, $C$ is the total number of preset DRG group categories, and $e$ is the base of the natural logarithm.
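[0072] Sub-step 5.3: Use the maximum likelihood evaluation function to perform decision calculations on the DRG group probability distribution, extracting the predicted DRG group code $g_{pred}$ and the confidence score $s_{conf}$. The expression of the maximum likelihood evaluation function is:
[0073] $$s_{conf} = \max_{1 \le i \le C} p_i,\qquad i^{*} = \arg\max_{1 \le i \le C} p_i,\qquad g_{pred} = \mathrm{Map}(i^{*})$$,
[0074] where $p_i$ is the predicted probability value of the i-th DRG group, $C$ is the total number of preset DRG group categories, $\max(\cdot)$ is the maximum value function, $\arg\max(\cdot)$ is the function that returns the index at which the maximum is attained, $g_{pred}$ is the predicted DRG group code, and $\mathrm{Map}(\cdot)$ is the mapping function from indexes to group codes.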
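The output stage of Step 5 (exponential normalization followed by selection of the most probable group) can be sketched as below. The logits stand in for the output of the classification layer, and the group codes used in the usage note are hypothetical placeholders rather than real DRG codes.

```python
import math


def drg_probabilities(logits):
    """Softmax over the classification log-odds vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, index_to_code):
    """Return (predicted group code, confidence score, full distribution)."""
    probs = drg_probabilities(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return index_to_code[best], probs[best], probs
```

Calling `predict([2.0, 0.5, -1.0], ["GROUP_A", "GROUP_B", "GROUP_C"])` selects `GROUP_A`, and the confidence score is simply the largest probability in the distribution.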
[0075] Preferably, step six further includes:
[0076] Sub-step 6.1: Based on the confidence score and the preset high confidence threshold, use the automatic pass / fail decision function to calculate the review status indicator $A$. The expression of the automatic pass / fail decision function is:
[0077] $$A = \mathbb{I}\big(s_{conf} \ge \tau_{high}\big)$$,
[0078] where $s_{conf}$ is the confidence score, $\tau_{high}$ is the preset high confidence threshold, and $\mathbb{I}(\cdot)$ is an indicator function that outputs the value 1 when the inequality condition in the parentheses is true and outputs the value 0 otherwise. The value 1 indicates the automatic approval status, and the value 0 indicates the manual review status.
[0079] Sub-step 6.2: For cases where the review status indicator is 0, use the candidate group filtering function to extract a set of suspected candidates $S_{cand}$ from the DRG group probability distribution. The expression of the candidate group filtering function is:
[0080] $$S_{cand} = \{\, (g_r, p_r) \mid r \in \mathrm{TopK}(P, K_{top}) \,\}$$,
[0081] where $P$ is the DRG group probability distribution, $\mathrm{TopK}(\cdot)$ is a top-rank index extraction function that returns the set of group indices whose probability values rank in the first $K_{top}$ positions, $K_{top}$ is the preset number of recommended candidates, $g_r$ is the code of the r-th candidate group, and $p_r$ is the probability value of the r-th candidate group.
[0082] Sub-step 6.3: Use the instruction construction function to generate the final control instruction $Cmd$ based on the review status indicator and the set of suspected candidates. The expression of the instruction construction function is:
[0083] $$Cmd = \begin{cases} \mathrm{Gen}_{auto}(g_{pred}), & A = 1 \\ \mathrm{Gen}_{manual}(S_{cand}), & A = 0 \end{cases}$$,
[0084] where $A$ is the review status indicator, $\mathrm{Gen}_{auto}(\cdot)$ is the automatic instruction generation operator, used to encapsulate the predicted DRG group code $g_{pred}$ as an automatic approval instruction, and $\mathrm{Gen}_{manual}(\cdot)$ is the manual review instruction generation operator, used to encapsulate the set of suspected candidates $S_{cand}$ as a manual review instruction. The automatic approval instruction and the manual review instruction are the generated instruction results.
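The confidence-based routing of Step 6 can be sketched as a single function. The threshold of 0.9, the candidate count of 3, the inclusive comparison, and the dictionary layout of the returned instruction are assumptions chosen for illustration; the source does not specify them.

```python
def build_instruction(probs, index_to_code, threshold=0.9, top_k=3):
    """Return an auto-approval instruction when the top probability reaches
    the threshold, otherwise a manual-review instruction with top-k candidates."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best = ranked[0]
    if probs[best] >= threshold:
        return {"type": "auto_approve",
                "drg": index_to_code[best],
                "confidence": probs[best]}
    candidates = [(index_to_code[i], probs[i]) for i in ranked[:top_k]]
    return {"type": "manual_review", "candidates": candidates}
```

With a distribution such as `[0.5, 0.3, 0.2]` no group reaches the threshold, so the reviewer would be shown the ranked candidates instead of a single confirmed group.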
[0085] In a second aspect, this application provides an electronic device, including: a processor; and a memory storing program instructions that, when executed by the processor, cause the electronic device to implement one or more embodiments of the first aspect described above.
[0086] In a third aspect, this application provides a computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by one or more processors, implement one or more embodiments of the first aspect described above.
[0087] This application directly retrieves the full amount of original data containing structured features and unstructured text through a data interface, which can break the data silo effect within the hospital information system, ensure that group calculations are based on complete and multi-dimensional medical record information, avoid grouping deviations caused by manual entry omissions or data gaps, and lay a solid data foundation for achieving high-precision intelligent grouping.
[0088] This application effectively eliminates noise interference and format differences in clinical data by performing strict outlier removal and standardized coding mapping on the original case data. It transforms unstructured medical record text into a computer-understandable standardized sequence, significantly improving data quality and usability, and ensuring the accuracy of the feature extraction process and the stability of the algorithm model.
[0089] This application utilizes a pre-defined rule base for major diagnostic categories to perform rapid feature matching on standardized coding vectors. It can quickly identify and triage typical cases that conform to clear rules before entering deep computation, significantly reducing the computational load and resource consumption of complex neural network models. Through a hierarchical processing mechanism that combines rules and models, it significantly improves the overall grouping efficiency and response speed of the system.
[0090] This application employs a cross-modal attention mechanism to perform deep semantic interaction computation on difficult cases that have not met the initial rules. It uses standard encoding as an index to accurately focus on the descriptive information in the medical record text, solving the semantic loss problem caused by relying solely on encoded information. It deeply mines the implicit logical relationship between diagnosis and treatment and constructs a global semantic representation containing rich contextual information.
[0091] This application inputs the fused global semantic representation into a deep neural network grouping model for feature computation. It utilizes the powerful nonlinear fitting capability of deep architecture to handle highly complex medical feature relationships and outputs prediction results containing probability distribution and confidence scores. This overcomes the limitations of traditional linear logic in handling complex diseases and significantly improves the accuracy and reliability of grouping difficult cases.
[0092] This application generates differentiated control instructions based on the comparison results of confidence scores and high confidence thresholds, and constructs a dual verification mechanism that combines automatic approval with manual review. This ensures rapid settlement of high-certainty cases while accurately intercepting low-confidence-risk cases and transferring them to manual review, effectively balancing grouping efficiency with medical fund payment security and reducing the risk of misgrouping.
[0093] This application provides clear decision support and visual feedback to medical insurance coders and reviewers by responding to automatic approval or manual review instructions and intuitively displaying the determined DRG grouping recommendations through a terminal interface. This enables seamless transformation of algorithmic decision results into actual business processes, improving the operational convenience and human-computer interaction experience of medical institutions' medical record management.
[0094] Compared with the prior art, the present invention has the following advantages:
[0095] 1. This invention adopts a deep neural network model to replace the traditional manual rule logic, achieving the technical effect of automatically extracting case features and performing intelligent grouping prediction. It realizes a high degree of automation and accuracy in handling complex cases, and solves the shortcomings of existing rule-based grouping methods that rely heavily on manual presets, have incomplete logic coverage, and require a huge amount of maintenance.
[0096] 2. This invention adopts a diversion technology solution that combines a pre-set major diagnostic category grouping rule base for rapid matching with deep model calculation. This achieves the technical effect of significantly reducing the processing time of a single case and increasing the system throughput, enabling second-level rapid review and settlement of massive medical record data. It solves the shortcomings of the existing serial processing mechanism, which has low computational efficiency and cannot meet the high standards of timeliness requirements of medical insurance data.
[0097] 3. This invention adopts an iteratively updatable deep learning model architecture, which can quickly adapt to the dynamically changing grouping standards of different regions with only parameter fine-tuning. This enables the system to be flexibly migrated and upgraded at low cost between multiple regional standards, solving the problem that existing fixed rule logic is costly to adjust and reconstruct when standards change and lacks flexibility.
[0098] 4. This invention adopts a cross-modal attention mechanism to deeply integrate structured coding and unstructured text data, achieving the technical effect of accurately capturing the deep semantic relationship between clinical diagnosis and surgical operation, greatly improving the accuracy of grouping and the accurate allocation of medical insurance funds, and solving the shortcomings of existing methods such as large grouping errors and resource mismatch caused by the lack of text logic recognition.
[0099] 5. This invention adopts a distributed processing technology solution that separates data acquisition and group computing, achieving the technical effect of flexibly calling high-performance computing power while ensuring the data security of the hospital's intranet. It realizes the effective integration and implementation of the intelligent grouping system in the hospital's closed network environment, and solves the shortcomings of the existing single-machine deployment mode, which has limited computing power and cannot adapt to special network architectures.
Description of the Drawings
[0100] Figure 1 is a schematic diagram of the overall process of the DRG grouping method based on semantic information fusion of the present invention;
[0101] Figure 2 is a schematic diagram of the matching logic of the major diagnostic category preliminary grouping rule library in the DRG grouping method based on semantic information fusion provided in this embodiment of the invention;
[0102] Figure 3 is a flowchart of the confidence-score-based output decision and instruction generation process in the DRG grouping method based on semantic information fusion provided in the embodiments of the present invention.
Detailed Implementation
[0103] To enable those skilled in the art to understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. Other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort should fall within the scope of protection of the present invention.
[0104] The present invention will now be described in detail with reference to the accompanying drawings:
[0105] Example 1:
[0106] Referring to Figures 1 to 3, this embodiment demonstrates a deep neural network grouping model based on the Transformer architecture, in which the final judgment result is automatic approval with high confidence.
[0107] Step 1: Establish a communication connection with the hospital information system through a data interface; using a target case set filtering function, set the start and end times of the data collection time window, and traverse the candidate cases in the system; automatically check the discharge settlement status of each case, and when the status is "settled" and the admission time is within the time window, it is identified as a target case to be processed. Then, extract the structured clinical feature data of the case to construct a structured feature set, and extract unstructured text and concatenate it to generate a full text sequence. Finally, use a data integrity verification function to calculate the integrity score; when the score exceeds a preset threshold, output the original data.
[0108] Step Two: The system cleans the raw data. First, using a numerical validity judgment function, based on the statistical mean and standard deviation of the historical database, outlier detection is performed on numerical fields such as age and cost in the structured data to ensure that the data is within the valid range. After verification, a standard mapping transformation function is used to map the standard ICD codes of primary diagnosis, secondary diagnosis, and surgical procedure into dense vectors through an embedding function, and these vectors are then concatenated to generate a standard code vector. Simultaneously, Chinese word segmentation is performed on the unstructured text, and preset medical stop words are removed using a text filtering set formula to generate a cleaned text sequence.
[0109] Step 3: The system calls the major diagnostic category preliminary grouping rule library. Using a multi-dimensional feature matching function group, it determines whether the patient is a newborn, whether the case contains HIV-related diagnostic codes, and whether the case contains severe trauma surgery codes; it then uses a logical disjunction aggregation function to fuse these indicators. In this embodiment, it is assumed that the case is a general internal medicine case, and all three characteristic indicators are 0, resulting in a preliminary grouping hit identifier of 0. The system determines that the target case does not match the preliminary grouping conditions and triggers the subsequent deep calculation process.
[0110] Step 4: Using a linear projection transformation set, the standard encoding vector is mapped to a query matrix, and the text sequence is mapped to a key matrix and a value matrix. Using a scaled dot product attention function, the similarity between the query matrix and the key matrix is calculated, attention weights are generated and applied to the value matrix to obtain a context semantic vector. This vector represents the truly crucial semantic information in the medical record text under diagnostic coding attention. Finally, using a feature fusion function, the standard encoding vector and the context semantic vector are concatenated and activated using ReLU to generate a global semantic representation vector.
[0111] Step 5: Input the global semantic representation vector into a deep neural network grouping model using the Transformer architecture. This model contains a multi-layer encoder structure, and its internal multi-head self-attention mechanism captures long-distance dependencies in the medical record text features. Through the hierarchical transformation of the deep feature mapping function, an unnormalized classification log-odds vector is generated. The probability distribution normalization function then calculates the DRG group probability distribution. Using the maximum likelihood evaluation function, the group code with the highest probability value is extracted as the prediction result, and that highest probability value is used as the confidence score.
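The last part of this step, turning the log-odds vector into a predicted group and a confidence score, can be sketched as follows. The logits and the group codes are made-up placeholders, not real DRG codes or model outputs.

```python
import math


def softmax(logits):
    """Normalize a log-odds vector into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, group_codes):
    """Return (predicted group code, confidence score, full distribution)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return group_codes[best], probs[best], probs


GROUP_CODES = ["GRP-A", "GRP-B", "GRP-C"]  # hypothetical group codes
code, confidence, probs = predict([2.0, 0.5, -1.0], GROUP_CODES)
```

The confidence score here is simply the largest entry of the distribution, which is what the threshold comparison in the next step consumes.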
[0112] Step 6: The system uses an automatic pass / fail decision function to compare the confidence score with a preset high confidence threshold. In this embodiment, because the case characteristics are typical, the confidence score predicted by the model is higher than the threshold, and the function outputs a review status indicator with the value 1. Using the instruction construction function, the system generates an automatic approval instruction containing the predicted group code.
[0113] Step 7: The terminal interface responds to the automatic approval instruction, directly displays the confirmed DRG grouping result, marks it as automatically approved by the system without manual intervention, and completes the grouping process for this case.
[0114] Example 2:
[0115] Referring to Figures 1-3, this embodiment demonstrates a grouping model based on a hybrid architecture of convolutional neural networks and long short-term memory networks, in a scenario where the final decision is a low-confidence manual review.
[0116] Step 1: Establish a communication connection with the hospital information system through a data interface; using a target case set filtering function, set the start and end times of the data collection time window, and traverse the candidate cases in the system; automatically check the discharge settlement status of each case, and when the status is "settled" and the admission time is within the time window, it is identified as a target case to be processed. Then, extract the structured clinical feature data of the case to construct a structured feature set, and extract unstructured text and concatenate it to generate a full text sequence. Finally, use a data integrity verification function to calculate the integrity score; when the score exceeds a preset threshold, output the original data.
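A minimal sketch of the case selection and integrity check described above is given below. The field names, the weighting coefficients (0.6 and 0.4), the minimum text length, and the normalization of the non-empty count by the number of fields are assumptions made for illustration; the source does not state their values.

```python
from datetime import datetime


def select_target_cases(cases, window_start, window_end):
    """Keep cases that are settled and were admitted inside the time window."""
    return [
        case for case in cases
        if case["settled"] == 1 and window_start <= case["admitted"] <= window_end
    ]


def integrity_score(features, full_text, alpha, beta, min_length):
    """Weighted sum of the non-empty field ratio and a text-length step term."""
    filled = sum(1 for value in features.values() if value not in (None, ""))
    field_ratio = filled / len(features)
    long_enough = 1 if len(full_text) >= min_length else 0
    return alpha * field_ratio + beta * long_enough


cases = [
    {"id": "A", "admitted": datetime(2025, 3, 4), "settled": 1},
    {"id": "B", "admitted": datetime(2025, 3, 9), "settled": 0},    # not settled
    {"id": "C", "admitted": datetime(2024, 12, 30), "settled": 1},  # outside window
]
targets = select_target_cases(cases, datetime(2025, 1, 1), datetime(2025, 3, 31))

features = {"age": 62, "sex": "M", "main_dx": "J18.9", "length_of_stay": None}
full_text = "".join([
    "Admitted with fever. ",
    "Treated with antibiotics. ",
    "No surgery performed. ",
    "Discharged in stable condition.",
])
score = integrity_score(features, full_text, alpha=0.6, beta=0.4, min_length=20)
```

The raw data would be passed on only when `score` exceeds the preset threshold; with one of four fields missing, the structured term contributes three quarters of its weight.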
[0117] Step 2: Perform the same cleaning logic as in Example 1 on this complex case. Note that the text sequence for this case is long, containing extensive descriptions of disease management and complex surgical procedures.
[0118] Step 3: The system calculates the characteristic indicators for newborns, infections, and trauma. In this embodiment, although the condition is complex, the case does not match any specific preliminary grouping rule, so the preliminary grouping hit flag is still 0, and the process proceeds to the deep learning model calculation stage.
[0119] Step 4: The system uses the standard encoding vector as the query vector to "search" for relevant semantic features in the complex text sequence and generates a global semantic representation vector.
[0120] Step 5: Input the global semantic representation vector into a hybrid architecture deep neural network grouping model. A one-dimensional convolutional neural network layer slides across the vector sequence to extract local keyword features; then, a long short-term memory network layer receives the output of the convolutional neural network layer and processes the evolution features of the text sequence over time. The output features of both are fused in a feature concatenation and classification layer. The DRG group probability distribution is then calculated. Due to the complexity of the cases, the highest predicted probability value is relatively low, i.e., the confidence score is not high.
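The data flow of this hybrid model can be illustrated with a deliberately tiny, single-channel sketch: a one-dimensional convolution extracts local features, a single-unit LSTM consumes the convolution output, and both summaries are concatenated before classification. Everything here is an assumption for illustration (scalar features, fixed toy weights, max pooling of the convolution output, two output classes); it is not the model described in the source.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_relu(sequence, kernel):
    """Slide a kernel over a scalar sequence (valid padding), then apply ReLU."""
    width = len(kernel)
    return [
        max(0.0, sum(k * x for k, x in zip(kernel, sequence[i:i + width])))
        for i in range(len(sequence) - width + 1)
    ]


def lstm_final_state(inputs, gates):
    """Run a single-unit LSTM over the inputs and return the last hidden state.

    gates maps each gate name to (input weight, recurrent weight, bias).
    """
    hidden, cell = 0.0, 0.0
    for x in inputs:
        pre = {name: w_x * x + w_h * hidden + b
               for name, (w_x, w_h, b) in gates.items()}
        i = sigmoid(pre["input"])
        f = sigmoid(pre["forget"])
        o = sigmoid(pre["output"])
        g = math.tanh(pre["candidate"])
        cell = f * cell + i * g
        hidden = o * math.tanh(cell)
    return hidden


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_forward(sequence, kernel, gates, class_weights, class_bias):
    local = conv1d_relu(sequence, kernel)      # local features from the CNN layer
    pooled = max(local)                        # strongest local response (assumed pooling)
    temporal = lstm_final_state(local, gates)  # LSTM reads the CNN output in order
    features = [pooled, temporal]              # feature concatenation
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(class_weights, class_bias)]
    return softmax(logits)


GATES = {name: (0.5, 0.5, 0.0) for name in ("input", "forget", "output", "candidate")}
probs = hybrid_forward(
    sequence=[0.2, 0.9, 0.1, 0.7, 0.4],
    kernel=[0.5, -0.25, 0.5],
    gates=GATES,
    class_weights=[[1.0, -1.0], [-1.0, 1.0]],
    class_bias=[0.0, 0.0],
)
```

When the resulting distribution is flat, its maximum is small, which is the low-confidence situation this embodiment goes on to handle.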
[0121] Step 6: The system compares the confidence score with a preset high confidence threshold. In this embodiment, because the score is lower than the threshold, the decision function outputs a review status flag of 0. The system then triggers a candidate group filtering function, which extracts from the probability distribution the group codes with the top K probability values, together with their probabilities, to construct a suspected candidate set. The instruction construction function subsequently encapsulates this set into a manual review instruction.
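The threshold decision and top-K candidate extraction can be sketched together. The dictionary layout of the instruction, the group codes, the threshold of 0.9, and the choice of "greater than or equal" for the comparison are assumptions for illustration only.

```python
def build_instruction(probs, group_codes, threshold, top_k):
    """Return an auto-approval or manual-review instruction for one case."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best = ranked[0]
    if probs[best] >= threshold:  # review status indicator = 1
        return {"status": 1, "type": "auto_pass", "group": group_codes[best]}
    # review status indicator = 0: collect the top-K groups with their probabilities
    candidates = [(group_codes[i], probs[i]) for i in ranked[:top_k]]
    return {"status": 0, "type": "manual_review", "candidates": candidates}


CODES = ["GRP-A", "GRP-B", "GRP-C", "GRP-D"]  # hypothetical group codes

manual = build_instruction([0.41, 0.33, 0.16, 0.10], CODES, threshold=0.9, top_k=3)
auto = build_instruction([0.95, 0.03, 0.01, 0.01], CODES, threshold=0.9, top_k=3)
```

Here `manual["candidates"]` lists the three most probable groups in descending order, which is the list a reviewer would choose from, while `auto` carries only the single predicted group.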
[0122] Step 7: The terminal interface responds to the manual review instruction, displaying a pop-up message "Manual review required." The interface shows the model's predicted first-choice DRG group, and simultaneously lists other groups from the suspected candidate set for reviewers' reference. Reviewers, combining the cleaned medical record text and highlighted key features displayed on the interface, select the most suitable group from the recommended list to complete the final grouping.
[0123] Embodiments of the present invention have been presented and described. It will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A DRG grouping method based on semantic information fusion, characterized in that it includes:
Step 1: Retrieve the original data of the target case through the data interface, the original data including structured clinical feature data and unstructured medical record text data;
Step 2: Perform cleaning operations on the original data to remove outliers, map the cleaned structured clinical feature data to standard medical insurance diagnosis codes and surgical codes to generate a standard encoding vector, and simultaneously perform word segmentation on the cleaned unstructured medical record text data to generate a text sequence;
Step 3: Call the preset main diagnostic category preliminary grouping rule library to perform feature matching on the standard encoding vector, and determine whether the target case matches the preliminary grouping conditions based on the matching results;
Step 4: When the target case does not match the preliminary grouping conditions, use the standard encoding vector as the query vector to perform cross-modal attention weighting calculation on the text sequence, and concatenate the calculated weighted text features with the standard encoding vector to generate the global semantic representation vector of the target case;
Step 5: Input the global semantic representation vector into the deep neural network grouping model to perform feature operations, and output the predicted DRG group probability distribution and the corresponding confidence score through the output layer of the deep neural network grouping model;
Step 6: Compare the confidence score with the preset high confidence threshold, and generate an automatic approval instruction or a manual review instruction based on the comparison result;
Step 7: In response to the automatic approval instruction or the manual review instruction, display the determined DRG grouping recommendation result through the terminal interface.
2. The DRG grouping method according to claim 1, characterized in that the deep neural network grouping model adopts the Transformer architecture and specifically includes: a multi-layer encoder structure, used to receive the global semantic representation vector; a multi-head self-attention mechanism module, located inside the multi-layer encoder structure and used to capture long-distance contextual dependencies in the medical record text data; and a fully connected classification layer, connected to the output of the multi-layer encoder structure and used to map the extracted high-dimensional features to the probability distribution of DRG groups.
3. The DRG grouping method according to claim 1, characterized in that Step 1 further includes:
Sub-step 1.1: Establish a communication connection with the hospital information system through the data interface, and use the target case set filtering function $S_{target}$ to identify the target cases to be processed; the expression of the target case set filtering function is:
$$S_{target} = \{\, c_k \mid T_{start} \le t_k \le T_{end} \,\wedge\, s_k = 1 \,\}$$
where $c_k$ is the k-th candidate case in the hospital information system, $t_k$ is the admission time of the k-th candidate case, $T_{start}$ is the preset start time of the data acquisition time window, $T_{end}$ is the preset end time of the data acquisition time window, and $s_k$ is the discharge settlement status identifier of the k-th candidate case, where 1 indicates a settled status;
Sub-step 1.2: Extract the structured clinical feature data of the selected target cases and construct a structured feature set $X_{struct}$; the expression of the structured feature set is:
$$X_{struct} = \{\, x_{age},\ x_{sex},\ d_{main},\ D_{sec},\ P_{op},\ x_{los},\ x_{cost} \,\}$$
where $x_{age}$ is the patient's age value, $x_{sex}$ is the patient's gender code, $d_{main}$ is the standard ICD code of the primary diagnosis, $D_{sec}$ is the set of standard ICD codes of the secondary diagnoses, $P_{op}$ is the set of standard ICD codes of the surgical procedures, $x_{los}$ is the number of days spent in the hospital, and $x_{cost}$ is the total cost of hospitalization;
Sub-step 1.3: Extract the unstructured medical record text data of the target case and construct a full text sequence $T_{full}$; the expression of the full text sequence is:
$$T_{full} = t_{adm} \oplus t_{rec} \oplus t_{op} \oplus t_{dis}$$
where $\oplus$ is the string concatenation operator, $t_{adm}$ is the admission record text paragraph, $t_{rec}$ is the medical record text paragraph, $t_{op}$ is the text paragraph recording the surgical procedure, and $t_{dis}$ is the discharge summary text paragraph;
Sub-step 1.4: Use the data integrity verification function $V$ to perform joint validity verification on the structured feature set and the full text sequence; the expression of the data integrity verification function is:
$$V(X_{struct}, T_{full}) = \alpha \cdot \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(x_i \neq \varnothing) + \beta \cdot H(L_{text} - L_{min})$$
where $\alpha$ is the structured data integrity weighting coefficient, $\beta$ is the weighting coefficient for the validity of unstructured data, $N$ is the total number of fields in the structured feature set, $x_i$ is the i-th field value in the structured feature set, $\mathbb{I}(\cdot)$ is the non-empty indicator function, $L_{text}$ is the length of the full text sequence in characters, $L_{min}$ is the preset minimum text length threshold, and $H(\cdot)$ is a step function; when the calculation result of the data integrity verification function is greater than the preset threshold, the original data of the target case is output.
4. The DRG grouping method according to claim 1, characterized in that Step 2 further includes:
Sub-step 2.1: Perform outlier validation on the structured clinical feature data to generate cleaned structured data; the outlier validation operation is executed based on a numerical validity determination function $F_{valid}$ evaluated over the structured clinical feature data $X$, where $M$ is the total number of numeric fields in the structured clinical feature data, $x_j$ is the actual observed value of the j-th numeric field, $\mu_j$ is the statistical mean of the j-th numeric field in the historical database, $\sigma_j$ is the standard deviation of the j-th numeric field, and $\mathbb{I}(\cdot)$ is the indicator function; when $F_{valid}$ equals 1, the structured clinical feature data is determined to be cleaned structured data;
Sub-step 2.2: Based on the cleaned structured data, use the standard mapping transformation function to construct the standard encoding vector $V_{code}$; the expression of the standard mapping transformation function is:
$$V_{code} = E(d_{main}) \oplus \Big( \bigoplus_{r=1}^{R} E(d_r) \Big) \oplus \Big( \bigoplus_{k=1}^{K} E(p_k) \Big)$$
where $d_{main}$ is the standard medical insurance diagnosis code of the primary diagnosis, $d_r$ is the standard medical insurance diagnosis code of the r-th secondary diagnosis, $p_k$ is the standard medical insurance surgical code of the k-th surgery, $E(\cdot)$ is an embedding function that maps discrete codes to fixed-dimensional dense vectors, $\oplus$ is the vector concatenation operator, $R$ is the total number of secondary diagnoses, and $K$ is the total number of surgical procedures;
Sub-step 2.3: Perform word segmentation and stop word filtering on the unstructured medical record text data to generate a text sequence $S_{text}$; the generation of the text sequence follows the text filtering set formula:
$$S_{text} = \{\, w_t \mid w_t \in Seg(D_{text}),\ w_t \notin \Omega_{stop},\ 1 \le t \le T \,\}$$
where $D_{text}$ is the unstructured medical record text data, $Seg(\cdot)$ is the Chinese word segmentation function, $w_t$ is the t-th word unit obtained after word segmentation, $\Omega_{stop}$ is the preset medical stop word library, and $T$ is the total number of word units after word segmentation.
5. The DRG grouping method according to claim 1, characterized in that Step 3 further includes:
Sub-step 3.1: Based on the structured feature set and the standard encoding vector, use a multi-dimensional feature matching function group to calculate the neonatal characteristic indicator $f_{neo}$, the infection characteristic indicator $f_{hiv}$, and the trauma characteristic indicator $f_{trauma}$; in the multi-dimensional feature matching function group, $x_{age}$ is the patient's age value, $d_{main}$ is the standard ICD code of the primary diagnosis, $D_{sec}$ is the set of standard ICD codes of the secondary diagnoses, $P_{op}$ is the set of standard ICD codes of the surgical procedures, $\tau_{age}$ is the preset threshold for determining the age of newborns, $C_{HIV}$ is the preset set of diagnostic codes for human immunodeficiency virus, $C_{trauma}$ is the preset set of codes for severe trauma surgeries, $|\cdot|$ is the function for calculating the cardinality of a set, $\tau_{op}$ is the preset threshold for the number of surgeries, and $\mathbb{I}(\cdot)$ is the indicator function;
Sub-step 3.2: Use the logical disjunction aggregation function to perform a fusion calculation on the neonatal characteristic indicator, the infection characteristic indicator, and the trauma characteristic indicator to generate the preliminary grouping hit flag $Flag_{pre}$; the expression of the logical disjunction aggregation function is:
$$Flag_{pre} = H(f_{neo} + f_{hiv} + f_{trauma})$$
where $H(\cdot)$ is the unit step function, which outputs the value 1 when the input variable is greater than 0 and the value 0 otherwise; the value 1 represents a hit and the value 0 represents a miss;
Sub-step 3.3: Based on the preliminary grouping hit flag, perform a conditional branch judgment operation: when the preliminary grouping hit flag is equal to 1, determine that the target case matches the preliminary grouping conditions, and directly output the determined DRG group code according to the type of the feature indicator that was hit; when the preliminary grouping hit flag is equal to 0, determine that the target case does not match the preliminary grouping conditions, and trigger the cross-modal attention weighting calculation step.
6. The DRG grouping method according to claim 1, characterized in that Step 4 further includes:
Sub-step 4.1: Based on the standard encoding vector and the text sequence, use the linear projection transformation group to construct the query matrix $Q$, the key matrix $K$, and the value matrix $V$; the expression of the linear projection transformation group is:
$$Q = V_{code} W_Q, \qquad K = Emb(S_{text})\, W_K, \qquad V = Emb(S_{text})\, W_V$$
where $V_{code}$ is the standard encoding vector, $S_{text}$ is the text sequence, $Emb(\cdot)$ is the word embedding vectorization function, $W_Q$ is the weight matrix that maps the structured encoding to the semantic query space, $W_K$ is the weight matrix that maps text features to the semantic index space, and $W_V$ is the weight matrix that maps text features to the semantic content space;
Sub-step 4.2: Perform semantic relevance calculation on the query matrix, the key matrix, and the value matrix through a scaled dot product attention function to generate the context semantic vector $V_{ctx}$; the expression of the scaled dot product attention function is:
$$V_{ctx} = Softmax\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} + M_{mask} \right) V$$
where $K^{\top}$ is the transpose of the key matrix, $d_k$ is the feature dimension of the key matrix, $\sqrt{d_k}$ is the scaling factor, $Softmax(\cdot)$ is the normalized exponential function, $M_{mask}$ is a mask matrix used to mask padding characters, and the context semantic vector $V_{ctx}$ represents the weighted text features under the attention of the standard encoding vector;
Sub-step 4.3: Use the feature fusion function to perform a deep fusion operation on the context semantic vector and the standard encoding vector to generate the global semantic representation vector $V_{global}$; the expression of the feature fusion function is:
$$V_{global} = ReLU\big( W_f\, [\, V_{code} \oplus V_{ctx} \,] + b_f \big)$$
where $\oplus$ is the vector concatenation operator, $[\, V_{code} \oplus V_{ctx} \,]$ is the concatenation of the standard encoding vector with the context semantic vector along the feature dimension, $W_f$ is the feature fusion weight matrix, $b_f$ is the bias vector, and $ReLU(\cdot)$ is the linear rectification activation function.
7. The DRG grouping method according to claim 1, characterized in that Step 5 further includes:
Sub-step 5.1: Input the global semantic representation vector into the deep neural network grouping model, and use the deep feature mapping function to perform hierarchical feature extraction and transformation, generating an unnormalized classification log-odds vector $Z$; the expression of the deep feature mapping function is:
$$Z = W_{out} \cdot \big( f_L \circ f_{L-1} \circ \cdots \circ f_1 \big)(V_{global}) + b_{out}$$
where $V_{global}$ is the global semantic representation vector, $f_l$ is the nonlinear transformation function of the l-th layer in the deep neural network grouping model, $L$ is the total number of layers, $W_{out}$ is the weight matrix of the output classification layer, and $b_{out}$ is the bias vector of the output classification layer;
Sub-step 5.2: Use the probability distribution normalization function to perform exponential normalization on the classification log-odds vector, generating the DRG group probability distribution $P$; the expression of the probability distribution normalization function is:
$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$$
where $p_i$ is the predicted probability value of the i-th DRG group in the DRG group probability distribution, $z_i$ is the value of the i-th component of the classification log-odds vector, $C$ is the total number of categories of the preset DRG groups, and $e$ is the base of the natural logarithm;
Sub-step 5.3: Use the maximum likelihood evaluation function to perform decision calculations on the DRG group probability distribution and extract the predicted DRG group code and the confidence score; the expression of the maximum likelihood evaluation function is:
$$S_{conf} = \max_{i} \, p_i, \qquad \hat{y} = Map\big( \arg\max_{i} \, p_i \big)$$
where $\max(\cdot)$ is the maximum value function, $\arg\max(\cdot)$ is the function returning the index at which the maximum is attained, $S_{conf}$ is the confidence score, $\hat{y}$ is the predicted DRG group code, and $Map(\cdot)$ is the mapping function from indices to group codes.
8. The DRG grouping method according to claim 1, characterized in that Step 6 further includes:
Sub-step 6.1: Based on the confidence score and the preset high confidence threshold, use the automatic pass / fail decision function to calculate the review status indicator $\delta$; the expression of the automatic pass / fail decision function is:
$$\delta = \mathbb{I}(S_{conf} \ge \tau_{high})$$
where $S_{conf}$ is the confidence score, $\tau_{high}$ is the preset high confidence threshold, and $\mathbb{I}(\cdot)$ is the indicator function, which outputs the value 1 when the inequality condition in the parentheses is true and the value 0 otherwise; the value 1 indicates the automatic pass status, and the value 0 indicates the manual review status;
Sub-step 6.2: For cases where the review status indicator is 0, use the candidate group filtering function to extract the suspected candidate set $C_{cand}$ from the DRG group probability distribution; the expression of the candidate group filtering function is:
$$C_{cand} = \{\, (g_r, p_r) \mid r \in TopK(P, K) \,\}$$
where $P$ is the DRG group probability distribution, $TopK(\cdot)$ is the top-ranked index extraction function, used to extract the set of group indices whose probability values rank in the top $K$ positions, $K$ is the preset number of recommended candidates, $g_r$ is the code of the r-th candidate group, and $p_r$ is the probability value of the r-th candidate group;
Sub-step 6.3: Use the instruction construction function to generate the final control instruction $Cmd$ based on the review status indicator and the suspected candidate set; the expression of the instruction construction function is:
$$Cmd = \begin{cases} Gen_{auto}(\hat{y}), & \delta = 1 \\ Gen_{manual}(C_{cand}), & \delta = 0 \end{cases}$$
where $Gen_{auto}(\cdot)$ is the automatic pass instruction generation operator, used to encapsulate the predicted DRG group code $\hat{y}$ into an automatic approval instruction, and $Gen_{manual}(\cdot)$ is the manual review instruction generation operator, used to encapsulate the suspected candidate set into a manual review instruction; the automatic approval instruction and the manual review instruction are the generated instruction results.
9. An electronic device, characterized in that it includes a processor and a memory, wherein the memory stores program instructions which, when executed by the processor, cause the electronic device to perform the method according to any one of claims 1-8.
10. A storage medium, characterized in that the storage medium is a computer-readable storage medium storing computer-readable instructions that, when executed by one or more processors, implement the method according to any one of claims 1-8.