Method and apparatus for assisting in inferring cause of death chain

By combining a large language model and an ICD-coded knowledge base, and utilizing N-Gram probability models and Prompt techniques, the system assists in compiling cause-of-death chains, addressing the issue of insufficient professional knowledge among clinicians and improving the accuracy and monitoring efficiency of cause-of-death chains.

CN120450052BActive Publication Date: 2026-06-26BEIJING CENT FOR DISEASE PREVENTION & CONTROL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING CENT FOR DISEASE PREVENTION & CONTROL
Filing Date
2025-06-03
Publication Date
2026-06-26

Smart Images

  • Figure CN120450052B_ABST
    Figure CN120450052B_ABST
Patent Text Reader

Abstract

The application discloses a cause-of-death chain auxiliary inference method and device, and the method comprises the following steps: acquiring a death investigation record text; generating an ICD coding list associated with a current death event based on a time line according to the death investigation record text; determining a cause-of-death chain compilation rule, a cause-of-death chain compilation case and each cause-of-death chain and its probability related to the current death event according to the ICD coding list; and determining the cause-of-death chain and the root cause of the current death event by using a Prompt prompt large language model in combination with the cause-of-death chain compilation rule, the cause-of-death chain compilation case and the cause-of-death chain and its probability. According to the application, the cause-of-death information extraction and the auxiliary inference of the cause-of-death chain can be intelligently and automatically completed, and the accuracy and efficiency of the cause-of-death monitoring work are improved.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of cause-of-death monitoring and statistics, and specifically to a method and apparatus for assisting in the deduction of cause-of-death chains. Background Technology

[0002] Mortality registration and reporting, as well as mortality statistics, is a fundamental task involving the continuous and systematic collection and comprehensive analysis of population mortality data to study mortality levels, causes of death, and trends and patterns. Health indicators derived from mortality data analysis, such as life expectancy, maternal mortality rate, and infant mortality rate, are important information reflecting the socio-economic level and cultural development of a country or region, providing a scientific basis for the country to formulate socio-economic development goals and health policies.

[0003] Completeness of residents' death information and accuracy of cause-of-death determination are fundamental to cause-of-death surveillance. Determining the cause of death involves two core issues: ① appropriate and unified classification standards, i.e., what criteria are used to determine the cause of death; ② how is the cause of death determined. When conducting cause-of-death statistics, if only one disease is involved, the classification of causes of death is relatively simple; however, in most cases, death is caused by two or more diseases, while cause-of-death statistics can only select one cause. Therefore, the World Health Organization defines the "root cause" as: (a) the earliest disease or injury in a series of pathological events that directly lead to death, or (b) the accident or violence that causes the fatal injury. A frequently involved concept in determining the root cause of death is "whether there is a reasonable sequence." This sequence, or cause-of-death chain, refers to a series of diseases or injuries recorded on the medical certificate of death that lead to death, which have a logical chronological relationship and can be reasonably explained.

[0004] In mortality statistics, the cause-of-death chain is undoubtedly the most crucial source of information. The cause-of-death chain is a carefully organized and connected sequence of causes of death according to a specific logical order. It clearly outlines the development and inherent laws of diseases, presenting the complex causes of death in an orderly manner.

[0005] High-quality cause-of-death chain compilation has a very high professional threshold. Its smooth progress depends not only on a solid foundation of clinical medical knowledge but also on a precise grasp of the detailed rules and professional guidelines for disease and death coding established by the World Health Organization. However, in reality, most doctors engaged in clinical practice, due to factors such as differences in their daily work priorities and professional training systems, have a relatively weak grasp of the relevant rules and guidelines for disease and death coding. This undoubtedly poses a challenge to accurate and high-quality cause-of-death chain compilation. Furthermore, in practice, a large number of death certificate reports have indeed been found to have defects in the quality of the cause-of-death chain entries. Defective cause-of-death chain entries will directly mislead government decision-making and lead to an unreasonable allocation of health resources. Summary of the Invention

[0006] This invention provides a method and apparatus for assisting in the deduction of cause of death chains, which intelligently and automatically extracts cause of death information and assists in the deduction of cause of death chains, thereby improving the accuracy and efficiency of cause of death monitoring.

[0007] Therefore, the present invention provides the following technical solution:

[0008] A method for assisting in the deduction of cause-of-death chains, the method comprising:

[0009] Obtain the text of the death investigation record;

[0010] Generate a timeline-based list of ICD codes associated with the current death event based on the death investigation record text;

[0011] The cause-of-death chain compilation rules, cause-of-death chain compilation cases, and each cause-of-death chain and its probability are determined based on the ICD coding list.

[0012] By using the Prompt language model in conjunction with the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability, the cause-of-death chain and underlying cause of death of the current death event are determined.

[0013] Optionally, generating a timeline-based list of ICD codes associated with the current death event based on the death investigation record text includes:

[0014] Extract medical terms from the death investigation record text;

[0015] Based on the aforementioned medical terminology, a semantic search is performed on the ICD-coded knowledge base to obtain ICD-coded search results.

[0016] Based on the ICD code retrieval results, a timeline-based list of ICD codes associated with the current death event is generated.

[0017] Optionally, generating a timeline-based list of ICD codes associated with the current death event based on the death investigation record text further includes:

[0018] By combining the death investigation record text with the ICD encoding retrieval results using the prompt word engineering-driven large model, the optimal ICD encoding is determined;

[0019] The process of generating a timeline-based list of ICD codes associated with the current death event based on the ICD code retrieval results includes:

[0020] Generate a timeline-based list of ICD codes associated with the current death event based on the optimal ICD codes.

[0021] Optionally, the medical terms include the following information: disease diagnosis, symptoms and signs.

[0022] Optionally, determining the cause-of-death chain compilation rules, cause-of-death chain compilation cases, and each cause-of-death chain and its probability related to the current death event based on the ICD coding list includes:

[0023] Based on the category of the ICD code in the ICD code list, the cause-of-death chain compilation rule base is retrieved to determine the cause-of-death chain compilation rule related to the current death event.

[0024] Based on the ICD codes in the ICD code list, retrieve the cause-of-death chain compilation case library to determine the cause-of-death chain compilation case related to the current death event;

[0025] The ICD code list is input into the N-Gram probability model to determine each cause-of-death chain and its probability associated with the current death event.

[0026] Optionally, the step of using the Prompt language model in conjunction with the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability to determine the cause-of-death chain and underlying cause of death for the current death event includes:

[0027] Configure the role and inference rules of the Prompt prompt large language model;

[0028] According to the set reasoning rules, the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability are input into the Prompt prompting language model. Based on the reasoning of the Prompt prompting language model, the cause-of-death chain and the root cause of death of the current death event are obtained.

[0029] Optionally, the reasoning rules include: input information and input order, processing flow, output requirements, and constraints.

[0030] Optionally, the method further includes:

[0031] To determine the accuracy of the cause-of-death chain and the underlying cause of death in the current death event;

[0032] If inaccurate, the chain of causes of death and the underlying cause of death shall be corrected;

[0033] The corrected cause-of-death chain and underlying cause of death are added to the cause-of-death chain compilation case library.

[0034] A device for assisting in the deduction of cause of death chains, the device comprising:

[0035] The receiving module is used to acquire the text of the death investigation record;

[0036] The ICD code list generation module is used to generate a timeline-based ICD code list associated with the current death event based on the death investigation record text.

[0037] The information determination module is used to determine the cause-of-death chain compilation rules, cause-of-death chain compilation cases, and each cause-of-death chain and its probability related to the current death event based on the ICD coding list.

[0038] The comprehensive inference module is used to determine the cause of death and the underlying cause of death of the current death event by combining the Prompt prompting large language model with the cause of death chain compilation rules, the cause of death chain compilation cases, and the cause of death chain and its probability.

[0039] Optionally, the ICD encoding list generation module includes:

[0040] A terminology extraction unit is used to extract medical terms from the death investigation record text;

[0041] The retrieval unit is used to perform semantic retrieval on the ICD-coded knowledge base based on the medical terms, and obtain ICD-coded retrieval results.

[0042] The list generation unit is used to generate a timeline-based list of ICD codes associated with the current death event based on the ICD code retrieval results.

[0043] Optionally, the ICD encoding list generation module further includes:

[0044] The filtering unit is used to determine the optimal ICD code by combining the death investigation record text and the ICD code retrieval results with the prompt word engineering-driven large model;

[0045] The list generation unit generates a timeline-based list of ICD codes associated with the current death event based on the optimal ICD code.

[0046] Optionally, the information determination module includes:

[0047] The rule determination unit is used to retrieve the cause-of-death chain compilation rule base according to the category of the ICD code in the ICD code list, and determine the cause-of-death chain compilation rule related to the current death event;

[0048] The case determination unit is used to retrieve the cause-of-death chain compilation case library based on the ICD codes in the ICD code list and determine the cause-of-death chain compilation case related to the current death event.

[0049] The probability determination unit is used to input the ICD encoding list into the N-Gram probability model to determine each cause-of-death chain and its probability related to the current death event.

[0050] Optionally, the comprehensive inference module includes:

[0051] The setting unit is used to set the role and inference rules of the Prompt prompt large language model;

[0052] The reasoning unit is used to input the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability into the Prompt prompting language model according to the set reasoning rules, and to obtain the cause-of-death chain and the underlying cause of death of the current death event based on the reasoning of the Prompt prompting language model.

[0053] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the cause-of-death chain auxiliary deduction method.

[0054] The present invention provides a method and apparatus for assisting in the deduction of cause-of-death chains. Based on the diagnostic and treatment information filled in the deceased's inpatient medical records and the key textual information contained in the death investigation records, it comprehensively utilizes technologies such as pre-trained large language models, thought chains, knowledge base retrieval enhancement generation, and N-Gram probability models to intelligently extract relevant disease ICD (International Classification of Diseases) codes from the textual materials of the deceased's inpatient medical records and death investigation records, describing a series of diseases, pathological conditions or injuries that lead to or promote death, as well as accidents or violence that cause such injuries. It then infers the logical relationships and causal order between the codes to provide possible cause-of-death chains and the underlying cause of death.

[0055] Compared with existing technologies, this invention utilizes the medical descriptions of death events from clinical practitioners or death investigators, and combines the World Health Organization's rules and guidelines on disease and death coding with the prior probabilities of various diseases in the cause-of-death chain obtained from historical cause-of-death chain sample data. This effectively improves the accuracy of intelligently generated cause-of-death chains, enhances the quality of cause-of-death monitoring data, and improves the precision and efficiency of cause-of-death monitoring work. Attached Figure Description

[0056] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly described below. Obviously, the drawings described below are merely some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

[0057] Figure 1 This is a basic format diagram of the cause-of-death chain in the "Medical Certificate of Resident Death";

[0058] Figure 2 This is a flowchart of a cause-of-death chain-assisted deduction method provided in an embodiment of the present invention;

[0059] Figure 3 This is a flowchart illustrating the generation of an ICD code list based on death investigation record text in an embodiment of the present invention;

[0060] Figure 4 This is a schematic diagram of a cause-of-death chain auxiliary inference device provided in an embodiment of the present invention. Detailed Implementation

[0061] The specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for illustration and explanation only and are not intended to limit the present invention.

[0062] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0063] The chain of causes of death is the core information basis for cause-of-death statistics. The basic format of the chain of causes of death is as follows: Figure 1 As shown. From this basic format, it can be seen that it can be divided into three parts:

[0064] ① Part I: Cause of Death: This is the main content of the basic format. It requires you to fill in the disease that caused the death and earlier causes. This is a mandatory part. The doctor needs to describe the sequence of diseases that caused the person's death.

[0065] ② Part II of Cause of Death: This is a supplement to Part I. Fill in other meaningful circumstances that contributed to death but were not related to the disease or condition that caused death.

[0066] ③ The approximate time interval (in minutes, hours, days, weeks, months, or years) from the onset of each reported symptom or condition to death can help determine the relationship between various diseases.

[0067] When confirming the logical order of the contents in Part I of the death certificate, the information provided by the clinician should be taken into full consideration. Physicians need to fully understand and learn the requirements for completing the death certificate in order to ensure that their entries are presented in a logical sequence, i.e., a reasonable chain of causes of death.

[0068] However, in actual work, we often encounter situations where "a reasonable order exists", "more than one order exists", and "no order exists". Examples are given below.

[0069] Example 1: There is a reasonable order, as shown in Table 1 below.

[0070] Table 1

[0071]

[0072] Example 2: There are more than one possible order, as shown in Table 2 below.

[0073] Table 2

[0074]

[0075]

[0076] Example 3: There is no order, as shown in Table 3 below.

[0077] Table 3

[0078]

[0079] The aforementioned circumstances present challenges to the accurate determination of cause-of-death chains. To address this, this invention provides a method and apparatus for assisting in the deduction of cause-of-death chains. Based on the diagnostic and treatment information reported in the deceased's inpatient medical records and the key textual information contained in the death investigation records, this method comprehensively utilizes pre-trained large language models, thought chains, knowledge base retrieval enhancement generation, N-Gram probability models, and other technologies. It intelligently extracts relevant disease ICD codes from the textual materials of the deceased's inpatient medical record homepage and death investigation records, describing a series of diseases, pathological conditions, or losses that lead to or contribute to death, as well as accidents or violence that cause such injuries. It then infers the logical relationships and causal order between these codes, providing possible cause-of-death chains and the underlying cause of death. This method can be used to assist medical and health institutions and disease prevention and control departments in reporting and deducing cause-of-death chains, improving the quality of cause-of-death monitoring data.

[0080] like Figure 2 The diagram shown is a flowchart of a cause-of-death chain-assisted deduction method provided in an embodiment of the present invention, which includes the following steps:

[0081] In step 201, the death investigation record text is obtained.

[0082] In step 202, a timeline-based list of ICD codes associated with the current death event is generated based on the death investigation record text.

[0083] The process of generating the ICD encoding list is as follows: Figure 3As shown, it includes the following steps:

[0084] Step 301: Extract medical terms from the death investigation record text.

[0085] Compared to traditional NLP (Natural Language Processing), large language models rely on the advanced Transformer architecture (an artificial intelligence model mainly used for understanding and generating language), which has the outstanding ability to perform unsupervised learning on massive amounts of text. It can learn the subtle and crucial long-distance dependencies within the language, achieving in-depth interpretation of the text.

[0086] Therefore, in this embodiment of the invention, a pre-trained large language model can be used to extract medical terms from death investigation record texts. For example, an industry-leading large language model that has undergone large-scale, multi-domain text pre-training can be selected to ensure it has a broad and solid foundation of language knowledge. For instance, the Tongyi Qianwen Large Language Model has been pre-trained on billions of texts covering multiple fields such as medicine and science, accumulating rich experience in semantic understanding. The text to be interpreted, such as diagnostic and treatment information filled in hospital medical records, and death investigation records, is input into the large language model. The large language model utilizes its deep understanding of language to analyze the text structure sentence by sentence, identifying grammatical components such as subject, verb, object, attributive, adverbial, and complement, and reconstructing the core scenario described in the text. For example, for the medical text "The patient, due to long-term uncontrolled hypertension, recently developed symptoms of heart failure, initially experiencing difficulty breathing, followed by worsening lower limb edema," the large language model can accurately analyze the patient's underlying disease, the resulting secondary symptoms, and the order in which the symptoms appeared, providing a clear framework for subsequent analysis.

[0087] While pre-trained large language models possess powerful text parsing capabilities, enabling them to deeply analyze medical text information and accurately extract a patient's underlying disease, secondary symptoms, and even clarify the sequence of symptom presentation based on textual clues, they also have certain limitations. When faced with highly specialized disease and death coding rules, rigorous inference rules for cause-of-death chains, and the precise probability calculation of determining the order of diseases or conditions leading to death, relying solely on large language models still cannot achieve satisfactory results.

[0088] Therefore, in this embodiment of the invention, a pre-trained large language model can be used to extract medical terms from the death investigation record text to obtain all disease-related medical terms recorded in the death investigation record text. Moreover, these medical terms are arranged based on a time chain and have a sequential relationship.

[0089] Step 302: Perform semantic retrieval on the ICD coding knowledge base based on the medical terminology to obtain ICD coding retrieval results.

[0090] ICD (International Classification of Diseases) is an internationally standardized method for classifying diseases and health problems, developed by the WHO (World Health Organization). It categorizes diseases based on their etiology, pathology, clinical manifestations, and anatomical location, creating an ordered group and representing it using a coding system.

[0091] The ICD coding knowledge base can be a corresponding database provided by the WHO query platform or the national medical insurance information business coding standard data, and can be queried through the query interface of the corresponding platform or through ICD disease coding query tools, etc.

[0092] Step 303: Generate a timeline-based list of ICD codes associated with the current death event based on the ICD code retrieval results.

[0093] To better ensure the accuracy of the generated ICD code list, in some embodiments, the optimal ICD code can be determined by combining the death investigation record text and the ICD code retrieval results with the prompt word engineering-driven large model; then, a timeline-based ICD code list associated with the current death event can be generated based on the optimal ICD code.

[0094] Cue word engineering is a key technology in natural language processing. It involves designing and constructing input prompts to effectively guide large language models to generate the desired output. The large model driven by the cue word engineering can be a reasoning large language model (Reasoning LLMs). It should be noted that the solution of this invention does not depend on a specific large language model; any large language model with reasoning capabilities is acceptable.

[0095] For example, in a non-limiting embodiment, the prompt word is set as follows:

[0096] #Character Setting

[0097] You are a professional medical coding review expert, required to accurately code death certificates based on WHO ICD-10 coding rules. You possess forensic pathology knowledge and clinical diagnostic reasoning abilities.

[0098] #Processing flow

[0099] Please process strictly in the following order:

[0100] 1. Timeline Reconstruction

[0101] When analyzing death investigation records, a structured timeline must be established:

[0102] - Arrange all key medical events in chronological order

[0103] - Mark the occurrence time of each event (accurate to day / hour).

[0104] - Identify the causal chain between events

[0105] - Mark outlier points in the time interval (e.g., sudden deterioration).

[0106] 2. Dynamic encoding matching

[0107] Based on the candidate coding list, execute:

[0108] a) Time-series mapping: Associating the event at each time point with the corresponding ICD code.

[0109] b) Causal verification: Confirm that the codes conform to the ICD causal chain rule (e.g., infection → sepsis → MODS).

[0110] c) Stage marking: Distinguishing the encoding sets corresponding to initiating events, progress events, and terminal events.

[0111] d) Pathway integrity: Preserving necessary intermediate codes in the course of disease development.

[0112] 3. Decision Output

[0113] Output the data line by line according to the timeline and causal chain relationship. Each line outputs an ICD code and the corresponding time.

[0114] # Input data

[0115] Survey record text:

[0116] Candidate ICD code list:

[0117] #Handling of Special Circumstances

[0118] Special notes are required in the following situations:

[0119] - If poisoning / injury is present, the external cause must also be recorded (add XX code).

[0120] - Tumors should be differentiated into primary and secondary sites.

[0121] - Perinatal deaths require the application of special codes from P95-P96.

[0122] - When multiple codes of equal priority exist, "encoding ambiguity" should be marked.

[0123] It should be noted that other processing procedures may be adopted in specific implementations, and the embodiments of the present invention do not limit this.

[0124] Continue to refer to Figure 2 In step 203, the cause-of-death chain compilation rules, cause-of-death chain compilation cases, and each cause-of-death chain and its probability are determined according to the ICD coding list.

[0125] Specifically, the cause-of-death chain compilation rule base can be retrieved based on the category of the ICD code in the ICD code list to determine the cause-of-death chain compilation rules related to the current death event. The cause-of-death chain compilation rule base can be formulated by the competent department for cause-of-death coding based on international ICD coding rules and domestic realities. For example, it may include, but is not limited to, a list of ICD codes that cannot be used as the underlying cause of death.

[0126] Then, based on the ICD codes in the ICD code list, the cause-of-death chain compilation case library is retrieved to determine the cause-of-death chain compilation cases related to the current death event. The cause-of-death chain compilation case library can be established and reviewed by experts organized by the competent department for cause-of-death coding.

[0127] Then, the ICD code list is input into the N-Gram probability model to determine the various cause-of-death chains and their probabilities related to the current death event.

[0128] The N-Gram probabilistic model is a widely used statistical language model in the field of natural language processing. Its core principle is to segment text into N consecutive characters or words, forming N-Gram units. The N-Gram probabilistic model uses these N-Gram units to predict the next word or character in the text. Based on statistical information from large amounts of text data, it infers the probability of the next element appearing given a preceding text sequence by calculating the frequency of different N-Gram units. For example, in a large news corpus, "Beijing Tiananmen" appears frequently. When the model encounters "Beijing," it can predict that the next word with a high probability of occurrence is "Tiananmen" based on the previously calculated N-Gram frequency information.

[0129] The cause-of-death chain consists of disease diagnosis sequences, such as the following example:

[0130] (a)I46.1 Sudden Cardiac Death

[0131] (b) I50.9 Heart failure

[0132] (c)I24.9 Acute Coronary Syndrome

[0133] (d) I25.1 Coronary atherosclerotic heart disease

[0134] The probability of a chain of causes of death can be calculated using the chain rule, as follows:

[0135] Let C be a sample of a cause-of-death chain consisting of a maximum of 4 diseases:

[0136] C = c1, c2, c3, c4;

[0137] The probability P(C) can be calculated using the following formula:

[0138] P(C)=P(c1)P(c2|c1)P(c3|c1c2)P(c4|c1c2c3);

[0139] The meanings of each parameter are as follows:

[0140] P(c1) is the probability that the first disease diagnosis or injury / poisoning event in the cause-of-death chain is C1;

[0141] P(c2|c1) is the conditional probability that the second disease diagnosis or injury / poisoning event is C2 when the first disease diagnosis or injury / poisoning event in the cause-of-death chain is C1.

[0142] P(c3|c1c2) is the conditional probability that the third disease diagnosis or injury / poisoning event is C3 when the first disease diagnosis or injury / poisoning event in the cause-of-death chain is C1 and the second disease diagnosis or injury / poisoning event is C2.

[0143] P(c4|c1c2c3) is the conditional probability that the fourth disease diagnosis or injury / poisoning event is C4 when the first disease diagnosis or injury / poisoning event in the cause-of-death chain is C1, the second disease diagnosis or injury / poisoning event is C2, and the third disease diagnosis or injury / poisoning event is C3.

[0144] These probabilities can be estimated using the frequencies in the training sample library of the cause-of-death chain, as follows:

[0145] 1. Count the frequency of N-Gram: Count the occurrence of 2-4 cause-of-death codes (i.e. N-Gram) that constitute the cause-of-death chain in the training cause-of-death chain sample library, and generate a lookup table of all possible N-Grams and their corresponding frequencies.

[0146] 2. Based on maximum likelihood estimation (MLE), N-Gram probability is expressed using conditional probability, for example: P(c4|c1c2c3), which represents the conditional probability as: the number of samples with the cause-of-death chain (c1, c2, c3, c4) divided by the number of samples with the first three digits of the cause-of-death chain being (c1, c2, c3).

[0147] In step 204, the cause of death and the underlying cause of death are determined by using the Prompt language model in combination with the cause of death chain compilation rules, the cause of death chain compilation cases, and the cause of death chain and its probability.

[0148] Prompt is a technology based on artificial intelligence (AI) instructions that provide explicit and specific guidance to a language model to achieve the task or text type the user wants the model to generate. Prompt encompasses three main elements: task, instruction, and role, to ensure that the generated text meets the user's needs.

[0149] In some embodiments, the accuracy of the cause-of-death chain and underlying cause of death for the current death event can also be determined manually (e.g., by a team of experts); if inaccurate, the cause-of-death chain and underlying cause of death are corrected; and the corrected cause-of-death chain and underlying cause of death are added to the cause-of-death chain compilation case library. This provides more cases for the cause-of-death chain compilation case library and more samples for the reasoning and decision-making of the Prompt large language model.

[0150] The cause-of-death chain-assisted inference method provided in this invention integrates information from different sources and uses a combination of large models and probabilistic models to infer the cause-of-death chain and the underlying cause of death in a death event, thereby improving the accuracy and efficiency of cause-of-death inference.

[0151] The following example further illustrates the process of automatically generating a cause-of-death chain based on a certain death investigation record text using the cause-of-death chain-assisted inference method of the present invention.

[0152] Assume the death investigation record is as follows:

[0153] The patient had a prior diagnosis of hypertension (specific diagnosis time and institution unknown) and was diagnosed with coronary heart disease more than 20 years ago (specific diagnosis time and institution unknown). On January 1, 2025, the patient was transported by ambulance to the emergency department of *** Hospital due to "altered consciousness and speech impairment for 8 hours". Clinical and physical examinations diagnosed the patient with pulmonary infection, large-area cerebral infarction, and heart failure. Despite active treatment, the patient's condition did not improve. On January 8, 2025, the patient's condition further deteriorated, resulting in ventricular fibrillation and sudden cardiac death. The family refused all resuscitation measures, and the patient was pronounced clinically dead at 17:00 on January 8, 2025.

[0154] Step 1: Extract medical terms from the death investigation record text and perform semantic retrieval based on the ICD-coded knowledge base to obtain ICD-coded retrieval results.

[0155] First, medical terms were extracted from the death investigation record text. The extracted medical terms are as follows:

[0156] Hypertension, coronary heart disease, altered consciousness with speech impairment, lung infection, large-area cerebral infarction, heart failure, ventricular fibrillation, and sudden cardiac death.

[0157] Secondly, semantic retrieval was performed, and the main retrieval results are as follows:

[0158] I10 hypertension

[0159] I11 Hypertensive Heart Disease

[0160] I12 Hypertensive Nephropathy

[0161] I13 Hypertensive Heart and Kidney Disease

[0162] I15 secondary hypertension

[0163] I25.1 Coronary atherosclerotic heart disease

[0164] R40.2 Unspecified coma

[0165] J98.4 Lung Infection

[0166] I63.9 Large-area cerebral infarction

[0167] I50.9 Heart failure

[0168] I49.0 ventricular fibrillation

[0169] I46.1 Sudden Cardiac Death

[0170] Step 2: Generate a timeline-based list of ICD codes associated with the current death event based on the search results.

[0171] For example, using the prompt word engineering-driven large model combined with death investigation record text and ICD code retrieval results, the following is a list of ICD codes output chronologically:

[0172] I10 hypertension (Timeline starting point: medical history more than 20 years ago)

[0173] I25.1 Coronary artery disease (Timeline starting point: medical history more than 20 years ago)

[0174] J98.4 Pulmonary infection (Emergency visit date: January 1, 2025)

[0175] I63.9 Large-area cerebral infarction (acute onset time: 2025-01-01)

[0176] I50.9 Heart failure (Acute exacerbation time: 2025-01-01)

[0177] I49.0 ventricular fibrillation (terminal event on 2025-01-08)

[0178] I46.1 Sudden cardiac death (clinical time of death: January 8, 2025)

[0179] Step 3: Search the cause-of-death chain compilation rule knowledge base based on the generated ICD code list and extract the cause-of-death chain compilation rules related to the current death event.

[0180] For example, the search results are as follows:

[0181] 1) Sudden cardiac death (I46.1) is usually considered a direct cause of death, and it is necessary to find the underlying cause of sudden cardiac death.

[0182] 2) Both coronary atherosclerotic heart disease (I25.1) and hypertension (I10) may be primary causes, and the disease with a greater impact on mortality should be selected as the primary cause of death.

[0183] Step 4: Search the cause-of-death chain compilation case knowledge base based on the generated ICD code list and extract cause-of-death chain compilation cases related to the current death event.

[0184] For example, the search results are shown below:

[0185] Case 1: The patient had a prior diagnosis of hypertension, diabetes, and Alzheimer's disease (specific diagnosis time and institution unknown); more than 10 years ago, the patient was diagnosed with coronary heart disease (specific diagnosis time and institution unknown); on December 31, 2024, the patient was sent to the emergency department of XXX Hospital by ambulance due to "fever and shortness of breath for 3 days". After clinical and physical examinations, the patient was diagnosed with heart failure, acute coronary syndrome, lung infection, and respiratory failure. Despite active treatment, the patient's condition did not improve. On January 1, 2025, the patient's condition further deteriorated. Despite active resuscitation efforts, the patient was pronounced clinically dead at 14:13 on January 1, 2025.

[0186] The chain of causes of death in this case is as follows:

[0187] I: (a) I46.1 Sudden Cardiac Death

[0188] (b) I50.9 Heart failure

[0189] (c)I24.9 Acute Coronary Syndrome

[0190] (d) I25.1 Coronary atherosclerotic heart disease

[0191] II: (1) J98.4 Lung infection

[0192] (2) J96.9 respiratory failure

[0193] (3) I10 hypertension

[0194] Underlying cause of death: I25.1 Coronary atherosclerotic heart disease.

[0195] Step 5: Calculate the probability of the order of diseases involved in the current death event in the cause-of-death chain using the N-Gram probability model, that is, calculate the highest probability cause-of-death chains under each permutation and combination of codes in the ICD code list obtained in Step 2.

[0196] For example, the two chains of causes of death with the highest calculated probabilities are:

[0197] Chain of death 1:

[0198] (a)I46.1 Sudden Cardiac Death

[0199] (b) I49.0 ventricular fibrillation

[0200] (c) I50.9 Heart failure

[0201] (d) I25.1 Coronary atherosclerotic heart disease

[0202] Chain of death 2:

[0203] (a)I46.1 Sudden Cardiac Death

[0204] (b) I50.9 Heart failure

[0205] (c)I63.9 Large-area cerebral infarction

[0206] (d) I10 hypertension

[0207] Step 6: Using the Prompt prompting language model, combined with the rules for constructing cause-of-death chains related to the current death event, the constructed cases, and the cause-of-death chains and their probabilities determined based on the N-Gram model, determine the cause-of-death chain and the underlying cause of death for the current death event.

[0208] Specifically, the role and inference rules of the Prompt prompting language model can be set; according to the set inference rules, the cause-of-death chain compilation rules, cause-of-death chain compilation cases, and cause-of-death chains and their probabilities are input into the Prompt prompting language model, and the cause-of-death chain and root cause of death of the current death event are obtained based on the inference of the Prompt prompting language model.

[0209] For example, the character settings are as follows:

[0210] You are a WHO-certified ICD-10 coding expert, responsible for constructing logical cause-of-death chains based on clinical data that conform to international statistical standards for causes of death. You must strictly adhere to the WHO's "Rules for Medical Certification and Coding of Causes of Death," while also considering local statistical needs.

[0211] The inference rules may include, but are not limited to, input information and input order, processing flow, output requirements, and constraints. For example, the following inference rules may be set:

[0212] # Input data

[0213] Initial encoding sequence: <Please insert the initial ICD-10 encoding list in chronological / causal order here>

[0214] Rule base:

[0215] (1) Core rules:

[0216] The selection of the underlying cause of death should satisfy the causal chain of "initiation → facilitation → termination".

[0217] Treatment complications (Y40-Y84) must be ruled out as the underlying cause.

[0218] Trauma should be ordered as follows: "external cause → injury → complications".

[0219] (2) Local Supplementary Rules: <Please insert specific supplementary clauses here>

[0220] For reference, here are some high-probability cause-of-death chains: <Please insert high-probability cause-of-death chains here>

[0221] Reference Case:

[0222] <Please insert a description of a typical success story and its code chain here>

[0223] #Processing flow

[0224] 1. Logic verification phase:

[0225] Check if the initial sequence conforms to the principle that "causal inversion does not exceed order 1".

[0226] Verify whether there are any intermediate steps that should be excluded (such as treatment complications).

[0227] 2. Case Study Reference Phase:

[0228] Refer to previous typical cases as supplementary knowledge.

[0229] 3. Rule-based decision-making:

[0230] When statistical recommendations conflict with coding rules, the rules take precedence.

[0231] When multiple valid solutions exist, select from past cases and conditional probabilities.

[0232] #Output Requirements

[0233] Final chain of causes of death:

[0234] I:(a) Output line a here.

[0235] (b) Output line b here.

[0236] (c) Output line c here.

[0237] (d) Output line d here.

[0238] II:(1) (2) (3)

[0241] Underlying cause of death: Output the underlying cause of death here.

[0242] Decision Log:

[0243] #Constraints

[0244] Clinical entity diagnoses cannot be modified; only the coding order can be adjusted.

[0245] Based on the aforementioned reasoning rules, the final chain of causes of death and the underlying cause of death are as follows:

[0246] I: (a) I46.1 Sudden Cardiac Death

[0247] (b) I49.0 ventricular fibrillation

[0248] (c) I50.9 Heart failure

[0249] (d) I25.1 Coronary atherosclerotic heart disease

[0250] II: (1) I10 hypertension

[0251] (2) J98.4 Lung infection

[0252] (3) I63.9 large-area cerebral infarction

[0253] Underlying cause of death: I25.1 Coronary atherosclerotic heart disease.

[0254] Accordingly, embodiments of the present invention also provide a device for assisting in the deduction of cause-of-death chains, such as... Figure 4 The diagram shown is a structural schematic of the device.

[0255] The cause-of-death chain-aided deduction device 400 includes the following modules:

[0256] Receiver module 401 is used to acquire the text of the death investigation record;

[0257] ICD code list generation module 402 is used to generate a timeline-based ICD code list associated with the current death event based on the death investigation record text.

[0258] The information determination module 403 is used to determine the cause-of-death chain compilation rules, cause-of-death chain compilation cases, and each cause-of-death chain and its probability related to the current death event based on the ICD coding list.

[0259] The comprehensive inference module 404 is used to determine the cause of death and the underlying cause of death of the current death event by using the Prompt prompting language model 50 in combination with the cause of death chain compilation rules, the cause of death chain compilation cases, and the cause of death chain and its probability.

[0260] It should be noted that, in specific implementation, the Prompt prompt language model 50 can be pre-set in the cause-of-death chain auxiliary inference device 400, or a corresponding interface can be set in the cause-of-death chain auxiliary inference device 400, and the comprehensive inference module 404 can use the interface to call the external Prompt prompt language model 50 to realize the inference of the cause-of-death chain and the root cause of death of the current death event.

[0261] A specific structure of one embodiment of the ICD encoding list generation module 402 described above may include the following units:

[0262] The terminology extraction unit is used to extract medical terms from the death investigation record text;

[0263] The retrieval unit is used to perform semantic retrieval on the ICD-coded knowledge base based on the medical terms, and obtain ICD-coded retrieval results.

[0264] The list generation unit is used to generate a timeline-based list of ICD codes associated with the current death event based on the ICD code retrieval results.

[0265] In another embodiment of the ICD code list generation module 402, in addition to the units described above, a further filtering unit may be included: a filtering unit, used to determine the optimal ICD code by combining the death investigation record text and the ICD code retrieval results using a prompt word engineering-driven large model. Accordingly, in this embodiment, the list generation unit can generate a timeline-based ICD code list associated with the current death event based on the optimal ICD code.

[0266] By using the above filtering units to determine the optimal ICD code, the final list of ICD codes associated with the current death event can be more accurate.

[0267] A specific structure of the aforementioned information determination module 403 may include the following units:

[0268] The rule determination unit is used to retrieve the cause-of-death chain compilation rule base according to the category of the ICD code in the ICD code list, and determine the cause-of-death chain compilation rule related to the current death event;

[0269] The case determination unit is used to retrieve the cause-of-death chain compilation case library based on the ICD codes in the ICD code list and determine the cause-of-death chain compilation case related to the current death event.

[0270] The probability determination unit is used to input the ICD encoding list into the N-Gram probability model to determine each cause-of-death chain and its probability related to the current death event.

[0271] It should be noted that, in specific implementation, the N-Gram probability model can be pre-set in the cause-of-death chain auxiliary inference device 400, or the probability determination unit can call an external N-Gram probability model through a corresponding interface to determine each cause-of-death chain and its probability related to the current death event. This embodiment of the invention does not limit this.

[0272] A specific structure of the aforementioned comprehensive inference module 404 may include: a setting unit and an inference unit. Wherein:

[0273] The setting unit is used to set the role and inference rules of the Prompt large language model; the inference rules may include, for example, input information and input order, processing flow, output requirements, and constraints.

[0274] The reasoning unit is used to input the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability into the Prompt prompting language model according to the set reasoning rules, and to obtain the cause-of-death chain and the root cause of death of the current death event based on the reasoning of the Prompt prompting language model.

[0275] Further details regarding the modules and units in the cause-of-death chain auxiliary inference device 400 can be found in the descriptions in the preceding embodiments of the present invention, and will not be repeated here.

[0276] The death chain-assisted inference method and apparatus provided in this invention fully utilize the deep language understanding capabilities of a pre-trained large language model. Based on information recorded in medical records or death investigation records, it analyzes the patient's underlying diseases, secondary symptoms, and the order in which the symptoms appeared, providing a clear framework for subsequent analysis. By enhancing the retrieval and generation of the World Health Organization's professional knowledge base on disease and death coding rules and guidelines, it extracts death chain coding rules related to the current death event and uses an N-Gram probability model to calculate the probability of the order of diseases involved in the current death event within the death chain, fully utilizing the prior probabilities provided by previous data. This invention, through the comprehensive use of large language models, thought chains, and enhanced retrieval generation technologies, intelligently extracts relevant disease ICD codes from a series of descriptions of diseases, pathological conditions, or injuries that lead to or contribute to death, as well as accidents or violence that cause such injuries. It then infers the logical relationships and causal order between the codes, providing possible death chains and the underlying cause of death.

[0277] The present invention can be used to assist medical and health institutions and disease prevention and control departments in filling in and inferring cause-of-death chains, thereby improving the quality of cause-of-death surveillance data.

[0278] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, because according to the present invention, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the present invention.

[0279] The present invention also provides a storage medium, which is a computer-readable storage medium storing a computer program thereon, the computer program being executable when it runs. Figure 1 The method shown may include some or all of the steps. The storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, etc. The storage medium may also include non-volatile memory or non-transitory memory, etc.

[0280] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data provider to another website, computer, server, or data provider via wired or wireless means.

[0281] The embodiments of the present invention have been described in detail above. Specific implementation methods have been used to illustrate the present invention. The descriptions of the embodiments above are merely for the purpose of helping to understand the method and apparatus of the present invention, and are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention, and the content of this specification should not be construed as a limitation of the present invention. Therefore, any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for assisting in the deduction of cause-of-death chains, characterized in that, The method includes: Obtain the text of the death investigation record; Generate a timeline-based list of ICD codes associated with the current death event based on the death investigation record text; The rules for constructing cause-of-death chains, cases for constructing cause-of-death chains, and each cause-of-death chain and its probability are determined based on the ICD coding list. This determination includes: Based on the category of the ICD code in the ICD code list, the cause-of-death chain compilation rule base is retrieved to determine the cause-of-death chain compilation rule related to the current death event. Based on the ICD codes in the ICD code list, retrieve the cause-of-death chain compilation case library to determine the cause-of-death chain compilation case related to the current death event; Input the ICD coding list into the N-Gram probability model to determine each cause-of-death chain and its probability associated with the current death event; By using the Prompt language model in conjunction with the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability, the cause-of-death chain and underlying cause of death of the current death event are determined.

2. The method for assisting in the deduction of cause-of-death chains according to claim 1, characterized in that, The process of generating a timeline-based list of ICD codes associated with the current death event based on the death investigation record text includes: Extract medical terms from the death investigation record text; Based on the aforementioned medical terminology, a semantic search is performed on the ICD-coded knowledge base to obtain ICD-coded search results. Based on the ICD code retrieval results, a timeline-based list of ICD codes associated with the current death event is generated.

3. The method for assisting in the deduction of cause-of-death chains according to claim 2, characterized in that, The process of generating a timeline-based list of ICD codes associated with the current death event based on the death investigation record text also includes: By combining the death investigation record text with the ICD encoding retrieval results using the prompt word engineering-driven large model, the optimal ICD encoding is determined; The process of generating a timeline-based list of ICD codes associated with the current death event based on the ICD code retrieval results includes: Generate a timeline-based list of ICD codes associated with the current death event based on the optimal ICD codes.

4. The method for assisting in the deduction of cause-of-death chains according to any one of claims 1 to 3, characterized in that, The process of using the Prompt language model in conjunction with the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability to determine the cause-of-death chain and underlying cause of death for the current death event includes: Configure the role and inference rules of the Prompt prompt large language model; According to the set reasoning rules, the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability are input into the Prompt prompting language model. Based on the reasoning of the Prompt prompting language model, the cause-of-death chain and the root cause of death of the current death event are obtained.

5. The method for assisting in the deduction of the cause of death chain according to claim 4, characterized in that, The reasoning rules include: input information and input order, processing flow, output requirements, and constraints.

6. The method for assisting in the deduction of cause-of-death chains according to claim 4, characterized in that, The method further includes: To determine the accuracy of the cause-of-death chain and the underlying cause of death in the current death event; If inaccurate, the chain of causes of death and the underlying cause of death shall be corrected; The corrected cause-of-death chain and underlying cause of death are added to the cause-of-death chain compilation case library.

7. An apparatus for use in the cause-of-death chain auxiliary deduction method according to any one of claims 1-6, characterized in that, The device includes: The receiving module is used to acquire the text of the death investigation record; The ICD code list generation module is used to generate a timeline-based ICD code list associated with the current death event based on the death investigation record text. The information determination module is used to determine the cause-of-death chain compilation rules, cause-of-death chain compilation cases, and each cause-of-death chain and its probability related to the current death event based on the ICD coding list. The comprehensive inference module is used to determine the cause of death and the underlying cause of death of the current death event by combining the Prompt prompting large language model with the cause of death chain compilation rules, the cause of death chain compilation cases, and the cause of death chain and its probability.

8. The cause-of-death chain auxiliary deduction device according to claim 7, characterized in that, The ICD encoding list generation module includes: A terminology extraction unit is used to extract medical terms from the death investigation record text; The retrieval unit is used to perform semantic retrieval on the ICD-coded knowledge base based on the medical terms, and obtain ICD-coded retrieval results. The list generation unit is used to generate a timeline-based list of ICD codes associated with the current death event based on the ICD code retrieval results.

9. The cause-of-death chain auxiliary deduction device according to claim 7 or 8, characterized in that, The comprehensive inference module includes: The setting unit is used to set the role and inference rules of the Prompt prompt large language model; The reasoning unit is used to input the cause-of-death chain compilation rules, the cause-of-death chain compilation cases, and the cause-of-death chain and its probability into the Prompt prompting language model according to the set reasoning rules, and to obtain the cause-of-death chain and the underlying cause of death of the current death event based on the reasoning of the Prompt prompting language model.