A Multi-label International Classification of Diseases Training Method Based on Curriculum Learning

A training method for disease classification, applied to multi-label International Classification of Diseases (ICD) training based on curriculum learning. It addresses the unsatisfactory generalization ability and accuracy of existing models and improves both.

Active Publication Date: 2022-03-29
CHENGDU UNIV OF INFORMATION TECH +1

AI Technical Summary

Problems solved by technology

The experimental results show that the F-score on the test set is generally low while the F-score on the training set is high, so the generalization ability and accuracy of the three models are not ideal.


Examples


Embodiment 1

[0046] The Medical Information Mart for Intensive Care (MIMIC) is an open-source medical dataset built from the monitoring records of patients in the intensive care unit (ICU). It was published to promote medical research and improve ICU decision support. In this embodiment, the discharge summaries in the MIMIC note events table (NOTEEVENTS) are used as the electronic medical records, and the ICD-9 codes corresponding to each record are predicted.
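The pairing of discharge summaries with their ICD-9 codes might be sketched as follows. This is a minimal illustration, not the patent's procedure: it assumes rows shaped like the MIMIC-III CSV exports (column names CATEGORY, HADM_ID, TEXT in NOTEEVENTS and HADM_ID, ICD9_CODE in DIAGNOSES_ICD), and the function name build_examples is hypothetical.

```python
from collections import defaultdict


def build_examples(note_rows, diagnosis_rows):
    """Pair each admission's discharge summary text with its set of ICD-9 codes.

    Both arguments are iterables of dicts, e.g. from csv.DictReader.
    Column names follow the MIMIC-III CSV exports (an assumption here).
    """
    # Collect every ICD-9 code assigned to each hospital admission.
    codes = defaultdict(set)
    for row in diagnosis_rows:
        if row["ICD9_CODE"]:
            codes[row["HADM_ID"]].add(row["ICD9_CODE"])

    # Keep only notes whose category is "Discharge summary".
    texts = defaultdict(list)
    for row in note_rows:
        if row["CATEGORY"].strip().lower() == "discharge summary":
            texts[row["HADM_ID"]].append(row["TEXT"])

    # One multi-label example per admission that has both text and codes.
    return [
        (" ".join(texts[hadm]), sorted(codes[hadm]))
        for hadm in sorted(texts)
        if hadm in codes
    ]
```

Each returned pair is one multi-label training example: the text is the model input and the code list is the label set.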

[0047] In this embodiment, the original electronic records are first cleaned. After punctuation marks, numbers, stop words, and uninformative fields such as "Admission Date" are removed, the entire dataset is segmented into words and a word segmentation dictionary is generated. A TF-IDF score, which measures how important a word segment is to the corpus, is then calculated for each word segment in the dictionary. The 10,000 word segments whose TF-IDF scores fall within the preset threshold range are retained, while the word se...
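The cleaning and TF-IDF filtering described in this paragraph could look roughly like the sketch below. Several details are assumptions made for illustration, because the visible text does not specify them: the stop-word list, the list of dropped field names, the use of the mean TF-IDF over all documents as the corpus-level score, and the threshold values passed as low and high.

```python
import math
import re
from collections import Counter

# Hypothetical stop words and boilerplate field names, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "was", "with", "to", "in"}
BOILERPLATE_FIELDS = ("admission date", "discharge date")


def clean(text):
    """Lowercase, then drop boilerplate fields, punctuation, digits and stop words."""
    text = text.lower()
    for field in BOILERPLATE_FIELDS:
        text = text.replace(field, " ")
    tokens = re.findall(r"[a-z]+", text)  # letters only: no digits or punctuation
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_scores(docs):
    """Return one score per term: its mean TF-IDF over all documents (assumption)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each term once per document
    totals = Counter()
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            totals[term] += (count / len(doc)) * math.log(n / df[term])
    return {term: total / n for term, total in totals.items()}


def select_vocabulary(docs, low, high, max_size=10_000):
    """Keep at most max_size terms whose score lies in [low, high], best first."""
    scores = tfidf_scores(docs)
    kept = [t for t, s in scores.items() if low <= s <= high]
    kept.sort(key=lambda t: (-scores[t], t))
    return kept[:max_size]
```

A term that occurs in every document receives a score of zero under this formula, so a positive lower threshold removes it from the dictionary.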


Abstract

The invention discloses a multi-label International Classification of Diseases (ICD) training method based on curriculum learning. When a large-scale ICD dataset is classified and automatically coded, the method controls the label distribution through three different mini-batch sampling methods. First, a multi-label ICD training sample set is obtained and divided into multiple training sample subsets. In the first training stage, the training sample subsets are iteratively sampled by stratified sampling, and the gradient is calculated to update the model parameters. In the second training stage, the training sample set is iteratively shuffled, and the gradient is calculated to update the model parameters. In the third training stage, the training sample subsets are iteratively sampled by probability sampling, and the gradient is calculated to update the model parameters. The invention improves the training stage of current mainstream models, and the improved training greatly improves model accuracy and generalization ability in the multi-label ICD coding task.
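The three training stages can be pictured as three mini-batch samplers applied in sequence. The sketch below is one possible reading under stated assumptions, not the claimed method: the abstract does not say how the subsets are formed, how many samples each stratum contributes, or what probabilities the third stage uses. The function names, the equal share per subset in stage 1, and the inverse-size weights in stage 3 are all hypothetical.

```python
import random


def stratified_batches(subsets, batch_size, rng):
    """Stage 1 (assumed form): every batch takes an equal share from each subset."""
    per_subset = max(1, batch_size // len(subsets))
    pools = [rng.sample(s, len(s)) for s in subsets]  # shuffled copies
    while all(len(pool) >= per_subset for pool in pools):
        batch = []
        for pool in pools:
            batch.extend(pool.pop() for _ in range(per_subset))
        yield batch


def shuffled_batches(subsets, batch_size, rng):
    """Stage 2: ordinary mini-batches over the shuffled full training set."""
    samples = [x for s in subsets for x in s]
    rng.shuffle(samples)
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]


def probability_batches(subsets, batch_size, rng, weights, num_batches):
    """Stage 3 (assumed form): pick a subset by weight, then a sample from it."""
    for _ in range(num_batches):
        chosen = rng.choices(range(len(subsets)), weights=weights, k=batch_size)
        yield [rng.choice(subsets[i]) for i in chosen]


def train(subsets, update, batch_size=4, seed=0):
    """Run the three stages in order; update(stage, batch) stands in for
    computing the gradient and updating the model parameters."""
    rng = random.Random(seed)
    weights = [1.0 / len(s) for s in subsets]  # hypothetical: favour small subsets
    stages = [
        stratified_batches(subsets, batch_size, rng),
        shuffled_batches(subsets, batch_size, rng),
        probability_batches(subsets, batch_size, rng, weights, num_batches=3),
    ]
    for stage, batches in enumerate(stages, start=1):
        for batch in batches:
            update(stage, batch)
```

Here the samples are plain indices and update is any callable, so the samplers can be checked without a model. In a real training loop, update would run a forward and backward pass on the batch.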

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a multi-label International Classification of Diseases training method based on curriculum learning.

Background

[0002] The International Classification of Diseases (ICD) code is an important label in electronic health records. ICD coding groups diseases of the same type into ordered code combinations according to their etiology, pathology, clinical manifestations, and other characteristics. These codes are used to quantify important statistical data and to facilitate the search for patient cohorts with similar diagnoses, and they are of great value as a means of standardized information exchange between hospitals.

[0003] Automatically classifying ICD codes for electronic medical records is meaningful work. On the one hand, automatic classification saves a large amount of manual classification cost, and on the other...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289, G06K9/62, G16H10/60
CPC: G06F16/35, G06F40/289, G16H10/60, G06F18/214
Inventor: 王亚强, 韩旭, 郝学超, 舒红平, 朱涛
Owner: CHENGDU UNIV OF INFORMATION TECH