
Cardiovascular disease medical record structuring system based on NLP

A medical record structuring technology for cardiovascular disease, applied in the fields of natural language processing and deep learning. It addresses the poor generalization and transferability and the low accuracy of general semantic representation models, and achieves strong generalization and transferability, high accuracy, and improved portability and computational efficiency.

Active Publication Date: 2022-05-24
SOUTH CHINA UNIV OF TECH
Cites: 5 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0007] To overcome common problems in existing medical record structuring scenarios, the present invention provides an NLP-based system for structuring cardiovascular disease medical records. It focuses on using the limited annotated medical text currently available to address the poor generalization and transferability and the low accuracy of general semantic representation models in specific medical fields, while also balancing the use of deep learning models against the need for system decoupling.



Examples


Embodiment 1

[0063] As shown in Figure 1, the present invention proposes an NLP-based system for structuring cardiovascular disease medical records. It uses NLP techniques to convert unstructured medical records and extract information from them, producing structured text files. The system includes:

[0064] A text format conversion module, shown in Figure 2, which converts the cardiovascular disease medical record files uploaded by users into the required format. Word and PDF files are supported, and the text file output by the conversion is denoted F. The file type is determined from the file suffix. If the specified file is a Word file (suffix .docx or .doc), the third-party Python library docx2txt is used to parse the text in the Word file and convert it into a Python string. If the specified file is a PDF file (suffix .pdf), the user must specify whether the PDF content is a text version or an image version. F...
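
The suffix-based dispatch described in this paragraph might be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `detect_format` and `convert_to_text` and the `pdf_is_image` flag are hypothetical, and only the Word branch is filled in (using `docx2txt.process`), because the source text is truncated before the PDF handling is described.

```python
from pathlib import Path
from typing import Optional


def detect_format(path: str, pdf_is_image: Optional[bool] = None) -> str:
    """Classify an uploaded file by its suffix (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in {".docx", ".doc"}:
        return "word"
    if suffix == ".pdf":
        # The user must state whether the PDF holds text or page images.
        if pdf_is_image is None:
            raise ValueError("specify whether the PDF is a text or image version")
        return "pdf_image" if pdf_is_image else "pdf_text"
    raise ValueError(f"unsupported file type: {suffix or path}")


def convert_to_text(path: str, pdf_is_image: Optional[bool] = None) -> str:
    """Return the text F of a medical record file (Word branch only)."""
    kind = detect_format(path, pdf_is_image)
    if kind == "word":
        # Third-party library, imported lazily so detection works without it.
        # Assumption: a legacy .doc file may need converting to .docx first.
        import docx2txt
        return docx2txt.process(path)
    # PDF handling is not sketched because the source description is truncated.
    raise NotImplementedError(f"conversion for {kind} is not sketched here")
```

Keeping detection separate from conversion lets the upload form ask for the text-or-image choice only when the suffix is .pdf.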

Embodiment 2

[0087] Unlike Embodiment 1, and as shown in Figure 4, the named entity recognition in this embodiment uses a data augmentation method based on language-model text generation (DAGA), which proceeds as follows:

[0088] 1.1. Perform label linearization on the original manually annotated NER training data: mix the characters of the text with the original sequence labels, placing the label of each entity character in front of that character. For example, in "diagnosing angina pectoris", "angina pectoris" is a three-character entity of category "disease and diagnosis" whose characters are rendered here as "heart", "angina" and "pain"; after linearization the text becomes "diagnosing B-disease-and-diagnosis heart I-disease-and-diagnosis angina I-disease-and-diagnosis pain". The resulting linearized labeled data is denoted $D_{man}$.
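
Label linearization as described in step 1.1 can be illustrated with a short sketch. The function name `linearize`, the abbreviated tag name `DIS`, and the English stand-in tokens are assumptions made for illustration; the patent operates on Chinese characters and its own label set.

```python
from typing import List


def linearize(tokens: List[str], tags: List[str]) -> List[str]:
    """Insert each non-O tag in front of its token, as in step 1.1."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)  # the label precedes the entity character
        out.append(token)
    return out


# English stand-ins for the characters of the example sentence (assumed).
tokens = ["diagnosing", "heart", "angina", "pain"]
tags = ["O", "B-DIS", "I-DIS", "I-DIS"]
linearized = linearize(tokens, tags)
print(" ".join(linearized))
# diagnosing B-DIS heart I-DIS angina I-DIS pain
```

Because labels and characters end up in one flat sequence, a language model trained on such sequences can generate new text with its labels already in place.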

[0089] 1.2. From the proprietary corpus $W_{medical}$, screen out the corpus $W_{cardio}$ related to cardiovascular disease. Based on the existing medical entity dictionary (including entity categories such as disease and diagnosis, operation, drug, anatomical part...

Embodiment 3

[0094] Unlike Embodiments 1 and 2, the named entity recognition in this embodiment uses a lexicon enhancement method (LexiconEnhanced), shown in Figures 3 and 5. The specific process is:

[0095] 2.1. Construct a sequence of character-vocabulary pairs. For a given input Chinese sentence $s_c = \{c_1, c_2, \ldots, c_n\}$ and each character $c_i$ in the sentence, use the dictionary of the medical-domain word vectors Med-WordVec to match the candidate words that contain the character, and pair the character with the matched words. The result is expressed as

[0096] $s_{cw} = \{(c_1, ws_1), (c_2, ws_2), \ldots, (c_n, ws_n)\}$

[0097] where $c_i$ denotes the i-th character in the sentence and $ws_i$ denotes the set of words that contain that character. For example, in "diagnosing angina pectoris", for the character "heart", its character-vocabulary pair sequence is {("heart", "heart"), ("heart", "heart disea...
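
The construction of character-vocabulary pairs in step 2.1 might look roughly like the sketch below. It is a simplified illustration: the function name `char_word_pairs`, the `max_len` cutoff, the sample sentence, and the three-word toy lexicon are all assumptions, standing in for the Med-WordVec dictionary that the patent uses.

```python
from typing import List, Set, Tuple


def char_word_pairs(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[str, List[str]]]:
    """Pair each character c_i with the lexicon words covering it (ws_i)."""
    n = len(sentence)
    matches: List[List[str]] = [[] for _ in range(n)]
    for start in range(n):
        # Try every substring of up to max_len characters starting here.
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Every character inside the matched word receives the word.
                for i in range(start, end):
                    matches[i].append(word)
    return list(zip(sentence, matches))


# Toy lexicon (assumed); the patent matches against Med-WordVec instead.
toy_lexicon = {"诊断", "心绞痛", "绞痛"}
pairs = char_word_pairs("诊断心绞痛", toy_lexicon)
for char, words in pairs:
    print(char, words)
```

A production version would typically use a trie for matching rather than scanning every substring, but the resulting pairs have the same shape as $s_{cw}$.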



Abstract

The invention discloses an NLP-based system for structuring cardiovascular disease medical records. The system comprises three modules. A text format conversion module converts the cardiovascular disease medical record files uploaded by a user according to their formats and outputs each converted file as a text file. A rule extraction module defines text extraction rules, performs preliminary structuring and coarse-grained information extraction on the converted text file, and outputs the corresponding structured text. A named entity recognition module trains a pre-trained language model based on deep learning in combination with natural language processing methods, uses the model to perform sequence labeling on the structured text, predicts the probability of each entity label for every character in the structured text, determines the start and end positions and the category of each entity from those probabilities, and thereby identifies the entities. In this way, fine-grained medical entity information related to cardiovascular disease is extracted, and the predicted entity positions and categories are stored in the final structured file.
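
The last step of the named entity recognition module, turning per-character label probabilities into entity positions and categories, can be sketched as below. This is an illustrative assumption rather than the patent's decoder: it presumes a BIO label scheme, picks the most probable label per character (a CRF or other decoder could be substituted), and the name `decode_entities`, the label list, and the probabilities are made up for the example.

```python
from typing import List, Optional, Sequence, Tuple


def decode_entities(
    probs: Sequence[Sequence[float]], labels: Sequence[str]
) -> List[Tuple[int, int, str]]:
    """Turn per-character label probabilities into (start, end, category).

    `end` is exclusive. Each character takes its most probable label.
    """
    tags = [labels[max(range(len(row)), key=row.__getitem__)] for row in probs]
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    category: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and category == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, category))
            start, category = None, None
        if tag.startswith("B-"):
            start, category = i, tag[2:]
    return spans


labels = ["O", "B-DIS", "I-DIS"]  # hypothetical label set
probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.20, 0.70],
    [0.20, 0.10, 0.70],
]
entities = decode_entities(probs, labels)
print(entities)  # [(2, 5, 'DIS')]
```

An `I-` label that does not continue an open entity of the same category is ignored here; stricter or more lenient repair policies are equally possible.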

Description

Technical field

[0001] The invention relates to the fields of natural language processing and deep learning, in particular to an NLP-based system for structuring cardiovascular disease medical records.

Background

[0002] With China's aging population and accelerating urbanization, various advanced medical devices and examination methods have been continuously applied in clinical practice, which has reduced the in-hospital mortality of cardiovascular disease patients in China. However, the prevalence of chronic diseases such as cardiovascular disease (CVD) is still rising. At the same time, with the rapid development of Internet technology and the wide application of information technology in the medical field, medical data has grown exponentially, and a large amount of cardiovascular disease case data has accumulated. Medical text occupies an important position in medical data, including mostly unstructured text information ...


Application Information

IPC(8): G16H10/60, G06F16/36, G06F40/284, G06F40/295
CPC: G16H10/60, G06F16/367, G06F40/284, G06F40/295, Y02A90/10
Inventors: 肖睿欣, 孙庆华, 王聪
Owner SOUTH CHINA UNIV OF TECH