Text word segmentation analysis method and system for medical record text data structuring

A text word segmentation technology for text data, applied in patient-specific data, electronic digital data processing, natural language data processing and related fields. It addresses the low efficiency and poor accuracy of traditional medical record data mining and its failure to satisfy the medical record entity mapping relationship. Its effects are to reduce manual recognition and repetitive manual work, produce accurate medical vocabulary, and improve word segmentation accuracy.

Pending Publication Date: 2021-06-11
山东健康医疗大数据有限公司
Cites: 0. Cited by: 3.

AI Technical Summary

Problems solved by technology

[0006] The technical task of the present invention is to address the above deficiencies by providing a text word segmentation analysis method and system for medical record text data structuring, solving the technical problem that traditional medical record data mining has low efficiency and poor accuracy and cannot satisfy the medical record entity mapping relationship.

Examples

Embodiment 1

[0046] The text word segmentation analysis method for medical record text data structuring of the present invention comprises the following steps:

[0047] S100. Construct a medical thesaurus based on medical text data. The medical thesaurus includes medical words, weights and parts of speech, and the part of speech comprises the word's traditional part of speech and the word's medical part of speech;

[0048] Based on the medical thesaurus, generate all words that can be formed from the medical text data to be segmented, and construct a directed acyclic graph from all of these words;

[0049] S200. Based on the above medical thesaurus and directed acyclic graph, search by dynamic programming for the maximum return-to-zero path, that is, the segmentation combination that maximizes the sentence word frequency, and obtain a word set carrying context order and part of speech;

[0050] S300. Based on the three dimensions of the position of the word, the original part of speech of the word, and the medical part of speec...
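
Steps S100 and S200 can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy thesaurus entries, weights, part-of-speech tags, the fallback for unknown characters, and the function names `build_dag`, `best_route` and `segment` are all hypothetical. The sketch also assumes that the "maximum return-to-zero path" can be read as the path through the directed acyclic graph with the highest summed log word frequency, which is the approach used by common dictionary-based Chinese segmenters.

```python
import math

# Hypothetical toy thesaurus: word -> (weight, traditional POS, medical POS).
# The words, weights and tags are illustrative only, not taken from the patent.
THESAURUS = {
    "左": (20, "f", "position"),
    "肺": (30, "n", "organ"),
    "左肺": (40, "n", "organ"),
    "上": (25, "f", "position"),
    "叶": (5, "n", "organ"),
    "上叶": (35, "n", "organ"),
    "结": (3, "v", "other"),
    "节": (3, "n", "other"),
    "结节": (60, "n", "finding"),
}
TOTAL = sum(weight for weight, _, _ in THESAURUS.values())
UNKNOWN = (1, "x", "unknown")  # assumed fallback entry for out-of-thesaurus text


def build_dag(sentence):
    """Map each start index to the end indices of every thesaurus word starting there."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, len(sentence)) if sentence[i:j + 1] in THESAURUS]
        dag[i] = ends or [i]  # unknown character: treat it as a one-character word
    return dag


def best_route(sentence, dag):
    """Right-to-left dynamic programming over the DAG.

    route[i] holds (best summed log frequency from i to the end, end index of
    the word chosen at i).
    """
    n = len(sentence)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (
                math.log(THESAURUS.get(sentence[i:j + 1], UNKNOWN)[0])
                - math.log(TOTAL)
                + route[j + 1][0],
                j,
            )
            for j in dag[i]
        )
    return route


def segment(sentence):
    """Return (order, word, traditional POS, medical POS) tuples in context order."""
    route = best_route(sentence, build_dag(sentence))
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        word = sentence[i:j + 1]
        _, pos, medical_pos = THESAURUS.get(word, UNKNOWN)
        words.append((len(words), word, pos, medical_pos))
        i = j + 1
    return words


if __name__ == "__main__":
    for item in segment("左肺上叶结节"):
        print(item)
```

With this toy thesaurus, the sentence "左肺上叶结节" is split into 左肺, 上叶 and 结节, because each multi-character word has a higher log frequency than the product of its single-character parts. The returned tuples carry the word order and both parts of speech, which is the kind of word set that step S300 would take as input.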

Embodiment 2

[0073] The text word segmentation analysis system for medical record text data structuring of the present invention performs structured word segmentation and analysis on medical record text data using the method disclosed in Embodiment 1. The system includes a medical thesaurus building module, a word segmentation model building module, a word segmentation module, a triple analysis module and a standardization module. The medical thesaurus building module is used to build a medical thesaurus based on medical text data; the medical thesaurus includes medical words, weights and parts of speech, and the part of speech comprises the word's traditional part of speech and the word's medical part of speech. The word segmentation model building module is used to generate, based on the medical thesaurus, all words that can be formed from the medical text data to be segmented, and build a directed acyclic graph based on all the abov...

Embodiment 3

[0082] In the computer-readable medium of the present invention, computer instructions are stored on the medium and, when executed by a processor, cause the processor to execute the method disclosed in Embodiment 1. Specifically, a system or device equipped with a storage medium may be provided, on which software program code realizing the functions of any of the above embodiments is stored, and the computer (or the CPU or MPU) of the system or device reads and executes the program code stored in the storage medium.

Abstract

The invention discloses a text word segmentation analysis method and system for medical record text data structuring. It belongs to the technical field of medical record data mining and aims to solve the technical problem that traditional medical record data mining has low efficiency and poor accuracy and cannot satisfy the medical record entity mapping relationship. The method comprises the following steps: constructing a medical thesaurus based on medical text data; generating, based on the medical thesaurus, all words that can be formed from the medical text data to be segmented, and constructing a directed acyclic graph from these words; based on the medical thesaurus and the directed acyclic graph, searching by dynamic programming for the maximum return-to-zero path, that is, the segmentation combination that maximizes the sentence word frequency, to obtain a word set carrying context order and part of speech; analyzing the word set with a ternary relation model to obtain a ternary mapping relation data set; and standardizing the ternary mapping relation data set to obtain a binary mapping relation data set.

Description

Technical field

[0001] The invention relates to the technical field of medical record data mining, in particular to a text word segmentation analysis method and system for medical record text data structuring.

Background

[0002] Data in the medical field has its own particularities and mainly covers diagnosis, disease, drug, treatment and other categories. Beyond these categories, data for cancer as a specific disease places more emphasis on disease-related complication data, pathological and immunohistochemical data, treatment-related surgery, radiotherapy, chemotherapy, targeted therapy and traditional Chinese medicine treatment data, and patient-related family history and disease history data. Most of this data is unstructured or semi-structured text stored in the patient's medical record.

[0003] Data mining mainly uses algorithms to extract relevant information and convert unstructured text data into structured data that can be recognized and processed by...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/216, G06F40/242, G16H10/60
CPC: G16H10/60, G06F40/216, G06F40/242, G06F40/289
Inventor: 钟信真, 左霖
Owner: 山东健康医疗大数据有限公司