
Electronic medical record text data enhancement method based on sentence semantic replacement

A technology for electronic medical records and text data, applied in text database clustering/classification, unstructured text data retrieval, electronic digital data processing, etc. It addresses problems such as model overfitting and achieves data enhancement, increased diversity, and a larger number of samples.

Active Publication Date: 2021-05-25
SUN YAT SEN UNIV
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Existing methods implement data enhancement by replacing words in the text, so the generated text differs little from the original text, which can easily lead to model overfitting.

Method used

A sample text in the data set is taken as the original text and split into sentences, and whole sentences are replaced with sentences that have the same or similar semantics. This increases the number of samples and the difference between the generated text and the original text.


Examples


Embodiment Construction

[0029] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following examples illustrate the present invention but are not intended to limit its scope.

[0030] As shown in Figure 1, an electronic medical record text data enhancement method based on sentence semantic replacement in the preferred embodiment of the present invention includes:

[0031] S1. Obtain the original text to be processed: In this embodiment, a disease classification data set is obtained, in which each sample text is classified according to the type of disease. From the sample texts of a disease type with a small number of sample texts, one sample text is selected as the original text for data enhancement.

[0032] S2. Split the original text into multiple original sentences: divide the target tex...
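Steps S1 and S2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `(text, disease_type)` data layout, the function names `pick_original_text` and `split_sentences`, and the set of sentence delimiters are all assumptions made for illustration, since the visible text does not specify them.

```python
import random
import re
from collections import Counter


def pick_original_text(dataset, rng=None):
    """S1 sketch: pick one sample text from the disease type with the fewest samples.

    `dataset` is assumed to be a list of (text, disease_type) pairs.
    """
    rng = rng or random.Random()
    counts = Counter(label for _, label in dataset)
    # Disease type with the fewest sample texts (ties go to the first one seen).
    rare_label = min(counts, key=counts.get)
    candidates = [text for text, label in dataset if label == rare_label]
    return rng.choice(candidates), rare_label


# Assumed delimiters: Chinese terminators always end a sentence; English
# terminators only when followed by whitespace, so "38.5" is not split.
_SENTENCE_END = re.compile(r"(?<=[。！？；])|(?<=[.!?;])\s+")


def split_sentences(text):
    """S2 sketch: split the original text into a list of original sentences."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


if __name__ == "__main__":
    data = [
        ("Cough for a week. No fever.", "bronchitis"),
        ("Wheezing at night. Temp 36.8 C.", "asthma"),
        ("Productive cough. Mild fever.", "bronchitis"),
    ]
    original, label = pick_original_text(data, random.Random(0))
    print(label, split_sentences(original))
```

In this sketch the rarest class is chosen automatically; the embodiment only says the text comes from a disease type with few samples, so a fixed count threshold would fit that description equally well.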


Abstract

The invention relates to the technical field of natural language processing and discloses an electronic medical record text data enhancement method based on sentence semantic replacement. The method comprises the following steps: taking a sample text in a data set as the original text, splitting the original text into a plurality of sentences, and replacing whole sentences with sentences that have the same or similar semantics, thereby increasing the number of samples. Data enhancement is realized, the difference between the generated text and the original text is increased, and model overfitting is prevented.
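The replacement step described in the abstract can be sketched as follows. This is a hedged illustration only: the abstract does not say how semantic similarity is computed or where replacement sentences come from, so the candidate `pool`, the `threshold` value, and the token-overlap `jaccard` score (a crude stand-in for a real sentence encoder) are all assumptions.

```python
import re


def tokens(sentence):
    # Lowercased word tokens; a placeholder for a real semantic representation.
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; assumed stand-in for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def augment(sentences, pool, similarity=jaccard, threshold=0.5):
    """Replace each sentence with its most similar pool sentence, if similar enough.

    `pool` is a hypothetical list of candidate sentences, for example sentences
    taken from other sample texts of the same disease type.
    """
    out = []
    for s in sentences:
        candidates = [c for c in pool if c != s]
        best = max(candidates, key=lambda c: similarity(s, c), default=None)
        if best is not None and similarity(s, best) >= threshold:
            out.append(best)  # whole-sentence replacement
        else:
            out.append(s)  # nothing close enough; keep the original sentence
    return " ".join(out)


if __name__ == "__main__":
    original = ["Patient reports fever for three days.", "No chest pain."]
    pool = ["Patient reports fever for two days.", "Blood pressure is normal."]
    print(augment(original, pool))
```

The `similarity` parameter is pluggable so that a proper sentence-embedding comparison can be swapped in; word overlap is used here only to keep the example self-contained, and it would need a tokenizer suited to Chinese text before being applied to Chinese medical records.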

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to an electronic medical record text data enhancement method based on sentence semantic replacement.

Background

[0002] Text classification is a fundamental task in natural language processing, and machine learning and deep learning have achieved high accuracy on it. However, high accuracy often depends on the size and quality of the training data, a requirement that is often difficult to meet in real tasks. In particular, for aided disease diagnosis based on electronic medical record text, it is difficult to collect enough high-quality data. Data augmentation is widely used in deep learning to increase the amount of training data. Jason W. Wei and Kai Zou proposed the EDA method in the article "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks". This method inclu...


Application Information

IPC (8): G06F16/35, G06F16/36, G06F40/211, G06F40/30, G06K9/62, G16H10/60, G16H50/70
CPC: G06F16/35, G06F16/36, G06F40/30, G06F40/211, G16H50/70, G16H10/60, G06F18/241, G06F18/214
Inventor: 利建鑫, 任江涛
Owner SUN YAT SEN UNIV