A Text Data Enhancement Method for Electronic Medical Records Based on Sentence Semantic Replacement

A technology for electronic medical records and text data, applied in unstructured text data retrieval, text database clustering/classification, and electronic digital data processing. It addresses problems such as model overfitting by augmenting the data: it increases the number of samples and increases the difference between the generated text and the original text.

Active Publication Date: 2022-05-27
SUN YAT SEN UNIV
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

Existing methods implement data enhancement by replacing individual words in the text, so the difference between the generated text and the original text is small, which can easily lead to model overfitting.



Examples


Embodiment Construction

[0024] The specific embodiments of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. The following examples are intended to illustrate the present invention, but not to limit the scope of the present invention.

[0025] As shown in Figure 1, a method for enhancing electronic medical record text data based on sentence semantic replacement according to a preferred embodiment of the present invention includes:

[0026] S1. Obtain the original text to be processed: In this embodiment, a disease classification data set is obtained, in which each sample text is classified according to disease type. From the sample texts of a disease type that has only a small number of sample texts, one sample text is selected as the original text for data enhancement.
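Step S1 can be illustrated with a minimal sketch. The representation of the data set as `(text, disease_label)` pairs, the function name `pick_original_text`, and the `max_class_size` cutoff for what counts as "a small number of sample texts" are all assumptions made for illustration; the source does not specify them.

```python
import random
from collections import defaultdict


def pick_original_text(dataset, max_class_size, seed=0):
    """Pick one sample text from a disease type that has few samples.

    dataset: list of (text, disease_label) pairs (assumed layout).
    max_class_size: hypothetical cutoff for a "small" disease type.
    Returns (disease_label, original_text).
    """
    # Group the sample texts by disease type.
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append(text)

    # Keep only disease types with a small number of sample texts.
    rare = {lab: texts for lab, texts in by_label.items()
            if len(texts) <= max_class_size}
    if not rare:
        raise ValueError("no disease type has <= max_class_size samples")

    # Choose one rare disease type, then one of its texts, reproducibly.
    rng = random.Random(seed)
    label = rng.choice(sorted(rare))
    return label, rng.choice(rare[label])
```

Calling the function repeatedly with different seeds would yield different original texts, each of which can then go through the remaining steps.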

[0027] S2. Split the original text into multiple original sentences: divide the original...
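The splitting in step S2 might look like the sketch below. The rule used here, splitting after Chinese sentence-ending punctuation and ASCII `!`, `?`, `;`, is an assumption, since the paragraph above is truncated before it states the actual rule. The ASCII period is deliberately not treated as a delimiter so that decimal values such as a temperature of 38.5 stay intact.

```python
import re

# Assumed delimiters: Chinese full stop, exclamation mark, question mark,
# semicolon, plus their ASCII counterparts except the period.
_SENTENCE = re.compile(r"[^。！？；!?;]+[。！？；!?;]*")


def split_sentences(text):
    """Split a record into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

Because each sentence keeps its closing punctuation, joining the resulting list reproduces the original text when it contains no whitespace between sentences.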


Abstract

The invention relates to the technical field of natural language processing and discloses a text data enhancement method for electronic medical records based on sentence semantic replacement. A sample text in a data set is used as the original text, the original text is split into multiple sentences, and each whole sentence is replaced with a sentence whose semantics are the same as or similar to those of the original sentence. This increases the number of samples, realizes data enhancement, increases the difference between the generated text and the original text, and prevents the model from overfitting.
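The whole-sentence replacement described in the abstract can be sketched as follows. This is not the patent's procedure: the text available here does not say how semantically similar sentences are found, so a character-bigram cosine similarity is swapped in as a simple stand-in. The candidate `pool`, the `min_sim` threshold, and both function names are hypothetical.

```python
import math
from collections import Counter


def _bigrams(sentence):
    # Character bigrams with whitespace removed (stand-in representation).
    s = "".join(sentence.split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def similarity(a, b):
    """Cosine similarity of character-bigram counts (assumed proxy for
    semantic similarity; a sentence-embedding model could replace it)."""
    va, vb = _bigrams(a), _bigrams(b)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def replace_sentences(sentences, pool, min_sim=0.5):
    """Replace each sentence with its most similar sentence from pool.

    A sentence is kept unchanged when no candidate reaches min_sim.
    """
    generated = []
    for sent in sentences:
        candidates = [c for c in pool if c != sent]
        best = max(candidates, key=lambda c: similarity(sent, c), default=None)
        if best is not None and similarity(sent, best) >= min_sim:
            generated.append(best)
        else:
            generated.append(sent)
    return generated
```

Joining the returned sentences gives one generated sample. Excluding exact duplicates from the candidates ensures that a replaced sentence actually differs from the original, which is the source of the larger text difference the abstract aims for.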

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and in particular to a text data enhancement method for electronic medical records based on sentence semantic replacement.

Background technique

[0002] Text classification is a fundamental task in natural language processing, and machine learning and deep learning have achieved high accuracy on it. However, high accuracy in text classification often depends on the size and quality of the training data, a requirement that is often difficult to meet in real tasks. In particular, in the task of auxiliary disease diagnosis based on electronic medical record texts, it is difficult to collect enough high-quality data. Data augmentation is widely used in deep learning, and using this technique can increase the amount of training data. Jason W. Wei and Kai Zou proposed the EDA method in the article "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/36, G06F40/211, G06F40/30, G06K9/62, G16H10/60, G16H50/70
CPC: G06F16/35, G06F16/36, G06F40/30, G06F40/211, G16H50/70, G16H10/60, G06F18/241, G06F18/214
Inventor: 利建鑫, 任江涛
Owner: SUN YAT SEN UNIV