Automatic labeling method for professional vocabularies of medical documents

A technology applied to the automatic labeling of professional vocabulary in medical documents. It addresses problems such as poor model performance, and its effects include increasing the amount of usable data, overcoming the shortage of labeled data, and improving labeling accuracy.

Active Publication Date: 2019-07-26
TIANJIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
6 cites, 16 cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to overcome the deficiencies of the prior art by proposing a method for the automatic labeling of professional vocabulary in medical documents, which uses a semi-supervised learning algorithm t...

Examples

Detailed Description of the Embodiments

[0030] Embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0031] The design idea of the present invention is to use machine learning algorithms and techniques to label professional vocabulary in medical documents or case records, based on a semi-supervised labeling method. The present invention constructs a three-layer neural network to label the text: (1) each word in the text is vectorized by three feature extractors: a BiLSTM extracts letter-level (character-level) features, Word2Vec produces a word embedding, and a third extractor derives features from grammatical structure; (2) a BiLSTM extracts and encodes the contextual information surrounding each word in the same sentence; (3) a CRF labeling layer uses the CRF objective function to model words and their labels jointly and makes the final label decision.
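Step (3) chooses the whole label sequence of a sentence jointly rather than word by word, which is what lets a CRF rule out invalid label combinations. A minimal sketch of that decoding step follows, assuming a standard linear-chain CRF with a BIO-style tag set; the patent does not give its exact formulation. The function name `viterbi_decode`, the tag names and all scores are hypothetical, and training with the CRF objective (the forward algorithm) is not shown.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Return the highest-scoring tag index sequence under a linear-chain CRF.

    Assumed inputs (hypothetical, not from the patent):
    emissions[t][j]:   score of tag j for word t (e.g. from the context BiLSTM)
    transitions[i][j]: score of tag i being followed by tag j
    start[j]:          score of a sentence starting with tag j
    """
    n_tags = len(start)
    # score[j] = best total score of any tag path that ends in tag j so far
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # previous tag i that maximises (path score + transition into j)
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # trace the best path backwards from the highest-scoring final tag
    tag = max(range(n_tags), key=lambda j: score[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical BIO tag set and hand-picked scores, for illustration only.
    tags = ["O", "B-TERM", "I-TERM"]
    start = [0.0, 0.0, -10.0]            # a sentence should not start inside a term
    transitions = [[0.0, 0.0, -10.0],    # O -> I-TERM is heavily penalised
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    emissions = [[1.0, 0.8, 0.0],        # word 1: O slightly ahead of B-TERM
                 [0.2, 0.1, 2.0]]        # word 2: clearly I-TERM
    print([tags[k] for k in viterbi_decode(emissions, transitions, start)])
    # prints ['B-TERM', 'I-TERM']
```

Picking the best tag for each word independently would give `O` followed by `I-TERM`, an invalid pair; the transition penalty makes the joint decoder prefer `B-TERM, I-TERM` instead.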

[0032] Based on the above-mentioned design idea, the medical document professional voca...

Abstract

The invention relates to an automatic labeling method for professional vocabulary in medical documents, comprising the following steps: preprocessing the input medical document to obtain a preprocessed medical document text; obtaining and fusing the letter-level (character-level) feature vectors, word-level feature vectors and linguistic feature vectors of each word to serve as the word's encoding vector; classifying the word annotations of the segmented medical document text to obtain an annotated data set; outputting a multi-dimensional vector for each word as a spatial representation of the word; obtaining an enhanced annotated data set; and performing training and modeling, and finally outputting a labeling result. A semi-supervised learning algorithm is adopted to label a large amount of unlabeled data, which overcomes the shortage of labeled data in the medical industry and effectively increases the amount of data available to the model. The method greatly improves the algorithm's labeling accuracy on keywords and professional vocabulary and can be widely applied to medical literature processing.
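The abstract says a semi-supervised algorithm labels unlabeled data to produce an enhanced annotated data set, but does not name the algorithm. The sketch below assumes plain self-training with a confidence threshold: train on the labeled data, pseudo-label the unlabeled sentences, keep only confident ones, and repeat. `CountTagger` is a hypothetical toy stand-in for the BiLSTM-CRF model, and `self_train`, `threshold=0.9` and `rounds=3` are likewise illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sentence = List[str]
Labels = List[str]
Example = Tuple[Sentence, Labels]


class CountTagger:
    """Hypothetical toy stand-in for the BiLSTM-CRF tagger.

    Tags each word with its most frequent training label; a sentence's
    confidence is the lowest per-word label share (0.0 for unseen words).
    """

    def fit(self, data: List[Example]) -> "CountTagger":
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for words, labels in data:
            for word, label in zip(words, labels):
                self.counts[word.lower()][label] += 1
        return self

    def predict(self, words: Sentence) -> Tuple[Labels, float]:
        labels: Labels = []
        confidences: List[float] = []
        for word in words:
            counts = self.counts.get(word.lower())
            if not counts:  # unseen word: guess "O" with zero confidence
                labels.append("O")
                confidences.append(0.0)
                continue
            label, n = counts.most_common(1)[0]
            labels.append(label)
            confidences.append(n / sum(counts.values()))
        return labels, min(confidences, default=0.0)


def self_train(
    train_fn: Callable[[List[Example]], CountTagger],
    labeled: List[Example],
    unlabeled: List[Sentence],
    threshold: float = 0.9,  # hypothetical confidence cut-off
    rounds: int = 3,         # hypothetical number of self-training rounds
) -> Tuple[List[Example], CountTagger]:
    """Grow the labeled set with confident pseudo-labels, then retrain."""
    enhanced = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train_fn(enhanced)
        confident: List[Example] = []
        remaining: List[Sentence] = []
        for words in pool:
            labels, confidence = model.predict(words)
            if confidence >= threshold:
                confident.append((words, labels))
            else:
                remaining.append(words)
        if not confident:  # nothing new to add: stop early
            break
        enhanced.extend(confident)
        pool = remaining
    return enhanced, train_fn(enhanced)


if __name__ == "__main__":
    labeled = [
        (["Patient", "denies", "fever"], ["O", "O", "B-TERM"]),
        (["Acute", "pancreatitis", "confirmed"], ["B-TERM", "I-TERM", "O"]),
    ]
    unlabeled = [["Patient", "denies", "acute", "pancreatitis"],
                 ["Patient", "denies", "cough"]]
    enhanced, model = self_train(lambda d: CountTagger().fit(d), labeled, unlabeled)
    print(len(enhanced))  # 3: the sentence with the unseen word "cough" is held back
```

The threshold trades quantity for quality: a lower value adds more pseudo-labeled sentences but also more labeling errors, which the next training round then learns from.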

Description

Technical Field

[0001] The invention belongs to the technical field of machine learning, and in particular relates to an automatic labeling method for professional vocabulary in medical documents.

Background

[0002] As the medical research community grows, more and more papers are published every year, and there is a growing need for better ways to handle these papers and automatically understand their key ideas. However, relatively little scientific information has been extracted, owing to the wide variety of domains and extremely limited annotation resources.

[0003] At the same time, as the demand for medical resources surges along with the number of medical documents and case records, researchers and medical staff need to sort through patients' past medical data quickly. Professional terms or keywords in a patient's records are often what allows medical staff to reach a judgment quickly, yet sorting out these terms and keywords manually takes a lot of time. Due...

Application Information

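IPC (8): G06F16/35; G06F16/36
CPC: G06F16/35; G06F16/36
Inventors: Wang Yuan (王嫄), Gao Ming (高铭), Wang Dong (王栋), Zhao Tingting (赵婷婷), Zhao Qing (赵青), Chen Yarui (陈亚瑞), Shi Yancui (史艳翠), Kong Na (孔娜), Wang Jie (王洁)
Owner: TIANJIN UNIVERSITY OF SCIENCE AND TECHNOLOGY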