Semi-supervised biomedical text semantic disambiguation method

A semi-supervised technology for biomedical text in the field of natural language processing (semantic disambiguation). It addresses the weak global modeling of existing disambiguation methods and the difficulty and high cost of manual annotation, and improves disambiguation accuracy.

Status: Inactive. Publication date: 2018-09-04
SICHUAN UNIV
Cites 9 documents; cited by 15 documents.

AI Technical Summary

Problems solved by technology

To a certain extent, the method overcomes the shortcomings of traditional disambiguation methods, namely their weak use of global information and the difficulty and high cost of manual labeling, and it improves the accuracy of semantic disambiguation for both biomedical and general texts.




Embodiment Construction

[0039] (1) The user inputs biomedical text, and sentence vector features are generated

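[0040] The biomedical text is first segmented into words (phrases), and a Word2Vec model generates a word vector for each of them. The word vectors of each sentence are then fed into a bidirectional long short-term memory (Bi-LSTM) model, which outputs two sentence vectors, a forward vector $\overrightarrow{\mathbf{h}}$ and a backward vector $\overleftarrow{\mathbf{h}}$. These are concatenated to form a new sentence vector $\mathbf{s}$:

[0041] $\mathbf{s} = [\overrightarrow{\mathbf{h}}\,;\,\overleftarrow{\mathbf{h}}]$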
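A minimal plain-Python sketch of step (1) follows, showing how a forward and a backward LSTM pass over the word vectors of one sentence are concatenated into a single sentence vector. Assumptions: random toy vectors stand in for trained Word2Vec embeddings, the LSTM weights are random and untrained, and the final hidden state of each direction is taken as that direction's sentence vector. The names `LSTMCell` and `bilstm_sentence_vector` are hypothetical. A real system would train these components, for example with gensim's `Word2Vec` and PyTorch's `nn.LSTM(..., bidirectional=True)`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LSTMCell:
    """One-direction LSTM with small random weights (untrained, illustration only)."""

    GATES = "ifog"  # input, forget, output gates and candidate cell state

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim  # each gate sees [x_t; h_{t-1}]
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}

    def run(self, xs):
        """Read a sequence of vectors and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            z = list(x) + h  # concatenate current input and previous hidden state
            pre = {k: [a + bb for a, bb in zip(matvec(self.W[k], z), self.b[k])]
                   for k in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


def bilstm_sentence_vector(word_vectors, fwd, bwd):
    """Concatenate the final forward and backward hidden states: s = [h_fwd; h_bwd]."""
    h_fwd = fwd.run(word_vectors)                   # left-to-right pass
    h_bwd = bwd.run(list(reversed(word_vectors)))   # right-to-left pass
    return h_fwd + h_bwd


EMB_DIM, HID = 4, 3
rng = random.Random(0)
sentence = ["patient", "caught", "a", "cold"]
# Hypothetical stand-in for trained Word2Vec vectors.
embeddings = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for w in sentence}
fwd, bwd = LSTMCell(EMB_DIM, HID, rng), LSTMCell(EMB_DIM, HID, rng)
s = bilstm_sentence_vector([embeddings[w] for w in sentence], fwd, bwd)
print(len(s))  # 6
```

The concatenated vector has twice the hidden size because it stacks one hidden state per direction, so context both before and after the ambiguous word contributes to it.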

[0042] The new sentence vector $\mathbf{s}$ is then fed into a multi-layer perceptron (MLP) to obtain the final sentence vector $\mathbf{v}$:

[0043] $\mathbf{v} = \mathrm{MLP}(\mathbf{s})$
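The layer equations of the MLP are not given here, so the sketch below only illustrates the general idea of mapping the concatenated Bi-LSTM vector to a final sentence vector. The two layers, their sizes (6 to 8 to 4), the tanh hidden activation, and the random untrained weights are all assumptions, and `dense`, `init_layer`, and `mlp` are hypothetical helper names.

```python
import math
import random


def dense(W, b, x, activation=None):
    """Fully connected layer: activation(W x + b); linear if activation is None."""
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [activation(v) for v in y] if activation else y


def init_layer(in_dim, out_dim, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim


def mlp(x, layers):
    """Apply the layers in order: tanh on hidden layers, linear output layer."""
    for idx, (W, b) in enumerate(layers):
        is_last = idx == len(layers) - 1
        x = dense(W, b, x, None if is_last else math.tanh)
    return x


rng = random.Random(1)
s = [0.2, -0.1, 0.05, 0.3, -0.2, 0.1]  # toy concatenated Bi-LSTM vector (2 x 3 dims)
layers = [init_layer(6, 8, rng), init_layer(8, 4, rng)]  # hypothetical sizes 6 -> 8 -> 4
v = mlp(s, layers)  # final sentence vector
print(len(v))  # 4
```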

[0046] (2) Use label propagation (label transfer) to automatically label unlabeled data and disambiguate ambiguous words

[0047] The sentence vectors obtained in step (1) serve as the nodes of a graph, and the similarity between nodes is computed. Label propagation then automatically assigns each unlabeled node the label of its most similar data; for ambiguous words, the most suitable...
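The sketch below illustrates step (2) with the classic iterative label propagation scheme of Zhu and Ghahramani. This is an assumed stand-in: the text above does not specify the similarity measure or the update rule. Sentence vectors are graph nodes, a Gaussian kernel with bandwidth `sigma` (a hypothetical choice) gives the edge weights, each row is normalized into transition probabilities, and labeled nodes are clamped to their known labels after every iteration. `gaussian_similarity` and `label_propagation` are hypothetical names.

```python
import math


def gaussian_similarity(a, b, sigma):
    """Edge weight between two sentence vectors (Gaussian kernel, assumed)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def label_propagation(vectors, labels, n_classes, sigma=1.0, iters=100):
    """labels[i] is a sense index for labeled nodes and None for unlabeled ones.

    Returns (predicted sense per node, label-probability row per node)."""
    n = len(vectors)
    # Pairwise similarities (no self-loops), row-normalised into transition probabilities.
    P = []
    for i in range(n):
        row = [0.0 if i == j else gaussian_similarity(vectors[i], vectors[j], sigma)
               for j in range(n)]
        total = sum(row) or 1.0  # avoid division by zero if all weights underflow
        P.append([w / total for w in row])
    # Initial label distributions: one-hot if labeled, uniform otherwise.
    F = []
    for lab in labels:
        if lab is None:
            F.append([1.0 / n_classes] * n_classes)
        else:
            F.append([1.0 if c == lab else 0.0 for c in range(n_classes)])
    for _ in range(iters):
        # Each node takes the similarity-weighted average of its neighbours' distributions.
        F_new = [[sum(P[i][j] * F[j][c] for j in range(n)) for c in range(n_classes)]
                 for i in range(n)]
        # Clamp labeled nodes to their known labels.
        for i, lab in enumerate(labels):
            if lab is not None:
                F_new[i] = F[i]
        F = F_new
    pred = [row.index(max(row)) for row in F]
    return pred, F


vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.05, 0.1], [4.9, 5.1]]
labels = [0, None, 1, None, None, None]  # one labeled sentence per sense
pred, F = label_propagation(vectors, labels, n_classes=2)
print(pred)  # [0, 0, 1, 1, 0, 1]
```

In a disambiguation setting, each class index would correspond to one sense of the ambiguous word (for example "cold" as an illness or as a temperature), labeled nodes to manually annotated sentences, and unlabeled nodes to the sentences whose sense is to be inferred.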


Abstract

The invention provides a semantic disambiguation method for polysemous words in biomedical text. The method mainly comprises the following steps: representing the biomedical text as word vectors with Word2Vec; building vectorized representations of context sentences from the word vectors with a bidirectional LSTM model; using the similarity relationships in the sentence vector space, together with a label propagation method, to propagate the labels of already-labeled medical data to the most similar unlabeled data according to probabilities; and finally performing semantic disambiguation of the biomedical text using all of the labeled data. Biomedical data are highly specialized and rich in terminology, so processing them manually is time-consuming, labor-intensive, and error-prone. The method can greatly reduce the cost of manual labeling and, compared with conventional machine learning methods, can effectively improve semantic disambiguation accuracy.

Description

Technical field

[0001] The invention belongs to the field of semantic disambiguation in natural language processing, and is a method and system for semi-supervised semantic disambiguation of biomedical text. Specifically, it disambiguates polysemous words in medical texts using a bidirectional long short-term memory model (Bi-LSTM) combined with a label propagation (label transfer) method.

Background

[0002] In recent years, with the explosive growth of digital information, medical staff can obtain electronic medical data more easily. In biomedicine, text data contain a large amount of specialized knowledge and information, and extracting useful information from digital text is becoming increasingly important. Compared with general text, medical text is difficult to handle because it is highly specialized and hard to label. Therefore, understanding the semantic information of biomedical...


Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/27; G06N3/04; G06K9/62
CPC: G06F40/30; G06N3/044; G06N3/045; G06F18/2155
Inventors: 李智 (Li Zhi), 罗曜儒 (Luo Yaoru), 李健 (Li Jian)
Owner: SICHUAN UNIV