
Semi-supervised biomedicine event extraction method based on co-training

A biomedical event extraction technology, applied in medical informatics, informatics, and medical data mining. It addresses the small labeled sample size and the resulting tendency to overfit, and aims at more accurate classification, reduced overfitting, and richer semantic information.

Inactive Publication Date: 2018-05-01
JILIN UNIV
6 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

[0006] Existing supervised learning for biomedical event extraction has very few labeled samples available, which easily causes overfitting. To solve this problem, the present invention proposes a semi-supervised biomedical event extraction method based on co-training. The content of the invention mainly includes: using semi-supervised learning to expand the labeled sample set; training an SVM classifier and a CNN classifier together and selecting samples to backfill into the training set; constructing the short sentence sets used as CNN input; constructing the CNN network; and the strategy for selecting samples from the unlabeled sample set for backfilling into the training set.



Examples


Embodiment Construction

[0021] Step 1: Initialize labeled and unlabeled datasets. After text preprocessing, the labeled data set is used as the original training set, and a short sentence training set is generated.

[0022] The training sets of GE'11 and GE'13 are combined as the original training set. Relevant biomedical literature is downloaded from public repositories on the Internet as the unlabeled dataset. Text is preprocessed using NLTK and the McClosky-Charniak-Johnson biomedical syntactic parsing model. Because most sentences in biomedical texts are too long for a CNN to classify effectively, each sentence is replaced with short sentences that are limited in length and compact in structure but can still express semantics independently, and the CNN classifies these short sentences. The shortest dependency path between biological entities carries rich semantic information, captures the sequence of predicate arguments well, and provides important information for extracting events....
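As a rough illustration of how a shortest dependency path can yield a compact short sentence, the sketch below runs a breadth-first search over an undirected dependency graph. It is not the patent's implementation: the function name `shortest_dependency_path`, the example sentence, and the hand-written `edges` list are all assumptions made for illustration, and a real pipeline would take the edges from the parser output.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return token indices on the shortest undirected path, or None."""
    # Build an undirected adjacency list from (head, dependent) pairs.
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)

    # Breadth-first search; `parent` lets us rebuild the path afterwards.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # the two tokens are not connected


# Hypothetical sentence and hand-written parse: (head index, dependent index).
tokens = ["IL-4", "strongly", "induces", "expression", "of", "GATA3"]
edges = [(2, 0), (2, 1), (2, 3), (3, 5), (5, 4)]

path = shortest_dependency_path(edges, 0, 5)
short_sentence = [tokens[i] for i in path]
print(short_sentence)
```

The words on the path drop modifiers such as "strongly" and "of" while keeping the entities and the words that connect them, which is the kind of compact input the paragraph above describes for the CNN.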



Abstract

The present invention relates to a semi-supervised biomedical event extraction method based on co-training. With the rapid growth of biomedical literature, automatic extraction of biomedical events has attracted great interest. Labeled biomedical event corpora are small, which limits the performance of classification algorithms and can even cause overfitting. The method provided by the invention identifies more accurate positive instances from unlabeled data to enlarge the labeled training set. The method comprises the steps of: designing rich features for use by an SVM; learning short sentences based on word embeddings from Word2vec and PubMed; further extending the short sentences to dependency short sentences between trigger words and arguments, and inputting these into a CNN; and finally, backfilling into the training set those samples from the unlabeled corpus that are predicted by the SVM and the CNN and meet the conditions, thereby incrementally extending the training set. Extensive experimental results show that the new semi-supervised biomedical event extraction method can extract events effectively.
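The backfilling loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the excerpt does not specify the conditions a sample must meet, so the rule used here (both classifiers agree and both confidences reach a `threshold`) is an assumption, and the `CentroidView` class is a toy stand-in for the SVM and the CNN, each looking at a different feature of the sample.

```python
class CentroidView:
    """Toy stand-in classifier that uses one feature ('view') of a sample."""

    def __init__(self, index):
        self.index = index
        self.centroids = {}

    def fit(self, labeled):
        # Average the chosen feature per label.
        sums = {}
        for x, y in labeled:
            total, count = sums.get(y, (0.0, 0))
            sums[y] = (total + x[self.index], count + 1)
        self.centroids = {y: t / c for y, (t, c) in sums.items()}

    def predict(self, x):
        # Nearest centroid wins; confidence shrinks as distances even out.
        dists = {y: abs(x[self.index] - c) for y, c in self.centroids.items()}
        label = min(dists, key=dists.get)
        total = sum(dists.values())
        conf = 1.0 if total == 0 else 1.0 - dists[label] / total
        return label, conf


def co_train(clf_a, clf_b, labeled, unlabeled, threshold=0.8, rounds=5):
    """Grow the labeled set with samples both classifiers agree on."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        clf_a.fit(labeled)
        clf_b.fit(labeled)
        accepted, remaining = [], []
        for x in pool:
            label_a, conf_a = clf_a.predict(x)
            label_b, conf_b = clf_b.predict(x)
            # Assumed selection rule: agreement plus high confidence.
            if label_a == label_b and min(conf_a, conf_b) >= threshold:
                accepted.append((x, label_a))
            else:
                remaining.append(x)
        if not accepted:
            break  # nothing new to backfill; stop early
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Hypothetical two-feature samples: (feature for view A, feature for view B).
seed = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
labeled_out, pool_out = co_train(CentroidView(0), CentroidView(1), seed, unlabeled)
print(len(labeled_out), pool_out)
```

In this toy run the two clear-cut samples are backfilled into the training set, while the ambiguous one stays in the unlabeled pool because neither classifier is confident about it.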

Description

Technical field

[0001] The invention relates to the field of text mining, in particular to a method for extracting semi-supervised biomedical events based on co-training.

Background technique

[0002] Biomedical event extraction is an important branch of information extraction. With the rapid growth of biomedical literature, researchers need a lot of energy and time to acquire relevant scientific knowledge. Therefore, automatic extraction of biomedical event information has attracted great interest. Therefore, it is necessary to extract biomedical events in an efficient and accurate way.

[0003] At present, the methods of event extraction can be roughly divided into two categories: rule-based methods and machine learning-based methods. A rule-based event extraction system consists of a series of rules, including sentence structure, grammatical relationship, and semantic relationship. These are defined manually from the training data or learned automatically. Human inte...


Application Information

IPC(8): G16H50/70, G06F17/30
CPC: G06F16/35
Inventor: 卢奕南, 马小蕾, 路扬, 潘航宇
Owner: JILIN UNIV