Online traditional Chinese medicine text named entity identifying method based on deep learning

Named entity recognition and deep learning technology, applied in the fields of character and pattern recognition, biological neural network models, and instruments, addresses the low accuracy and low efficiency of named entity recognition. It reduces the representation dimension, reduces the complexity and workload of feature extraction, and simplifies identification.

Status: Inactive
Publication Date: 2017-05-17
SOUTH CHINA UNIV OF TECH
4 Cites, 65 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to address the above-mentioned deficiencies in the prior art and provide a deep learning-based online TCM text named en


Examples

Embodiment

[0028] This embodiment provides a deep-learning-based method for named entity recognition in online TCM texts. The flow chart of the method is shown in Figure 1, and the method includes the following steps:

[0029] Step 1. Obtain online TCM text data with a web crawler and preprocess it (encoding conversion, removal of irrelevant information, etc.). Then, using existing professional dictionaries with manual assistance, label the named entities in the obtained text with the BIO label set {B, I, O}, where B marks the beginning of a named entity word, I marks the remainder of the entity word, and O marks text that does not belong to any named entity;
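The dictionary-assisted part of the BIO labeling in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes labels are assigned per character and that dictionary terms are matched longest-first, neither of which the text specifies. The function name `bio_tag`, the dictionary entries, and the sample sentence are all hypothetical, and the manual review mentioned in the step is not modeled.

```python
def bio_tag(text, dictionary):
    """Label each character of `text` with B, I, or O.

    Assumption: forward longest-match against `dictionary`; the source
    only says that dictionaries and manual assistance are used.
    """
    tags = ["O"] * len(text)
    max_len = max((len(term) for term in dictionary), default=0)
    i = 0
    while i < len(text):
        match_len = 0
        # Try the longest candidate first so longer terms win over prefixes.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                match_len = length
                break
        if match_len:
            tags[i] = "B"                      # first character of the entity
            for j in range(i + 1, i + match_len):
                tags[j] = "I"                  # remaining characters of the entity
            i += match_len
        else:
            i += 1                             # character stays labeled O
    return tags


# Hypothetical dictionary entries and sentence, for illustration only.
entity_dictionary = {"感冒", "板蓝根"}
sentence = "感冒可服板蓝根"
tags = bio_tag(sentence, entity_dictionary)
for ch, tag in zip(sentence, tags):
    print(ch, tag)
```

In practice the automatic labels produced this way would still need the manual correction the step describes, since a dictionary cannot resolve ambiguous or unseen entity mentions.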

[0030] Step 2. Use the word2vec tool to learn from a large-scale unlabeled corpus, obtaining fixed-length word vectors and forming the corresponding vocabulary;
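The source names only the word2vec tool and does not say which of its training modes is used. As one illustration of what learning from unlabeled text involves, the sketch below generates (center, context) pairs the way word2vec's skip-gram mode frames its training data. The function name `skipgram_pairs`, the window size, and the sample tokens are assumptions for illustration; the actual vector training is left to the tool.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Assumption: skip-gram framing; word2vec also offers CBOW, and the
    source does not say which mode the method relies on.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:                      # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical segmented sentence from an unlabeled corpus.
tokens = ["感冒", "可", "服", "板蓝根"]
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, context)
```

No labels are needed to produce these pairs, which is why a large unlabeled corpus is enough for this step.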

[0031] In this step, the corpus data...

Abstract

The invention discloses a deep-learning-based method for recognizing named entities in online traditional Chinese medicine (TCM) text. The method comprises the following steps: online TCM text data is obtained with a web crawler, and the named entities in the obtained data are labeled using existing terminology dictionaries with manual assistance; the word2vec tool is used to learn from a large-scale unlabeled corpus, yielding fixed-length word vectors that form a corresponding vocabulary; the online TCM text data is segmented into words, each word is converted into its fixed-length word vector by looking it up in the vocabulary, the word vectors serve as the input of a convolutional neural network, and blank characters are used as padding when a sentence is too short; the output of the convolutional neural network serves as the input of a bidirectional long short-term memory (LSTM) recurrent neural network, which outputs the recognition result for the words of the online TCM text data to be recognized. Compared with traditional named entity recognition methods, the method reduces the complexity and workload of feature extraction, simplifies processing, and markedly improves recognition efficiency.
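The lookup-and-padding step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the vector length, the fixed sentence length, the `<PAD>` and `<UNK>` tokens, the truncation of over-long sentences, and the vector values are all hypothetical. The source says only that a blank character fills short sentences; it does not say how out-of-vocabulary words or long sentences are handled.

```python
EMBED_DIM = 4     # assumed vector length; the source does not give one
MAX_LEN = 6       # assumed fixed sentence length fed to the network
PAD = "<PAD>"     # stands in for the blank filler character
UNK = "<UNK>"     # assumption: fallback for words missing from the vocabulary

# Hypothetical vocabulary; real vectors would come from word2vec training.
vocab = {
    PAD: [0.0] * EMBED_DIM,
    UNK: [0.1] * EMBED_DIM,
    "感冒": [0.2, -0.1, 0.4, 0.3],
    "服": [0.0, 0.5, -0.2, 0.1],
    "板蓝根": [0.3, 0.3, -0.4, 0.2],
}


def sentence_to_matrix(words, vocab, max_len):
    """Convert segmented words into a max_len x EMBED_DIM list of vectors."""
    # Assumption: sentences longer than max_len are truncated.
    padded = words[:max_len] + [PAD] * (max_len - len(words))
    return [vocab.get(word, vocab[UNK]) for word in padded]


words = ["感冒", "可", "服", "板蓝根"]   # hypothetical segmented sentence
matrix = sentence_to_matrix(words, vocab, MAX_LEN)
for row in matrix:
    print(row)
```

Every sentence then has the same shape, which is what lets the vectors be batched as input to the convolutional network described above.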

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a deep-learning-based method for recognizing named entities in online TCM texts.

Background

[0002] Named entity recognition identifies meaningful entity mentions and their categories, such as person names, place names, organization names, and time and numeric expressions, in unstructured natural-language text. It is an important component of many natural language processing technologies. Named entity recognition for online TCM text identifies entities with specific meanings in Chinese medical texts on the Internet, including diseases, symptoms, medicines, ingredients, etc.

[0003] Existing related technologies fall into two categories. One is based on hand-written rules: for example, according to the probability of occurrence of words, words that appear with more than a certain probability are identified as named entities; name...

Application Information

IPC(8): G06F17/30, G06K9/62, G06N3/04
CPC: G06F16/3347, G06N3/04, G06F18/22, G06F40/295
Inventors: 文贵华, 陈佳浩
Owner: SOUTH CHINA UNIV OF TECH