Method and system for recognizing traditional Chinese medicine named entities based on ancient Chinese medicine literature

A named entity recognition technology for ancient traditional Chinese medicine (TCM) books, applied in the fields of instruments, network data indexing, and other database retrieval. It addresses the problems that existing methods fail to obtain ideal results on ancient TCM texts and that manual labeling is difficult and expensive, both of which increase the difficulty of TCM named entity recognition. It saves the cost of manual labeling, improves recognition results, and is easy to operate.

Active Publication Date: 2020-12-18
UNIV OF SCI & TECH BEIJING

AI Technical Summary

Problems solved by technology

Ancient TCM documents differ considerably from other documents in terminology and grammar and have their own characteristics. When prior-art named entity recognition methods are applied to ancient TCM medical records, they do not achieve ideal results.
In addition, ancient TCM medical records contain many difficult grammatical phenomena, which makes manual labeling difficult and expensive and further increases the difficulty of TCM named entity recognition.



Examples


Example 1

[0054] This embodiment provides a method for recognizing traditional Chinese medicine (TCM) named entities based on ancient TCM literature. Figure 1 is a schematic flow chart of the method.

[0055] The named entities in this embodiment are drawn from medical case records in ancient TCM literature, but the invention is not limited to medical records and can also be applied to other ancient TCM documents.

[0056] As shown in Figure 1, the TCM named entity recognition method based on ancient TCM literature comprises the following steps:

[0057] Step S1: obtain the medical record corpus from ancient TCM books.

[0058] Further, obtaining the medical record corpus specifically includes the following steps:

[0059] Step S11: use optical character recognition (OCR) to scan and recognize the existing p...

Example 2

[0130] This embodiment provides a TCM named entity recognition system based on ancient TCM literature. The system comprises a corpus acquisition module, a data cleaning module, a language model pre-training module, a training set labeling module, a sequence labeling model training module, and an entity recognition module, wherein:

[0131] The corpus acquisition module is used to obtain the medical record corpus from ancient TCM books;

[0132] The data cleaning module is used to clean the obtained medical record corpus to be processed;

[0133] The language model pre-training module is used to pre-train a language model on the ancient TCM medical record corpus;

[0134] The training set labeling module is used to perform sequence labeling on the cleaned ancient TCM medical record corpus to form a training set for subsequent model...
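The sequence labeling performed by the training set labeling module can be illustrated with a minimal sketch. The source excerpt does not state which tagging scheme is used, so the character-level BIO scheme, the `spans_to_bio` helper, the entity types `SYMPTOM` and `HERB`, and the sample sentence below are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start, end_exclusive, entity_type).
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Turn character-level entity spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in sorted(spans):
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags


# Illustrative sentence (not taken from the patent):
# "headache and fever, take cinnamon twig".
sentence = "头痛发热，服桂枝"
annotations = [(0, 2, "SYMPTOM"), (2, 4, "SYMPTOM"), (6, 8, "HERB")]

bio_tags = spans_to_bio(sentence, annotations)
# One (character, tag) pair per position, the usual input shape for a
# sequence labeling model.
training_example = list(zip(sentence, bio_tags))
```

Tagging per character rather than per word avoids a separate word segmentation step, which is a common choice for classical Chinese text; whether the patent does the same is not visible in this excerpt.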



Abstract

The invention provides a TCM named entity recognition method and system based on ancient TCM books. The method obtains a corpus of ancient TCM books, cleans the data, and then pre-trains a language model. The corpus is sequence-labeled to form a training set for the subsequent model. Using this training set, a sequence labeling model is trained with the language model as the encoding layer and a neural network structure as the decoding layer, and TCM named entity recognition is then performed with the trained sequence labeling model. The invention builds on existing language model pre-training methods, such as BERT proposed by Google, so that a small-sample training set suffices. This saves the cost of manual labeling, improves recognition performance and accuracy, and is easy to operate. It enables effective and comprehensive use of ancient TCM literature, in particular ancient medical records, and lays a good foundation for research in the field of Chinese medicine.
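The final recognition step, turning the tag sequence produced by a sequence labeling model into named entities, might look like the following minimal sketch. It assumes character-level BIO tags, which the abstract does not specify; the `bio_to_entities` helper, the entity types, and the sample input are hypothetical, and the predicted tags are written by hand rather than produced by a trained model.

```python
from typing import List, Tuple

# (surface text, entity type, start, end_exclusive)
Entity = Tuple[str, str, int, int]


def bio_to_entities(text: str, tags: List[str]) -> List[Entity]:
    """Collect entities from per-character BIO tags."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    entities: List[Entity] = []
    start, label = None, None

    def close(end: int) -> None:
        """Emit the currently open entity, if any, ending before `end`."""
        nonlocal start, label
        if start is not None:
            entities.append((text[start:end], label, start, end))
        start, label = None, None

    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            close(i)                      # a new entity ends the previous one
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue                      # still inside the open entity
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity and skip the stray tag.
            close(i)
    close(len(tags))
    return entities


# Hand-written stand-in for model output on an illustrative sentence.
sentence = "头痛发热，服桂枝"
predicted = ["B-SYMPTOM", "I-SYMPTOM", "B-SYMPTOM", "I-SYMPTOM",
             "O", "O", "B-HERB", "I-HERB"]
recognized = bio_to_entities(sentence, predicted)
```

Dropping an `I-` tag that has no matching `B-` is one of several possible repair policies; a stricter decoder could instead reject such sequences.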

Description

Technical field

[0001] The invention belongs to the fields of information processing and TCM literature, and in particular relates to a TCM named entity recognition method and recognition system based on ancient TCM books and literature.

Background

[0002] Traditional Chinese medicine is extensive and profound. It is passed on in two ways: through the direct experience of older generations of medical practitioners, and through literature. The TCM literature preserves a large number of ancient medical records, which contain the experience and the diagnosis and treatment methods of many renowned practitioners. The ancient TCM medical records referred to here are the continuous records that ancient practitioners kept of patients' symptoms, causes, prescriptions, and medications when treating diseases. Amon...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/295, G06F16/951
CPC: G06F16/951, G06F40/295
Inventors: 张德政, 杨石兵, 贾麒, 谢永红, 夏超, 栗辉
Owner: UNIV OF SCI & TECH BEIJING