Chinese medicine named entity and part-of-speech combined learning method fusing multi-source knowledge

A technology for Chinese medical named entity recognition and joint learning, applied in the fields of instruments, electrical digital data processing, and computing, that increases the sample size and improves recognition performance.

Pending Publication Date: 2021-11-30
BEIJING INSTITUTE OF TECHNOLOGY
Cites: 0; Cited by: 1

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to address the complex entity phenomena in Chinese medical text, specifically the technical problem of how to use external data resources to correctly recognize long and nested Chinese medical entities, by proposing a Chinese medical named entity and part-of-speech joint learning method that integrates multi-source knowledge.



Examples


Embodiment Construction

[0029] The method of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0030] A Chinese medical named entity and part-of-speech joint learning method that integrates multi-source knowledge comprises the following steps:

[0031] Step 1: Preprocess the Chinese medical NER dataset and the Chinese medical POS dataset.

[0032] Specifically, the following steps are included:

[0033] The Chinese medical POS dataset and the Chinese medical NER dataset are converted into a CoNLL-format corpus with word-level BILOU encoding.
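As an illustration only, BILOU encoding assigns each token a tag of B (beginning), I (inside), L (last), U (unit, a single-token entity) or O (outside), and a CoNLL-style file stores one token and its tag per line. The sketch below assumes entity spans are given as hypothetical `(start, end, type)` triples with an exclusive `end`, and a two-column tab-separated layout; the source does not specify either.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, type), `end` exclusive, indexing the
# corpus tokens (whatever unit the "word-level" corpus uses).
Span = Tuple[int, int, str]


def bilou_tags(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping spans as one BILOU tag per token ("O" elsewhere)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # Unit: single-token entity
            continue
        tags[start] = f"B-{label}"  # Beginning
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"  # Inside
        tags[end - 1] = f"L-{label}"  # Last
    return tags


def to_conll(tokens: List[str], tags: List[str]) -> str:
    """Write one 'token<TAB>tag' line per token; a blank line ends the sentence."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return "".join(f"{tok}\t{tag}\n" for tok, tag in zip(tokens, tags)) + "\n"
```

For example, `bilou_tags(4, [(0, 3, "SYM")])` gives `["B-SYM", "I-SYM", "L-SYM", "O"]`, where `SYM` is a made-up symptom label. POS data could presumably be encoded the same way by treating each POS-tagged segment as a span whose type is its POS tag.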

[0034] In particular, for Chinese medical NER datasets containing nested entities, each token is mapped to a multi-label formed by concatenating the labels of all entities that intersect that token, ordered from the highest-priority entity to the lowest-priority entity.
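A minimal sketch of this token-to-multi-label mapping follows. It assumes the entities are already sorted from highest to lowest priority (the priority rules are defined in the source but only partly preserved here, so the sort is not reproduced), reuses a hypothetical `(start, end, type)` span format, and joins labels with a hypothetical `|` separator.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # hypothetical: (start, end_exclusive, type)


def bilou_for_span(i: int, span: Span) -> str:
    """BILOU tag of token i with respect to a single entity span that covers it."""
    start, end, label = span
    if end - start == 1:
        return f"U-{label}"
    if i == start:
        return f"B-{label}"
    if i == end - 1:
        return f"L-{label}"
    return f"I-{label}"


def nested_multilabels(
    n_tokens: int, spans_by_priority: List[Span], sep: str = "|"
) -> List[str]:
    """Concatenate the tags of every entity covering each token.

    `spans_by_priority` is assumed to be pre-sorted from highest to lowest
    priority; the order of the concatenated tags follows that list.
    """
    labels = []
    for i in range(n_tokens):
        parts = [bilou_for_span(i, s) for s in spans_by_priority if s[0] <= i < s[1]]
        labels.append(sep.join(parts) if parts else "O")
    return labels
```

Using the nested example from the background section, the symptom entity "respiratory muscle paralysis" contains the body entity "respiratory muscle". With made-up labels `SYM` and `BOD`, and assuming `SYM` has the higher priority, `nested_multilabels(3, [(0, 3, "SYM"), (0, 2, "BOD")])` yields `["B-SYM|B-BOD", "I-SYM|L-BOD", "L-SYM"]`.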

[0035] Entity priority is defined as follows:

[0036] (1) Entities with higher positions have higher priority.

[0037] (2) If the position is the sa...



Abstract

The invention relates to a Chinese medical named entity and part-of-speech joint learning method that integrates multi-source knowledge, and belongs to the technical field of information extraction in natural language processing. The invention provides a multi-input, multi-task learning model. First, separate model inputs are designed for the task data from each data source; the different inputs are then embedded into the same semantic space by a shared encoding structure to obtain a unified vector representation. Next, task-specific layers extract task-specific representations, and the final predictions are computed from these representations. Training uses an alternating computation scheme. For information extraction, the model captures the correlation between the NER task and the POS task across different data sources. The method effectively improves the recognition performance and robustness of deep learning models on named entities in Chinese medical text, especially long entities, and also performs high-quality part-of-speech tagging of Chinese medical data.
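The control flow described in the abstract can be sketched as below. This is a sketch under stated assumptions, not the patented implementation: "alternating computation" is read here as taking mini-batches from the NER and POS datasets in turn, and `shared_encoder`, `heads` and `update` are hypothetical placeholders for the real neural modules and the optimizer step.

```python
from itertools import zip_longest
from typing import Callable, Dict, Iterator, List, Tuple

_MISSING = object()  # padding marker for tasks that have run out of batches


def alternate_batches(task_batches: Dict[str, List[object]]) -> Iterator[Tuple[str, object]]:
    """Yield (task, batch) pairs, taking one batch from each task in turn."""
    tasks = list(task_batches)
    for group in zip_longest(*(task_batches[t] for t in tasks), fillvalue=_MISSING):
        for task, batch in zip(tasks, group):
            if batch is not _MISSING:
                yield task, batch


def train_epoch(
    task_batches: Dict[str, List[object]],
    shared_encoder: Callable[[object], object],
    heads: Dict[str, Callable[[object], object]],
    update: Callable[[str, object, object], None],
) -> List[str]:
    """Run one alternating epoch and return the order in which tasks were visited."""
    visited = []
    for task, batch in alternate_batches(task_batches):
        hidden = shared_encoder(batch)    # one encoder for every input -> shared space
        prediction = heads[task](hidden)  # task-specific layer (e.g. NER or POS)
        update(task, batch, prediction)   # hypothetical: loss + optimizer step
        visited.append(task)
    return visited
```

With `{"ner": [b1, b2, b3], "pos": [p1]}` the visiting order is `ner, pos, ner, ner`: every batch passes through the same encoder, so both tasks shape the shared representation, and the larger dataset simply continues alone once the smaller one is exhausted. Sampling tasks in proportion to dataset size would be an alternative design.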

Description

Technical Field

[0001] The invention relates to a Chinese medical named entity and part-of-speech joint learning method that integrates multi-source knowledge, and belongs to the technical field of information extraction in natural language processing.

Background

[0002] Named entity recognition (NER) in Chinese medical text is an important foundational task in a vertical domain of natural language processing, and it can support downstream tasks such as intelligent dialogue systems and neural machine translation. It is also of significant research value for clinical applications such as automated electronic health records.

[0003] Chinese medical texts contain many complex entity phenomena. For example, "respiratory muscle paralysis" and "respiratory center involvement" are clinical-manifestation entities; these entities are relatively long, and body entities such as "respiratory muscles" and "respiratory cen...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F40/295; G06F40/30
CPC: G06F40/295; G06F40/30
Inventors: 冯冲 (Feng Chong), 赵培雯 (Zhao Peiwen)
Owner: BEIJING INSTITUTE OF TECHNOLOGY