Medical question-answering system construction method based on language model and entity matching

A technology for question answering systems and their construction, applied in medical reference, semantic tool creation, health care informatics, etc. It addresses the low reliability of existing question answering systems and their inability to effectively handle deep-semantic questions, such as what food not to eat.

Active Publication Date: 2021-04-16
SICHUAN UNIV

AI Technical Summary

Problems solved by technology

A retrieval-style question answering system relies only on literal matching of key information, so it cannot handle many deep-semantic questions. For example, for the question "what food should I not eat when I have a cold", traditional retrieval is very likely to return the similar question "what food to eat when you have a cold", so the answer will be exactly the opposite.
A knowledge-graph question answering system needs large amounts of professional, structured knowledge. It is good at answering factual questions, but it cannot effectively handle strongly subjective questions.
A generative question answering system relies heavily on the machine learning model, so its reliability is weak and it is unsuitable for medical scenarios that require high accuracy.
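The limitation of literal matching described above can be illustrated with a small sketch. It assumes a simple token-set (Jaccard) overlap as the stand-in for "literal key information matching"; the source does not say which literal similarity measure a traditional retrieval system would use, and the function name jaccard is chosen here for illustration only.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (a purely literal similarity)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# The two questions differ only by the word "not", yet they ask for opposite answers.
user_question = "what food should I not eat when I have a cold"
indexed_question = "what food should I eat when I have a cold"

score = jaccard(user_question, indexed_question)
print(round(score, 2))  # 0.9: nine shared tokens out of ten distinct tokens
```

A literal score this high would rank the opposite question near the top, which is why a semantic matching model is needed on top of retrieval.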



Examples


Embodiment 1

[0028] A method for constructing a medical question answering system based on a language model and entity matching, comprising: S1. data collection; S2. deep neural network model design; S3. training a named entity recognition model and constructing a knowledge graph; S4. constructing the complete retrieval-based medical question answering system.

[0029] The collected data include online medical discussion posts from specific websites and public datasets from the web. The online medical discussion posts contain medical questions actually asked by users and answers given by professional doctors. The collected posts are cleaned and stored in ElasticSearch as the retrieval dataset. The public datasets include medical natural language processing competition datasets and open datasets from open-source websites. The open-source data of the medical natural language processing competition datasets are used to train medical named entity recognition models; public datasets are collected from open sou...
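The cleaning step before storage in ElasticSearch might look like the following minimal sketch. The source does not describe the cleaning rules, so the rules here (strip HTML tags, collapse whitespace, drop posts missing a question or an answer, drop exact duplicates) are assumptions, as are the index name medical_qa, the field names, and the sample posts. The sketch only builds the documents; it does not contact an ElasticSearch server.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag matcher, enough for a sketch
WS_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Remove HTML tags, decode entities, and collapse runs of whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return WS_RE.sub(" ", text).strip()


def build_actions(posts, index_name="medical_qa"):
    """Turn raw posts into index actions, skipping incomplete and duplicate posts.

    index_name and the field names are hypothetical, not taken from the source.
    """
    seen = set()
    actions = []
    for post in posts:
        question = clean_text(post.get("question", ""))
        answer = clean_text(post.get("answer", ""))
        if not question or not answer:
            continue  # a post without both parts is useless for retrieval QA
        if (question, answer) in seen:
            continue  # exact duplicate after cleaning
        seen.add((question, answer))
        actions.append({
            "_index": index_name,
            "_source": {"question": question, "answer": answer},
        })
    return actions


# Invented sample posts, for illustration only.
raw_posts = [
    {"question": "<p>How long does a cold&nbsp;last?</p>", "answer": "Often  about a week."},
    {"question": "<p>How long does a cold&nbsp;last?</p>", "answer": "Often  about a week."},
    {"question": "Unanswered question", "answer": ""},
]
actions = build_actions(raw_posts)
print(actions)
```

Dictionaries with _index and _source keys are the shape that bulk-indexing helpers in the official Python client typically consume, so a list like this could be passed on to an indexing step.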

Embodiment 2

[0031] This embodiment differs from Embodiment 1 in that the deep neural network model is based on the BERT model and introduces a twin (Siamese) network structure. The combined model is used for text matching or answer selection.

[0032] In the original use of BERT for text matching, the two texts are joined with a [SEP] token and input into the BERT model, and the encoding of the [CLS] token is used to output the final probability. The deep neural network model here also uses BERT for text matching. While keeping the original [CLS] feature unchanged, it additionally encodes the question text and the answer text separately, each as a single text, and uses the mean of the token vectors as the sentence feature. The three features are concatenated in the manner of a twin network and then input into a fully connected network for encoding. Being able to learn more similarities betwe...
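The feature combination in [0032] can be sketched as follows. This is not the patented implementation: random vectors stand in for real BERT outputs, the hidden size of 4 is a toy value, and a single sigmoid unit stands in for the fully connected network, whose actual depth and weights the source does not specify. All function and variable names are hypothetical.

```python
import math
import random


def mean_vector(token_vectors):
    """Average the token vectors of one text into a single sentence feature."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def dense_sigmoid(features, weights, bias):
    """One fully connected unit with a sigmoid, standing in for the final network."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


random.seed(0)
HIDDEN = 4  # toy size for the sketch; a real BERT hidden size is much larger


def fake_encoder_output(num_tokens):
    """Placeholder for per-token encoder vectors (assumption: no real model here)."""
    return [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(num_tokens)]


# Feature 1: [CLS] vector from encoding the question and answer together.
cls_pair = fake_encoder_output(1)[0]
# Features 2 and 3: mean token vectors from encoding each text on its own.
question_mean = mean_vector(fake_encoder_output(5))
answer_mean = mean_vector(fake_encoder_output(7))

# Concatenate the three features, then feed them to the fully connected layer.
features = cls_pair + question_mean + answer_mean
weights = [random.uniform(-0.5, 0.5) for _ in features]
match_probability = dense_sigmoid(features, weights, bias=0.0)
print(len(features), match_probability)
```

The point of the design is that the pair-level [CLS] feature captures cross-text interaction, while the two single-text means keep a representation of each side on its own.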

Embodiment 3

[0034] This embodiment differs from Embodiment 1 in how the named entity recognition model is trained and the knowledge graph is constructed. An entity matching step is added to the traditional retrieval question answering process to increase the contribution of key entities to the whole retrieval process. Entity matching covers two aspects: named entity matching and knowledge graph entity matching. The specific steps include:

[0035] S3.1. Train the named entity recognition model. A BERT-BiLSTM-CRF model is used for entity recognition: the BERT model extracts the text features, and during training the BERT weights are frozen while only the weights of the subsequent BiLSTM-CRF layers are updated. The public dataset is the CCKS 2019 Chinese medical entity recognition dataset, which contains six important entity categories: disease, imaging examination, drug, laboratory test, operation, and anatomical part. Training on this d...
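Once a sequence labeling model such as the one in S3.1 has produced one tag per token, the tags have to be turned into entity spans before any entity matching can happen. The sketch below assumes a BIO tagging scheme and the label names DISEASE and DRUG; the source does not state the tagging scheme or label names, and the English example sentence is invented (the actual dataset is Chinese).

```python
def bio_to_entities(tokens, tags):
    """Group tokens into (text, type) entities from BIO tags.

    Assumes tags look like "B-TYPE", "I-TYPE", or "O".
    """
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Close the entity being built, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the current entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


tokens = ["patient", "has", "type", "2", "diabetes", "and", "takes", "metformin"]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "O", "B-DRUG"]

entities = bio_to_entities(tokens, tags)
print(entities)  # [('type 2 diabetes', 'DISEASE'), ('metformin', 'DRUG')]
```

For Chinese text the tokens would typically be single characters and would be joined without spaces.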


Abstract

The invention discloses a method for constructing a medical question answering system based on a language model and entity matching. The method comprises the steps: S1, data collection; S2, deep neural network model design; S3, named entity recognition model training and knowledge graph construction; S4, construction of the complete retrieval-based medical question answering system. Specifically, the method comprises: collecting online medical discussion posts, cleaning them, and storing the cleaned posts in ElasticSearch as the retrieval dataset; using the open-source data of medical natural language processing competition datasets to train a medical named entity recognition model; and collecting public datasets from open-source websites to form a medical knowledge graph that extends the retrieval process. Once constructed, the question answering system performs recall, fine ranking, and comprehensive scoring, and outputs the most appropriate answer according to a reasonable scoring mechanism, overcoming the defects of retrieval-based and knowledge-graph question answering systems.
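The final stage described in the abstract (recall, fine ranking, comprehensive scoring) could be sketched as a weighted combination of three signals. The source does not give the scoring formula, so the weighted sum, the weights (0.2, 0.5, 0.3), the score ranges, and all names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    recall_score: float   # retrieval score from the recall stage, assumed scaled to 0..1
    match_score: float    # probability from the text matching model, 0..1
    entities: frozenset   # entities found in the candidate question and answer


def entity_overlap(question_entities, candidate_entities):
    """Fraction of the question's entities that also appear in the candidate."""
    if not question_entities:
        return 0.0
    return len(question_entities & candidate_entities) / len(question_entities)


def comprehensive_score(candidate, question_entities, weights=(0.2, 0.5, 0.3)):
    """Weighted sum of recall, matching, and entity signals (weights are hypothetical)."""
    w_recall, w_match, w_entity = weights
    return (w_recall * candidate.recall_score
            + w_match * candidate.match_score
            + w_entity * entity_overlap(question_entities, candidate.entities))


def best_answer(candidates, question_entities):
    """Return the candidate with the highest comprehensive score."""
    return max(candidates, key=lambda c: comprehensive_score(c, question_entities))


question_entities = frozenset({"cold"})
candidates = [
    Candidate("answer about fever", 0.9, 0.4, frozenset({"fever"})),
    Candidate("answer about cold", 0.7, 0.6, frozenset({"cold"})),
]
best = best_answer(candidates, question_entities)
print(best.answer)  # answer about cold
```

In this toy case the first candidate wins on literal recall alone, but the entity signal and the matching model together move the second candidate to the top.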

Description

Technical field

[0001] The invention relates to the technical field of question answering systems, in particular to a method for constructing a medical question answering system based on language models and entity matching.

Background

[0002] Medical care is a topic that people can never live without. In the past, people who wanted medical information could only go to the hospital and ask a doctor, but with the emergence of the Internet, people began to search for medical information online. People only need to enter some keywords, and search engines can quickly find web pages that contain the information they need. However, these traditional search engines have many deficiencies: the information returned is too abundant and too complicated, and much of it is repeated, so even professional medical personnel must spend a lot of time filtering this mass of information to find what they need, let a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F16/33, G06F16/36, G06F16/9535, G06F40/126, G06F40/216, G06F40/295, G06N3/04, G16H70/00
Inventor: 章毅, 郭泉, 张海仙, 曹帅, 张强, 张欣培
Owner: SICHUAN UNIV