A sparse medical entity recognition method based on an attention mechanism
An attention-based sparse medical entity recognition method that combines BERT, Bi-LSTM, attention, and CRF layers and dynamically adjusts entity category weights. It addresses the imbalance and sparsity issues of named entity recognition in the medical field, thereby improving recognition accuracy and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV
- Filing Date
- 2024-01-17
- Publication Date
- 2026-06-23
AI Technical Summary
Existing named entity recognition technologies in the medical field suffer from problems such as small corpus size, incomplete domain coverage, difficulty in handling contextual dependencies, and imbalance and sparsity in the number of entities, resulting in poor performance of the model in recognizing new domains or specific types of entities.
A sparse medical entity recognition method based on attention mechanism is adopted. Word vectors are extracted by BERT model, and entity category weights are dynamically adjusted by combining Bi-LSTM and Attention mechanism. The prediction results are output by CRF layer to optimize the recognition of sparse entities.
It improves the accuracy and robustness of named entity recognition, especially when facing imbalanced datasets and sparse entities, where the model performs exceptionally well and enhances recognition performance in new domains.
Description
Technical Field
[0001] This invention relates to the field of recognition technology, and in particular to a sparse medical entity recognition method based on an attention mechanism.
Background Technology
[0002] Against the backdrop of the rapid development of "Internet + Healthcare," various big data retrieval platforms have launched online medical functions. Users can search for relevant health topics to obtain medical knowledge. However, although most platforms can quickly provide search results based on user keywords, the returned information is often vast and disorganized, filled with a large amount of irrelevant information. This requires users to extract answers from massive amounts of information, thus affecting the user experience. Therefore, the existence of question-and-answer systems has become necessary.
[0003] Question answering systems derive answers by semantically understanding and parsing a given natural language question, and then querying and reasoning over a knowledge base. In recent years, with the rapid advancement of knowledge graph technology, knowledge graph-based question answering systems have made significant progress and have been widely applied in fields such as healthcare, education, and finance. Named entity recognition (NER) is a key task in natural language processing. When a user asks a question, the system first needs to identify the entities in the question through named entity recognition in order to understand the question more accurately. However, current named entity recognition technologies have the following drawbacks:
[0004] (1) Corpus and resource limitations: The performance of entity recognition is greatly affected by the corpus and resources used for training and evaluation. Existing corpora may be small in size, have incomplete domain coverage, or contain annotation errors. These problems may weaken the generalization ability of the model, resulting in poor performance in new domains or specific types of entity recognition.
[0005] (2) Context dependency issues: Named entities often have multiple meanings, and their exact meaning needs to be determined from the context. For example, "Oracle" can refer to ancient Chinese characters (the oracle bone script) or to Oracle Corporation of the United States. Correctly identifying named entities and resolving their ambiguity requires in-depth analysis of contextual information. Nevertheless, current technologies still face challenges in handling ambiguity and contextual relevance.
[0006] (3) Imbalanced entity quantity and sparsity issues: In medical question-and-answer texts, some entity categories appear much more frequently than others. This may cause the model to overfit to common entities during training while ignoring less frequent entities, thus reducing the overall recognition performance. In addition, because some entities are sparsely distributed in the text, the model has difficulty learning the feature representations of these entities effectively, which in turn affects the accuracy of recognition.
Summary of the Invention
[0007] This invention aims to at least solve the technical problems existing in the prior art, and in particular, it innovatively proposes a sparse medical entity recognition method based on an attention mechanism.
[0008] To achieve the above-mentioned objectives of this invention, this invention provides a sparse medical entity recognition method based on an attention mechanism, comprising the following steps:
[0009] S1: extract word vectors using the BERT model and further extract features using Bi-LSTM;
[0010] S2: use the attention mechanism to extract deep connections within the word vectors;
[0011] S3: dynamically adjust the entity category weights and the fusion weights in the ensemble learning based on the sparsity characteristics of the entities in each batch;
[0012] S4: output the prediction results through the CRF layer.
[0013] In a preferred embodiment of the present invention, step S1 includes:
[0014] $f_t = \sigma(w_f [h_{t-1}, x_t] + b_f)$
[0015] where $f_t$ denotes the output of the forget gate at time $t$;
[0016] $\sigma()$ denotes the activation function;
[0017] $w_f$ denotes the forget gate weight matrix in the LSTM network;
[0018] $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0019] $x_t$ denotes the input vector at time $t$;
[0020] $b_f$ denotes the forget gate bias in the LSTM network;
[0021] $i_t = \sigma(w_i [h_{t-1}, x_t] + b_i)$
[0022] where $i_t$ denotes the output of the input gate at time $t$;
[0023] $\sigma()$ denotes the activation function;
[0024] $w_i$ denotes the input gate weight matrix in the LSTM network; $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0025] $x_t$ denotes the input vector at time $t$;
[0026] $b_i$ denotes the input gate bias in the LSTM network;
[0027] $o_t = \sigma(w_o [h_{t-1}, x_t] + b_o)$
[0028] where $o_t$ denotes the output of the output gate at time $t$;
[0029] $\sigma()$ denotes the activation function, which maps any real number to (0, 1);
[0030] $w_o$ denotes the output gate weight matrix in the LSTM network; $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0031] $x_t$ denotes the input vector at time $t$;
[0032] $b_o$ denotes the output gate bias in the LSTM network;
[0033] $\tilde{C}_t = \tanh(w_c [h_{t-1}, x_t] + b_c)$
[0034] where $\tilde{C}_t$ denotes the candidate state at the current time step;
[0035] $\tanh()$ denotes a nonlinear function;
[0036] $w_c$ denotes the candidate state weight matrix in the LSTM network;
[0037] $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0038] $x_t$ denotes the input vector at time $t$;
[0039] $b_c$ denotes the candidate state bias in the LSTM network;
[0040] $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
[0041] where $C_t$ denotes the current state of the memory cell;
[0042] $f_t$ denotes the output of the forget gate at time $t$;
[0043] $\odot$ denotes element-wise multiplication;
[0044] $C_{t-1}$ denotes the cell state at time $t-1$;
[0045] $i_t$ denotes the output of the input gate at time $t$;
[0046] $\tilde{C}_t$ denotes the current input cell state (the candidate state);
[0047] $h_t = o_t \odot \tanh(C_t)$
[0048] where $h_t$ denotes the hidden layer state at time $t$;
[0049] $o_t$ denotes the output of the output gate at time $t$;
[0050] $\odot$ denotes element-wise multiplication;
[0051] $\tanh()$ denotes a nonlinear function;
[0052] $C_t$ denotes the current state of the memory cell;
[0053] For the Bi-LSTM network, $\overrightarrow{h_t}$ denotes the output of the forward LSTM and $\overleftarrow{h_t}$ denotes the output of the backward LSTM; the final output $H_t$ of the Bi-LSTM is obtained by concatenating the two. That is:
[0054] $\overrightarrow{h_t} = \mathrm{LSTM}(x_t, \overrightarrow{h_{t-1}})$
[0055] where $\overrightarrow{h_t}$ denotes the output of the forward LSTM;
[0056] $\mathrm{LSTM}(\cdot,\cdot)$ denotes the LSTM computation of the Bi-LSTM network;
[0057] $x_t$ denotes the input vector at time $t$;
[0058] $\overrightarrow{h_{t-1}}$ denotes the output of the forward LSTM at time $t-1$;
[0059] $\overleftarrow{h_t} = \mathrm{LSTM}(x_t, \overleftarrow{h_{t-1}})$
[0060] where $\overleftarrow{h_t}$ denotes the output of the backward LSTM;
[0061] $\mathrm{LSTM}(\cdot,\cdot)$ denotes the LSTM computation of the Bi-LSTM network;
[0062] $x_t$ denotes the input vector at time $t$;
[0063] $\overleftarrow{h_{t-1}}$ denotes the output of the backward LSTM at time $t-1$;
[0064] $H_t = [\overrightarrow{h_t}, \overleftarrow{h_t}]$
[0065] where $H_t$ denotes the final output of the Bi-LSTM;
[0066] $[\cdot,\cdot]$ denotes the concatenation operation;
[0067] $\overrightarrow{h_t}$ denotes the output of the forward LSTM;
[0068] $\overleftarrow{h_t}$ denotes the output of the backward LSTM.
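The gate equations and the forward/backward concatenation can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the dimensions, the random initialization, the helper names (`lstm_step`, `run_lstm`, `bi_lstm`, `init_params`) and the random matrix `E` standing in for BERT word vectors are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, hidden_dim, rng):
    # Hypothetical initialization: one weight matrix and one bias per gate.
    params = {}
    for gate in ("f", "i", "o", "c"):
        params[f"w_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params

def lstm_step(x_t, h_prev, c_prev, p):
    hx = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p["w_f"] @ hx + p["b_f"])       # forget gate
    i_t = sigmoid(p["w_i"] @ hx + p["b_i"])       # input gate
    o_t = sigmoid(p["w_o"] @ hx + p["b_o"])       # output gate
    c_tilde = np.tanh(p["w_c"] @ hx + p["b_c"])   # candidate state
    c_t = f_t * c_prev + i_t * c_tilde            # memory cell update
    h_t = o_t * np.tanh(c_t)                      # hidden layer state
    return h_t, c_t

def run_lstm(xs, p, hidden_dim):
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return np.stack(outputs)

def bi_lstm(xs, p_fwd, p_bwd, hidden_dim):
    fwd = run_lstm(xs, p_fwd, hidden_dim)
    # The backward LSTM reads the reversed sequence; re-reverse to align positions.
    bwd = run_lstm(xs[::-1], p_bwd, hidden_dim)[::-1]
    return np.concatenate([fwd, bwd], axis=1)     # H_t = [forward, backward]

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 8))                       # stand-in for 5 BERT word vectors
H = bi_lstm(E, init_params(8, 4, rng), init_params(8, 4, rng), hidden_dim=4)
print(H.shape)
```

Each row of `H` holds the forward and backward hidden states for one position, so its width is twice the hidden size.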
[0069] In a preferred embodiment of the present invention, step S3 includes:
[0070] $Z_i = \tanh(W_0 H_i + b_0)$
[0071] where $Z_i$ denotes the attention score of the $i$-th sequence element;
[0072] $\tanh()$ denotes a nonlinear function;
[0073] $W_0$ denotes the weight;
[0074] $H_i$ denotes the $i$-th sequence element output by the Bi-LSTM layer;
[0075] $b_0$ denotes the bias;
[0076] The attention weight $A_i$ is then calculated by SoftMax normalization:
[0077] $A_i = \dfrac{\exp(Z_i)}{\sum_{j=1}^{t} \exp(Z_j)}$
[0078] where $A_i$ denotes the attention weight;
[0079] $\exp()$ denotes the exponential function with base $e$;
[0080] $Z_i$ denotes the attention score of the $i$-th sequence element;
[0081] $t$ denotes the length of the sample sequence;
[0082] $Z_j$ denotes the attention score of the $j$-th sequence element;
[0083] Finally, the attention feature $A$ is calculated by weighted summation:
[0084] $A = \sum_{i=1}^{t} A_i H_i$
[0085] where $A$ denotes the output of the attention mechanism;
[0086] $t$ denotes the length of the sample sequence;
[0087] $A_i$ denotes the attention weight;
[0088] $H_i$ denotes the $i$-th sequence element output by the Bi-LSTM layer.
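As a rough illustration of the three attention steps (score, SoftMax normalization, weighted summation), the following NumPy sketch may help. It assumes that the weight W0 is a vector so that each score is a scalar, which the text does not state; the function name `soft_attention` and the random inputs are hypothetical.

```python
import numpy as np

def soft_attention(H, W0, b0):
    """H: (t, d) Bi-LSTM outputs; W0: (d,) weight vector (assumed); b0: scalar bias."""
    Z = np.tanh(H @ W0 + b0)              # attention scores, one per position
    exp_z = np.exp(Z - Z.max())           # subtract the max for numerical stability
    weights = exp_z / exp_z.sum()         # SoftMax-normalized attention weights
    feature = weights @ H                 # weighted summation over positions
    return Z, weights, feature

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))               # stand-in for 6 Bi-LSTM output vectors
W0 = rng.normal(size=8)
scores, weights, feature = soft_attention(H, W0, b0=0.0)
print(weights.sum(), feature.shape)
```

Subtracting the maximum score before exponentiation leaves the normalized weights unchanged and only guards against overflow.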
[0089] In a preferred embodiment of the present invention, it further includes:
[0090] $Loss_{multi} = -\ln(P_i)$
[0091] where $Loss_{multi}$ denotes the multi-class loss function;
[0092] $P_i$ denotes the probability that the entity is predicted to be of category $i$.
[0093] In a preferred embodiment of the present invention, it further includes:
[0094] $Loss = (1-\mu)(\alpha \, Loss_{classification} + \beta \, Loss_{crf})$
[0095] where $Loss$ denotes the dynamically adjusted loss;
[0096] $\mu$ denotes the reduction factor;
[0097] $\alpha$ denotes the second fusion factor;
[0098] $Loss_{classification}$ denotes the final multi-class classification loss;
[0099] $\beta$ denotes the first fusion factor;
[0100] $Loss_{crf}$ denotes the CRF loss.
[0101] In a preferred embodiment of the present invention, the final multi-class classification loss is calculated as follows:
[0102]
[0103] where $Loss_{classification}$ denotes the final multi-class classification loss;
[0104] $t$ denotes the length of the sample sequence;
[0105] $W_i$ denotes the weight corresponding to the true entity category of the $i$-th character;
[0106] $Loss_i$ denotes the loss of the $i$-th character, computed as $Loss_c$.
[0107] In a preferred embodiment of the present invention, the loss $Loss_c$ of a specific character is calculated as follows:
[0108] $Loss_c = -W_i (1-P_i)^{\gamma} \ln(P_i)$
[0109] where $Loss_c$ denotes the loss of a specific character;
[0110] $W_i$ denotes the degree of attention paid to the $i$-th entity;
[0111] $P_i$ denotes the probability that the entity is predicted to be of category $i$.
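A short Python sketch can make the effect of the factor (1 - P_i)^γ concrete. The function name `char_loss` and the numeric values are illustrative assumptions; only the formula itself comes from the text.

```python
import math

def char_loss(p_i, w_i, gamma):
    """Loss_c = -W_i * (1 - P_i)^gamma * ln(P_i) for one character."""
    return -w_i * (1.0 - p_i) ** gamma * math.log(p_i)

# Illustrative values only: a confidently correct and a poorly predicted character.
easy = char_loss(p_i=0.9, w_i=1.0, gamma=2.0)
hard = char_loss(p_i=0.1, w_i=1.0, gamma=2.0)
plain = char_loss(p_i=0.9, w_i=1.0, gamma=0.0)   # gamma = 0 and W_i = 1 give -ln(P_i)

print(easy, hard, plain)
```

With γ > 0 the well-classified character contributes far less than under the plain multi-class loss, so the gradient is dominated by the hard characters.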
[0112] In a preferred embodiment of the present invention, the CRF loss is calculated as follows:
[0113] $Loss_{crf} = -\ln \dfrac{e^{S_{RealPath}}}{e^{S_1} + e^{S_2} + \dots + e^{S_k}}$
[0114] where $Loss_{crf}$ denotes the CRF loss;
[0115] $e$ denotes the natural base;
[0116] $S_{RealPath}$ denotes the score of the true sequence;
[0117] $S_1, S_2, \dots, S_k$ denote the scores of all possible paths;
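For a tiny tag set, the CRF loss can be checked by enumerating every path explicitly. The sketch below assumes the usual linear-chain scoring (per-position emission scores plus tag-to-tag transition scores), which the text does not spell out; `path_score`, `crf_loss`, and the random scores are hypothetical. Real implementations replace the enumeration with the forward algorithm.

```python
import itertools
import numpy as np

def path_score(emissions, transitions, path):
    # Assumed linear-chain score: emission of each tag plus transitions between neighbors.
    score = emissions[0, path[0]]
    for pos in range(1, len(path)):
        score += transitions[path[pos - 1], path[pos]] + emissions[pos, path[pos]]
    return score

def crf_loss(emissions, transitions, real_path):
    t, num_tags = emissions.shape
    all_scores = np.array([
        path_score(emissions, transitions, p)
        for p in itertools.product(range(num_tags), repeat=t)   # S_1 ... S_k
    ])
    s_real = path_score(emissions, transitions, real_path)
    m = all_scores.max()
    log_total = m + np.log(np.exp(all_scores - m).sum())        # ln(sum_k e^{S_k})
    return log_total - s_real                                   # -ln(e^{S_real} / sum_k e^{S_k})

rng = np.random.default_rng(0)
emissions = rng.normal(size=(4, 3))      # 4 characters, 3 tags
transitions = rng.normal(size=(3, 3))
real_path = (0, 1, 1, 2)
loss = crf_loss(emissions, transitions, real_path)

# Raising the emission scores along the true path lowers the loss.
boosted = emissions.copy()
for pos, tag in enumerate(real_path):
    boosted[pos, tag] += 5.0
loss_boosted = crf_loss(boosted, transitions, real_path)
print(loss, loss_boosted)
```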
[0118] In a preferred embodiment of the present invention, the second fusion factor is calculated as follows:
[0119] α = 1 - β
[0120] Where α represents the second fusion factor;
[0121] β represents the first fusion factor;
[0122] In a preferred embodiment of the present invention, the first fusion factor is calculated as follows:
[0123]
[0124] where $\beta$ denotes the first fusion factor;
[0125] $C$ denotes all entity categories;
[0126] $T_i$ denotes the number of entities of category $i$ in the batch;
[0127] $B$ denotes the number of samples in the batch;
[0128] $t$ denotes the length of the sample sequence.
[0129] In summary, by adopting the above technical solution, the present invention achieves the following beneficial effects:
[0130] (1) The proposed named entity recognition model can not only dynamically focus on difficult-to-identify samples to improve learning efficiency, but also introduce a reduction factor to reduce the interference caused by sparse entities during parameter updates. This significantly improves the model's performance in dealing with the imbalanced number of entities and the problem of sparse entities in medical named entity recognition.
[0131] (2) The proposed named entity recognition model performs well even with a small corpus, incomplete domain coverage, or labeling errors. This makes the model relatively superior in new domains or specific types of entity recognition tasks.
[0132] (3) An attention mechanism is used, which enables the model to retain a variety of contextual details and cleverly integrate low-level information while focusing on sparse entity feature information, so as to improve the accuracy of entity recognition.
[0133] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Description of the Drawings
[0134] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:
[0135] Figure 1 is a schematic diagram of the model structure of the present invention.
[0136] Figure 2 is a schematic diagram of the Bi-LSTM structure of the present invention.
Detailed Implementation
[0137] Embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.
[0138] This invention proposes a sparse medical entity recognition method based on an attention mechanism, providing an efficient and robust solution for named entity recognition tasks, particularly for applications in the medical field. The key to this method lies in its comprehensive utilization of the advantages of BERT, Bi-LSTM, attention mechanisms, and Conditional Random Fields (CRF), along with a dynamically optimized strategy.
[0139] ①BERT Model: As a powerful pre-trained model, the BERT model can capture complex contextual relationships between words. In this invention, BERT is used to initially extract word vectors from the input, which lays the foundation for subsequent feature extraction.
[0140] ② Further feature extraction of Bi-LSTM: Bidirectional Long Short-Term Memory Network (Bi-LSTM) can understand text from two directions, helping the model to better understand the temporal characteristics and contextual information in language.
[0141] ③ Application of attention mechanism: The attention mechanism enables the model to focus on key parts of the text and better capture the dependencies between words, which is especially important for understanding complex medical terms.
[0142] ④ Use of CRF layers: Conditional Random Field (CRF) layers are used in the final stage of the model to output prediction results. CRF can effectively consider the dependencies between labels, improving the accuracy of entity recognition.
[0143] ⑤ Optimization for sparse entities: The model continuously adjusts the weights of entity categories and fusion weights based on the characteristics of each batch of data. This method can handle different categories of entities more precisely, especially when the data distribution is uneven or there are sparse entities. At the same time, in order to reduce the interference of sparse entities in the model training process, a reduction factor is introduced, which helps to balance the imbalanced distribution in the dataset, thereby improving the overall performance of the model.
[0144] This model not only improves the accuracy of named entity recognition, especially when dealing with medical data, but also enhances robustness in the face of imbalanced datasets and sparse entities. The model's innovation lies in its dynamic adjustment mechanism and the introduction of the reduction factor, both designed to optimize medical-specific NER tasks.
[0145] This invention proposes a sparse medical entity recognition method based on an attention mechanism. As shown in Figure 1, its specific implementation is as follows:
[0146] $X_1, X_2, \dots, X_t$ denote the input sentence.
[0147] ① BERT model: BERT is used to initially extract word vectors from the input, transforming the text into word embedding vectors $\{E_1, E_2, \dots, E_t\}$.
[0148] ② Further feature extraction with Bi-LSTM: The Bidirectional Long Short-Term Memory (Bi-LSTM) network reads the text in two directions, helping the model better understand the temporal characteristics and contextual information in language. The structure of the Bi-LSTM is shown in Figure 2, where $X_1, X_2, \dots, X_t$ are the input word vectors, $h_1, h_2, \dots, h_t$ denote the forward pass, $h'_1, h'_2, \dots, h'_t$ denote the backward pass, and the final output vectors $H_1, H_2, \dots, H_t$ are obtained by combining the forward and backward vectors. An LSTM computation unit consists of three gate structures: an input gate, a forget gate, and an output gate. The specific calculation process is shown in the following formulas:
[0149] $f_t = \sigma(w_f [h_{t-1}, x_t] + b_f)$
[0150] where $f_t$ denotes the output of the forget gate at time $t$;
[0151] $\sigma()$ denotes the activation function, which maps any real number to (0, 1);
[0152] $w_f$ denotes the forget gate weight matrix in the LSTM network;
[0153] $[h_{t-1}, x_t]$ denotes a 1×2 block matrix, that is, the concatenation of $h_{t-1}$ and $x_t$;
[0154] $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0155] $x_t$ denotes the input vector at time $t$;
[0156] $b_f$ denotes the forget gate bias in the LSTM network;
[0157] $i_t = \sigma(w_i [h_{t-1}, x_t] + b_i)$
[0158] where $i_t$ denotes the output of the input gate at time $t$;
[0159] $\sigma()$ denotes the activation function, which maps any real number to (0, 1);
[0160] $w_i$ denotes the input gate weight matrix in the LSTM network;
[0161] $[h_{t-1}, x_t]$ denotes a 1×2 block matrix, that is, the concatenation of $h_{t-1}$ and $x_t$;
[0162] $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0163] $x_t$ denotes the input vector at time $t$;
[0164] $b_i$ denotes the input gate bias in the LSTM network;
[0165] $o_t = \sigma(w_o [h_{t-1}, x_t] + b_o)$
[0166] where $o_t$ denotes the output of the output gate at time $t$;
[0167] $\sigma()$ denotes the activation function, which maps any real number to (0, 1);
[0168] $w_o$ denotes the output gate weight matrix in the LSTM network;
[0169] $[h_{t-1}, x_t]$ denotes a 1×2 block matrix, that is, the concatenation of $h_{t-1}$ and $x_t$; $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0170] $x_t$ denotes the input vector at time $t$;
[0171] $b_o$ denotes the output gate bias in the LSTM network;
[0172] $\tilde{C}_t = \tanh(w_c [h_{t-1}, x_t] + b_c)$
[0173] where $\tilde{C}_t$ denotes the candidate state at the current time step;
[0174] $\tanh()$ denotes a nonlinear function;
[0175] $w_c$ denotes the candidate state weight matrix in the LSTM network;
[0176] $[h_{t-1}, x_t]$ denotes a 1×2 block matrix, that is, the concatenation of $h_{t-1}$ and $x_t$; $h_{t-1}$ denotes the hidden layer state at time $t-1$;
[0177] $x_t$ denotes the input vector at time $t$;
[0178] $b_c$ denotes the candidate state bias in the LSTM network;
[0179] $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
[0180] where $C_t$ denotes the current state of the memory cell;
[0181] $f_t$ denotes the output of the forget gate at time $t$;
[0182] $\odot$ denotes element-wise multiplication;
[0183] $C_{t-1}$ denotes the cell state at time $t-1$;
[0184] $i_t$ denotes the output of the input gate at time $t$;
[0185] $\tilde{C}_t$ denotes the current input cell state (the candidate state);
[0186] $h_t = o_t \odot \tanh(C_t)$
[0187] where $h_t$ denotes the hidden layer state at time $t$; $o_t$ denotes the output of the output gate at time $t$;
[0188] $\odot$ denotes element-wise multiplication;
[0189] $\tanh()$ denotes a nonlinear function;
[0190] $C_t$ denotes the current state of the memory cell;
[0191] For the Bi-LSTM network, $\overrightarrow{h_t}$ denotes the output of the forward LSTM and $\overleftarrow{h_t}$ denotes the output of the backward LSTM; the final output $H_t$ of the Bi-LSTM is obtained by concatenating the two, as shown below:
[0192] $\overrightarrow{h_t} = \mathrm{LSTM}(x_t, \overrightarrow{h_{t-1}})$
[0193] where $\overrightarrow{h_t}$ denotes the output of the forward LSTM;
[0194] $\mathrm{LSTM}(\cdot,\cdot)$ denotes the LSTM computation of the Bi-LSTM network;
[0195] $x_t$ denotes the input vector at time $t$;
[0196] $\overrightarrow{h_{t-1}}$ denotes the output of the forward LSTM at time $t-1$;
[0197] $\overleftarrow{h_t} = \mathrm{LSTM}(x_t, \overleftarrow{h_{t-1}})$
[0198] where $\overleftarrow{h_t}$ denotes the output of the backward LSTM;
[0199] $\mathrm{LSTM}(\cdot,\cdot)$ denotes the LSTM computation of the Bi-LSTM network;
[0200] $x_t$ denotes the input vector at time $t$;
[0201] $\overleftarrow{h_{t-1}}$ denotes the output of the backward LSTM at time $t-1$;
[0202] $H_t = [\overrightarrow{h_t}, \overleftarrow{h_t}]$
[0203] where $H_t$ denotes the final output of the Bi-LSTM;
[0204] $[\cdot,\cdot]$ denotes the concatenation operation;
[0205] $\overrightarrow{h_t}$ denotes the output of the forward LSTM;
[0206] $\overleftarrow{h_t}$ denotes the output of the backward LSTM;
[0207] ③ Application of the attention mechanism: The attention mechanism enables the model to focus on key parts of the text and better capture the dependencies between words. The Bi-LSTM layer merges the forward and backward outputs into the sequence features $H_t$, which are passed to the soft attention layer, where the attention score $Z_i$ of the $i$-th sequence element is first calculated:
[0208] $Z_i = \tanh(W_0 H_i + b_0)$
[0209] where $Z_i$ denotes the attention score of the $i$-th sequence element;
[0210] $\tanh()$ denotes a nonlinear function;
[0211] $W_0$ denotes the weight;
[0212] $H_i$ denotes the $i$-th sequence element output by the Bi-LSTM layer;
[0213] $b_0$ denotes the bias;
[0214] The attention weight $A_i$ is then calculated by SoftMax normalization:
[0215] $A_i = \dfrac{\exp(Z_i)}{\sum_{j=1}^{t} \exp(Z_j)}$
[0216] where $A_i$ denotes the attention weight;
[0217] $\exp()$ denotes the exponential function with base $e$;
[0218] $Z_i$ denotes the attention score of the $i$-th sequence element;
[0219] $t$ denotes the length of the sample sequence;
[0220] $Z_j$ denotes the attention score of the $j$-th sequence element;
[0221] Finally, the attention feature $A$ is calculated by weighted summation:
[0222] $A = \sum_{i=1}^{t} A_i H_i$
[0223] where $A$ denotes the output of the attention mechanism;
[0224] $t$ denotes the length of the sample sequence;
[0225] $A_i$ denotes the attention weight;
[0226] $H_i$ denotes the $i$-th sequence element output by the Bi-LSTM layer;
[0227] ④ CRF Layer: The Conditional Random Field (CRF) layer is used to output the prediction results in the final stage of the model. Here, the BIO labeling method is used for label prediction output. The CRF layer decodes the output [A1, A2, ..., An] of the Attention layer and predicts the optimal label sequence.
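How a CRF layer selects an optimal BIO label sequence can be sketched with Viterbi decoding. This is an illustration under assumptions, not the patent's decoder: the tag set (`O`, `B-DISEASE`, `I-DISEASE`), the hand-written emission scores, and the single forbidden transition are all hypothetical.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence for (t, num_tags) emission scores."""
    t, num_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((t, num_tags), dtype=int)
    for pos in range(1, t):
        # cand[prev, cur] = best score ending in prev + transition + emission of cur
        cand = score[:, None] + transitions + emissions[pos][None, :]
        backptr[pos] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for pos in range(t - 1, 0, -1):
        best.append(int(backptr[pos, best[-1]]))
    return best[::-1]

TAGS = ["O", "B-DISEASE", "I-DISEASE"]        # hypothetical BIO tag set
transitions = np.zeros((3, 3))
transitions[0, 2] = -1e4                      # O -> I-DISEASE violates the BIO scheme

emissions = np.array([
    [2.0, 0.5, 0.1],   # position 0 locally prefers O
    [0.2, 0.3, 3.0],   # position 1 locally prefers I-DISEASE
    [0.1, 0.2, 2.5],
    [2.0, 0.1, 0.3],
])
path = [TAGS[k] for k in viterbi_decode(emissions, transitions)]
print(path)  # ['B-DISEASE', 'I-DISEASE', 'I-DISEASE', 'O']
```

Taking the per-position argmax would yield O, I-DISEASE, I-DISEASE, O, which is not a valid BIO sequence; the transition scores let the decoder trade a slightly worse first tag for a consistent entity span.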
[0228] ⑤ Sparse entity optimization strategy: The model continuously adjusts the weights of entity categories and fusion weights based on the characteristics of each batch of data. This method can handle different categories of entities more precisely, especially when the data distribution is uneven or there are sparse entities. At the same time, in order to reduce the interference of sparse entities in the model training process, a reduction factor is introduced, which helps to balance the imbalanced distribution in the dataset, thereby improving the overall performance of the model.
[0229] First, we define a multi-class loss function to calculate the loss for a specific character.
[0230] $Loss_{multi} = -\ln(P_i)$
[0231] where $Loss_{multi}$ denotes the multi-class loss function;
[0232] $\ln()$ denotes the logarithm with the natural base $e$;
[0233] $P_i$ denotes the probability that the entity is predicted to be of category $i$.
[0234] Traditional entity recognition methods typically use static weight settings, but this approach can introduce bias when entity classes are imbalanced or sparse. Dynamic class weights $W$ are therefore introduced, which are generated from the entity class distribution in each training batch. This allows the model to focus more on the rarer or harder-to-identify entity classes during training.
[0235]
[0236] where $W_i$ denotes the degree of attention paid to the $i$-th entity;
[0237] $B$ denotes the number of samples in the batch;
[0238] $t$ denotes the length of the sample sequence;
[0239] $T_i$ denotes the number of entities of category $i$ in the batch;
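The per-batch bookkeeping behind the dynamic class weights can be sketched as follows. Because the weight formula itself is not preserved in this text, the inverse-frequency form W_i = (B · t) / T_i used below is purely an assumption chosen for illustration, not the patent's formula; the function names and the example labels are likewise hypothetical.

```python
from collections import Counter

def batch_class_counts(batch_labels):
    """Count T_i: how many characters carry each label in the current batch."""
    counts = Counter()
    for sequence in batch_labels:
        counts.update(sequence)
    return counts

def illustrative_class_weights(batch_labels):
    # ASSUMPTION: inverse-frequency weighting, W_i = (B * t) / T_i.
    # The source does not reproduce the actual formula for W_i.
    counts = batch_class_counts(batch_labels)
    B = len(batch_labels)          # number of samples in the batch
    t = len(batch_labels[0])       # sequence length (all samples padded to t)
    return {label: (B * t) / T_i for label, T_i in counts.items()}

batch = [
    ["O", "O", "B-DRUG", "O"],
    ["O", "B-DISEASE", "I-DISEASE", "O"],
]
counts = batch_class_counts(batch)
weights = illustrative_class_weights(batch)
print(counts["O"], weights["O"], weights["B-DRUG"])
```

Whatever the exact formula, recomputing the counts for every batch is what makes the weights dynamic: a category that is rare in one batch receives more attention there than in a batch where it is common.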
[0240] In addition, a hyperparameter $\gamma$ is set to adjust the level of attention given to difficult samples. $\gamma$ is also generated through dynamic policy parameters.
[0241] $Loss_c = -W_i (1-P_i)^{\gamma} \ln(P_i)$
[0242] where $Loss_c$ denotes the loss of a specific character;
[0243] $W_i$ denotes the degree of attention paid to the $i$-th entity;
[0244] $P_i$ denotes the probability that the entity is predicted to be of category $i$;
[0245] $\gamma$ is a hyperparameter representing the degree of attention given to difficult samples;
[0246] $\ln()$ denotes the logarithm with the natural base $e$;
[0247] Since the input X has a length of t, a loss needs to be calculated for each character. We weight the loss at each position according to the class weight at each position. Therefore, the final multi-class loss formula is set as follows:
[0248]
[0249] where $Loss_{classification}$ denotes the final multi-class classification loss;
[0250] $t$ denotes the length of the sample sequence;
[0251] $W_i$ denotes the weight corresponding to the true entity category of the $i$-th character; it also reflects the degree of attention paid to the $i$-th entity;
[0252] $Loss_i$ denotes the loss of the $i$-th character, computed as $Loss_c$;
[0253] To effectively learn the contextual relationships between entities, a CRF loss is introduced, calculated as follows, where $S$ is the score of a specific predicted sequence and $S_{RealPath}$ is the score of the true sequence. Among all possible predicted sequences, the higher the score of the true sequence, the smaller the loss. The CRF fits the samples by maximizing the score of the true sequence and minimizing the scores of the other sequences.
[0254] $Loss_{crf} = -\ln \dfrac{e^{S_{RealPath}}}{e^{S_1} + e^{S_2} + \dots + e^{S_k}}$
[0255] where $Loss_{crf}$ denotes the CRF loss;
[0256] $e$ denotes the natural base;
[0257] $S_{RealPath}$ denotes the score of the true sequence;
[0258] $S_1, S_2, \dots, S_k$ denote the scores of all possible paths;
[0259] Next, fusion factors α and β are used to fuse the two losses. By adjusting the values of α and β, the relative importance between the two losses can be controlled. When the entities in the training data are sparse, the interactions between entities are less, so the fusion factor α of the multi-class loss can be increased. When the entities in the training data are dense, the mutual influence between entities is greater, so the fusion factor β of the CRF loss should be increased.
[0260]
[0261] where $\beta$ denotes the first fusion factor;
[0262] $C$ denotes all entity categories;
[0263] $T_i$ denotes the number of entities of category $i$ in the batch;
[0264] $B$ denotes the number of samples in the batch;
[0265] $t$ denotes the length of the sample sequence;
[0266] $\gamma$ is a hyperparameter representing the degree of attention given to difficult samples;
[0267] $\alpha = 1 - \beta$
[0268] where $\alpha$ denotes the second fusion factor;
[0269] $\beta$ denotes the first fusion factor;
[0270] In the calculation of $\beta$, $\gamma$ is a hyperparameter that represents the degree of attention paid to the CRF loss, with a value between 0 and 1.
[0271] Finally, a reduction factor μ was designed. The sparser the entities in the training samples, the larger the reduction factor. This method can prevent the model from overlearning features from sparse entity datasets. The value of the reduction factor μ is the same as α, ranging from 0 to 0.5.
[0272] The final Loss function design is as follows:
[0273] $Loss = (1-\mu)(\alpha \, Loss_{classification} + \beta \, Loss_{crf})$
[0274] where $Loss$ denotes the dynamically adjusted loss;
[0275] $\mu$ denotes the reduction factor;
[0276] $\alpha$ denotes the second fusion factor;
[0277] $Loss_{classification}$ denotes the final multi-class classification loss;
[0278] $\beta$ denotes the first fusion factor;
[0279] $Loss_{crf}$ denotes the CRF loss.
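The fusion step amounts to a small piece of arithmetic, sketched below. The formulas that produce β and μ from the batch statistics are not preserved in this text, so both are passed in as plain numbers; the function name `fused_loss` and the example values are assumptions for illustration.

```python
def fused_loss(loss_classification, loss_crf, beta, mu):
    """Loss = (1 - mu) * (alpha * Loss_classification + beta * Loss_crf), with alpha = 1 - beta."""
    alpha = 1.0 - beta                      # second fusion factor
    return (1.0 - mu) * (alpha * loss_classification + beta * loss_crf)

# Illustrative numbers only: beta and mu would normally come from the batch statistics.
value = fused_loss(loss_classification=0.8, loss_crf=1.2, beta=0.7, mu=0.1)
no_reduction = fused_loss(loss_classification=0.8, loss_crf=1.2, beta=0.7, mu=0.0)
print(value, no_reduction)
```

A larger μ scales the whole loss down, which shrinks the parameter update for batches dominated by sparse entities, while β shifts the balance between the CRF term and the classification term.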
[0280] Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims
1. A sparse medical entity recognition method based on an attention mechanism, characterized in that it comprises the following steps: S1: extract word vectors using the BERT model and further extract features using Bi-LSTM; S2: use the attention mechanism to extract deep connections within the word vectors; S3: dynamically adjust the entity category weights and the fusion weights in the ensemble learning based on the sparsity characteristics of the entities in each batch; S4: output the prediction results through the CRF layer; and further comprises: $Loss = (1-\mu)(\alpha \, Loss_{classification} + \beta \, Loss_{crf})$, where $Loss$ denotes the dynamically adjusted loss; $\mu$ denotes the reduction factor; $\alpha$ denotes the second fusion factor; $Loss_{classification}$ denotes the final multi-class classification loss; $\beta$ denotes the first fusion factor; $Loss_{crf}$ denotes the CRF loss; the final multi-class classification loss is calculated as follows: [formula not preserved in this text], where $Loss_{classification}$ denotes the final multi-class classification loss; $t$ denotes the length of the sample sequence; $W_i$ denotes the weight corresponding to the true entity category of the $i$-th character; $Loss_i$ denotes the loss $Loss_c$ of the $i$-th character; the loss $Loss_c$ of a specific character is calculated as follows: $Loss_c = -W_i (1-P_i)^{\gamma} \ln(P_i)$, where $Loss_c$ denotes the loss of a specific character; $W_i$ denotes the weight corresponding to the true entity category of the $i$-th character; $P_i$ denotes the probability that the entity is predicted to be of category $i$; $\gamma$ is a hyperparameter representing the degree of attention given to difficult samples; the CRF loss is calculated as follows: $Loss_{crf} = -\ln \dfrac{e^{S_{RealPath}}}{e^{S_1} + e^{S_2} + \dots + e^{S_k}}$, where $Loss_{crf}$ denotes the CRF loss; $e$ denotes the natural base; $S_{RealPath}$ denotes the score of the true sequence; $S_1, S_2, \dots, S_k$ denote the scores of all possible paths.
2. The sparse medical entity recognition method based on an attention mechanism according to claim 1, characterized in that step S1 includes:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $f_t$ is the output of the forget gate at time step $t$; $\sigma$ is the activation function; $W_f$ is the forget gate weight matrix in the LSTM network; $h_{t-1}$ is the hidden layer state at time step $t-1$; $x_t$ is the input vector at time step $t$; $b_f$ is the forget gate bias in the LSTM network;
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
where $i_t$ is the output of the input gate at time step $t$; $\sigma$ is the activation function; $W_i$ is the input gate weight matrix in the LSTM network; $h_{t-1}$ is the hidden layer state at time step $t-1$; $x_t$ is the input vector at time step $t$; $b_i$ is the input gate bias in the LSTM network;
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
where $o_t$ is the output of the output gate at time step $t$; $\sigma$ is the activation function that maps any real number to (0,1); $W_o$ is the output gate weight matrix in the LSTM network; $h_{t-1}$ is the hidden layer state at time step $t-1$; $x_t$ is the input vector at time step $t$; $b_o$ is the output gate bias in the LSTM network;
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)$$
where $\tilde{C}_t$ is the candidate state at the current time step; $\tanh$ is a nonlinear function; $W_c$ is the candidate state weight matrix in the LSTM network; $h_{t-1}$ is the hidden layer state at time step $t-1$; $x_t$ is the input vector at time step $t$; $b_c$ is the candidate state bias in the LSTM network;
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where $C_t$ is the current state of the memory cell; $f_t$ is the output of the forget gate at time step $t$; $\odot$ denotes element-wise multiplication; $C_{t-1}$ is the cell state at time step $t-1$; $i_t$ is the output of the input gate at time step $t$; $\tilde{C}_t$ is the candidate cell state for the current input;
$$h_t = o_t \odot \tanh(C_t)$$
where $h_t$ is the hidden layer state at time step $t$; $o_t$ is the output of the output gate at time step $t$; $\odot$ denotes element-wise multiplication; $\tanh$ is a nonlinear function; $C_t$ is the current state of the memory cell.
For the Bi-LSTM network, $\overrightarrow{h}_t$ denotes the output of the forward LSTM and $\overleftarrow{h}_t$ denotes the output of the backward LSTM; the final output of the Bi-LSTM is obtained by concatenating the two, that is:
$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}\left(x_t, \overrightarrow{h}_{t-1}\right)$$
where $\overrightarrow{h}_t$ is the output of the forward LSTM; $\overrightarrow{\mathrm{LSTM}}$ denotes the forward direction of the Bi-LSTM network; $x_t$ is the input vector at time step $t$; $\overrightarrow{h}_{t-1}$ is the output of the forward LSTM at time step $t-1$;
$$\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}\left(x_t, \overleftarrow{h}_{t+1}\right)$$
where $\overleftarrow{h}_t$ is the output of the backward LSTM; $\overleftarrow{\mathrm{LSTM}}$ denotes the backward direction of the Bi-LSTM network; $x_t$ is the input vector at time step $t$; $\overleftarrow{h}_{t+1}$ is the output of the backward LSTM at time step $t+1$;
$$h_t = \left[\overrightarrow{h}_t \oplus \overleftarrow{h}_t\right]$$
where $h_t$ is the final output of the Bi-LSTM; $\oplus$ denotes the concatenation operation; $\overrightarrow{h}_t$ is the output of the forward LSTM; $\overleftarrow{h}_t$ is the output of the backward LSTM.
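The gate computations and the forward/backward concatenation recited in claim 2 can be sketched in plain Python as follows. This is an illustrative toy rather than the patented implementation: it assumes the standard LSTM gate equations, the weight matrices are hypothetical fixed values produced by `toy_params`, and the input vectors stand in for the BERT word vectors of step S1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to (W, b) applied to [h_prev, x_t]."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]

    def gate(name, act):
        W, b = params[name]
        return [act(a) for a in affine(W, z, b)]

    f = gate("f", sigmoid)        # forget gate
    i = gate("i", sigmoid)        # input gate
    o = gate("o", sigmoid)        # output gate
    c_hat = gate("c", math.tanh)  # candidate state
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

def run_lstm(xs, params, hidden):
    """Run one direction over the sequence, starting from zero states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

def bilstm(xs, fwd_params, bwd_params, hidden):
    """Concatenate forward and backward outputs at every time step."""
    fwd = run_lstm(xs, fwd_params, hidden)
    bwd = run_lstm(xs[::-1], bwd_params, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

def toy_params(hidden, inputs, scale):
    """Hypothetical fixed weights, used only to keep the example deterministic."""
    W = [[scale * (r + c + 1) for c in range(hidden + inputs)] for r in range(hidden)]
    b = [0.0] * hidden
    return {name: (W, b) for name in ("f", "i", "o", "c")}

xs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]  # stand-ins for word vectors
fwd_params = toy_params(2, 2, 0.1)
bwd_params = toy_params(2, 2, -0.1)
outputs = bilstm(xs, fwd_params, bwd_params, hidden=2)
print(len(outputs), len(outputs[0]))
```

Each output vector has twice the hidden size because the forward and backward states are concatenated, not summed.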
3. The sparse medical entity recognition method based on an attention mechanism according to claim 1, characterized in that step S3 includes:
$$u_i = \tanh\left(W h_i + b\right)$$
where $u_i$ is the attention score of the $i$-th sequence position; $\tanh$ is a nonlinear function; $W$ is the weight; $h_i$ is the output of the Bi-LSTM layer for the $i$-th sequence position; $b$ is the bias.
The weights $a_i$ are then calculated using SoftMax normalization:
$$a_i = \frac{\exp(u_i)}{\sum_{j=1}^{n} \exp(u_j)}$$
where $a_i$ is the attention weight; $\exp$ is the exponential function with the natural base $e$; $u_i$ is the attention score of the $i$-th sequence position; $n$ is the length of the sample sequence; $u_j$ is the attention score of the $j$-th sequence position.
Finally, the attention feature $A$ is calculated by weighted summation:
$$A = \sum_{i=1}^{n} a_i h_i$$
where $A$ is the output of the attention mechanism; $n$ is the length of the sample sequence; $a_i$ is the attention weight; $h_i$ is the output of the Bi-LSTM layer for the $i$-th sequence position.
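The three stages of claim 3 (score, SoftMax normalization, weighted sum) can be sketched as below. The sketch assumes that the weight is a vector that reduces each Bi-LSTM output to a scalar score; the claim says only "weight", so this shape, the function name `attention_pool`, and the sample vectors are all hypothetical.

```python
import math

def attention_pool(hs, w, b):
    """Additive attention over a sequence of Bi-LSTM outputs.

    hs: list of n output vectors, one per sequence position.
    w:  weight vector (assumed shape), b: scalar bias.
    Returns the attention weights and the weighted-sum feature vector.
    """
    # Score each position: tanh(w . h_i + b)
    scores = [math.tanh(sum(wk * hk for wk, hk in zip(w, h)) + b) for h in hs]
    # SoftMax normalization, shifted by the maximum for numerical stability
    m = max(scores)
    exps = [math.exp(u - m) for u in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the position vectors
    dim = len(hs[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, hs)) for k in range(dim)]
    return weights, pooled

hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, pooled = attention_pool(hs, w=[0.5, 0.5], b=0.0)
print(weights, pooled)
```

In this toy input the third position receives the largest weight because its score, tanh(1.0), exceeds the tanh(0.5) of the other two.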
4. The sparse medical entity recognition method based on an attention mechanism according to claim 1, characterized in that it further comprises a multi-class loss function [formula not preserved in this text], whose terms are: the multi-class loss function; and the predicted probability of the entity category.
5. The sparse medical entity recognition method based on an attention mechanism according to claim 1, characterized in that the second fusion factor is calculated from the first fusion factor as follows: [formula not preserved in this text], whose terms are: the second fusion factor; and the first fusion factor.
6. The sparse medical entity recognition method based on an attention mechanism according to claim 5, characterized in that the first fusion factor is calculated as follows: [formula not preserved in this text], whose terms are: the first fusion factor; the set of all entity categories; the number of entities of a given category in the batch; the number of samples in the batch; and the length of the sample sequence.
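The formula for the first fusion factor is not reproduced in this text, so the following sketch only gathers the quantities that claim 6 names: the number of entities of each category in a batch, the number of samples in the batch, and the sequence length. It assumes BIO-style tags in which each entity starts with a `B-` tag; the tag scheme, the category names, and the function name `batch_entity_stats` are hypothetical.

```python
from collections import Counter

def batch_entity_stats(batch_tags, categories):
    """Collect per-batch sparsity statistics from BIO-style tag sequences.

    batch_tags: list of tag sequences, one per sample (assumed BIO scheme).
    categories: all entity categories, so that absent ones are reported as 0.
    Returns (entity count per category, number of samples, sequence length).
    """
    counts = Counter()
    for tags in batch_tags:
        for tag in tags:
            if tag.startswith("B-"):  # one B- tag marks the start of one entity
                counts[tag[2:]] += 1
    num_samples = len(batch_tags)
    seq_len = max(len(tags) for tags in batch_tags)
    per_category = {c: counts.get(c, 0) for c in categories}
    return per_category, num_samples, seq_len

batch = [
    ["B-DISEASE", "I-DISEASE", "O", "B-DRUG"],
    ["O", "O", "B-DISEASE", "O"],
]
per_category, num_samples, seq_len = batch_entity_stats(
    batch, ["DISEASE", "DRUG", "SYMPTOM"]
)
print(per_category, num_samples, seq_len)
```

Categories that do not occur in a batch (here `SYMPTOM`) are reported with a count of zero, which is the sparse case that the dynamic weighting of step S3 is meant to handle.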