Named entity recognition method and apparatus

By incorporating label semantic information into the named entity recognition method and utilizing attention mechanisms and conditional random field models, the problem of insufficient utilization of label semantic information in existing technologies is solved, thereby improving the accuracy and efficiency of named entity recognition.

CN115146033B · Active · Publication Date: 2026-06-19 · BEIJING LONGZHI DIGITAL TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING LONGZHI DIGITAL TECH CO LTD
Filing Date
2022-07-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing named entity recognition methods do not fully utilize the semantic information of tags, resulting in poor recognition performance and low efficiency.

Method used

A distant supervision method is introduced to learn the semantic information of tags, and an attention mechanism fuses that information into the token features. Combined with a conditional random field model for training, this enhances the accuracy and efficiency of named entity recognition.

Benefits of technology

By incorporating semantic information from tags, the accuracy and efficiency of named entity recognition are improved, enabling better identification of named entities in sequences.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure relates to the field of natural language processing technology, and provides a method and apparatus for named entity recognition. The method includes: using both the entity names and corresponding entity tags of the original corpus as keywords, searching in the original document set to retrieve documents that simultaneously match the entity names and corresponding entity tags of the original corpus, forming a retrieved document set; inputting the original corpus entity sequence set and the retrieved document set into an attention-based training model to obtain a sequence feature vector incorporating semantic information of the entity tags; inputting the obtained sequence feature vector incorporating semantic information of the entity tags into a conditional random field, and after training, obtaining a named entity recognition tag prediction sequence. This disclosure can fully utilize the semantic information of the tags, improve the recognition efficiency of named entities, enhance the features of the input sequence tags, and better complete the named entity recognition task.

Description

Technical Field

[0001] This disclosure relates to the field of natural language processing technology, and in particular to a named entity recognition method and apparatus.

Background Technology

[0002] Current mainstream named entity recognition methods model the named entity recognition task as a sequence labeling task. For example, the semantic relationship between tokens is modeled through the BERT pre-trained model, and the transition relationship between tags is modeled through the transition matrix of the conditional random field. However, these methods do not fully utilize the semantic information of the tags, which naturally contains some prior knowledge. Therefore, a named entity recognition method that incorporates the semantic information of the tags is needed to help the model better recognize named entities in the input sequence.

[0003] Currently, pre-trained language models such as BERT are mainly used to encode sequence features of input text sequences. Since BERT only considers the token information in the input sequence and does not take into account the transition relationship between the label of the current token and the label of the previous token, a CRF layer is added to the BERT network to model the transition relationship between labels. Existing methods only utilize the semantic information of tokens when encoding sequence token features. This invention introduces a distant supervision method to learn the semantic information of labels, and then uses an attention mechanism to fuse the semantic information of labels into the token features. In this way, the model can utilize the semantic association between tokens and labels to better identify named entities in the sequence.

Summary of the Invention

[0004] In view of this, the present disclosure provides a named entity recognition method and apparatus to solve the problems of insufficient utilization of tag semantic information, poor named entity recognition effect, and low recognition efficiency in the prior art.

[0005] A first aspect of this disclosure provides a named entity recognition method, comprising the following steps:

[0006] The entity names and corresponding entity tags of the original corpus are used as keywords to search the original document set. Documents that match both the entity names and corresponding entity tags of the original corpus are retrieved, forming a retrieved document set. The original corpus entity sequence set and the retrieved document set are input into an attention-based training model to obtain a sequence feature vector incorporating entity tag semantic information. The obtained sequence feature vector incorporating entity tag semantic information is then input into a conditional random field and trained to obtain a named entity recognition label prediction sequence.

[0007] A second aspect of this disclosure provides a named entity recognition method, comprising the following steps:

[0008] Using both the entity names and their corresponding entity tags from the original corpus as keywords, a search is performed in the original document set to retrieve documents that match both the entity names and their corresponding entity tags from the original corpus, thus forming a searched document set.

[0009] Replace the entity names in the retrieved document set with the entity tags corresponding to the entities to form a replaced document set;

[0010] The replaced document set is input into the Skip-gram algorithm model, and the Skip-gram algorithm model is trained to learn the embedding representation vectors of each entity label. The output is the set of embedding representation vectors of each entity label.

[0011] The original corpus entity sequence set is input into the BERT language model to obtain the output vector sequence of bidirectional language representation;

[0012] The output vector sequence of the bidirectional language representation and the set of embedding representation vectors of each entity label corresponding to the original corpus are input into the attention-based training model. For these two inputs, attention scores are computed between each vector of the output sequence and each entity label embedding, yielding the attention weight matrix between the output sequence of the bidirectional language representation and the set of embedding representation vectors of each entity label corresponding to the original corpus.

[0013] Using the attention weight matrix and the transpose $M^{T}$ of the set $M$ of embedding representation vectors of each entity label corresponding to the original corpus, the sequence feature vector incorporating the semantic information of the entity labels is obtained.

[0014] The sequence feature vector, after incorporating the semantic information of the entity label, is input into a deep neural network based on a self-attention mechanism: Transformer, to obtain the output embedding representation matrix;

[0015] The output embedding representation matrix is input into a linear transformation layer to obtain the transformed embedding representation matrix;

[0016] The transformed embedding representation matrix is input into a Conditional Random Field (CRF) and trained to obtain a named entity recognition label prediction sequence.

[0017] A third aspect of this disclosure provides a system comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method described above.

[0018] A fourth aspect of this disclosure provides an apparatus comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method described above.

[0019] A fifth aspect of this disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above-described method.

[0020] Compared with the prior art, the embodiments of this disclosure have the following beneficial effects. Using the set of embedding vectors of the entity labels, entity label semantic information can be integrated into the entity information. The attention-based training model produces a sequence feature vector that incorporates entity label semantic information, which strengthens the recognition of similar named entities and enhances the features of the input sequence label tokens. In addition, training with a conditional random field (CRF) captures the relationship between the part of speech of the current word and that of the previous one or several words, that of the next one or several words, and any input word, so that the part-of-speech probability of each output word is expressed more richly and a better predicted output sequence is obtained. This helps the model complete the named entity recognition task, makes full use of the semantic information of the labels, and improves the recognition efficiency of named entities.

Brief Description of the Drawings

[0021] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a schematic diagram illustrating an application scenario of an embodiment of this disclosure;

[0023] Figure 2 is a flowchart illustrating a named entity recognition method provided in an embodiment of this disclosure;

[0024] Figure 3 is a flowchart illustrating another named entity recognition method provided in an embodiment of this disclosure;

[0025] Figure 4 is a flowchart illustrating another named entity recognition method provided in an embodiment of this disclosure;

[0026] Figure 5 is a flowchart illustrating another named entity recognition method provided in an embodiment of this disclosure;

[0027] Figure 6 is a flowchart illustrating a named entity recognition method provided in an embodiment of this disclosure;

[0028] Figure 7 is a schematic diagram of the structure of a system provided in an embodiment of this disclosure.

Detailed Description of the Embodiments

[0029] The embodiments of the present invention can solve the related problems existing in the prior art, as described below.

[0030] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, so as to provide a thorough understanding of the embodiments of this disclosure. However, those skilled in the art will understand that this disclosure may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this disclosure with unnecessary detail.

[0031] A named entity recognition method and apparatus according to embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings.

[0032] Figure 1 is a schematic diagram illustrating an application scenario of an embodiment of this disclosure. The application scenario may include terminal devices 1, 2, and 3, server 4, and network 5.

[0033] Terminal devices 1, 2, and 3 can be hardware or software. When terminal devices 1, 2, and 3 are hardware, they can be various electronic devices with displays and supporting communication with server 4, including but not limited to smartphones, tablets, laptops, and desktop computers. When terminal devices 1, 2, and 3 are software, they can be installed in the electronic devices described above. Terminal devices 1, 2, and 3 can be implemented as multiple software programs or software modules, or as a single software program or software module; this disclosure does not limit this. Furthermore, various applications can be installed on terminal devices 1, 2, and 3, such as data processing applications, instant messaging tools, social platform software, search applications, shopping applications, etc.

[0034] Server 4 can be a server that provides various services, such as a backend server that receives requests sent by terminal devices with which it has established communication connections. This backend server can receive and analyze the requests sent by the terminal devices and generate processing results. Server 4 can be a single server, a server cluster consisting of several servers, or a cloud computing service center. This disclosure embodiment does not limit this.

[0035] It should be noted that server 4 can be either hardware or software. When server 4 is hardware, it can be various electronic devices that provide various services to terminal devices 1, 2, and 3. When server 4 is software, it can be multiple software programs or software modules that provide various services to terminal devices 1, 2, and 3, or it can be a single software program or software module that provides various services to terminal devices 1, 2, and 3. This disclosure does not limit the scope of the embodiments.

[0036] Network 5 can be a wired network using coaxial cable, twisted pair, and fiber optic connection, or it can be a wireless network that enables interconnection of various communication devices without wiring, such as Bluetooth, Near Field Communication (NFC), and Infrared. This disclosure does not limit the scope of the network.

[0037] Users can establish a communication connection with server 4 via network 5 through terminal devices 1, 2 and 3 to receive or send information, etc.

[0038] It should be noted that the specific types, quantities, and combinations of terminal devices 1, 2, and 3, server 4, and network 5 can be adjusted according to the actual needs of the application scenario, and this disclosure embodiment does not impose any restrictions on this.

[0039] Users can perform named entity recognition through any of the terminal devices 1, 2, 3, server 4, and network 5, thereby solving the technical problem of the present invention and achieving the corresponding technical effect.

[0040] As mentioned in the background section, existing technologies achieve named entity recognition by adding a CRF layer to the BERT network to model the transition relationships between labels. This method mainly includes the following:

[0041] For the input sequence $W = (w_1, w_2, \dots, w_n)$, where $n$ represents the length of the input sequence and $w_i$ represents the $i$-th token of the input sequence, each token is input into the pre-trained language model BERT, and the resulting output sequence is represented by the matrix $X = (x_1, x_2, \dots, x_n)$:

[0042] $$X = \mathrm{BERT}(W) \in \mathbb{R}^{n \times d}$$

[0043] Where $\mathbb{R}$ represents the real number space, and $d$ represents the dimension of the output embedding representation of the input sequence tokens. $x_i$ represents the output embedding representation obtained after the $i$-th token $w_i$ passes through the BERT model.

[0044] Matrix $X$ is input into a linear transformation layer to obtain the token scoring matrix $P \in \mathbb{R}^{n \times k}$, where $k$ represents the number of tags of the sequence tokens, $P_i$ represents the tag distribution probability of the $i$-th token, and $P_{i,j}$ represents the probability that the $i$-th token belongs to the $j$-th tag.

[0045] For a predicted sequence $y = (y_1, y_2, \dots, y_n)$ of the input sequence $X$, the score is defined as follows:

[0046] $$s(X, y) = \sum_{i=0}^{n} T_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$

[0047] where $T \in \mathbb{R}^{(k+2) \times (k+2)}$ is the transition matrix between labels (including the start and end labels), with $y_0$ and $y_{n+1}$ denoting the start and end labels.

[0048] The probability of the predicted sequence $y$ is:

[0049] $$p(y \mid X) = \frac{\exp\big(s(X, y)\big)}{\sum_{\tilde{y} \in Y_W} \exp\big(s(X, \tilde{y})\big)}$$

[0050] where $Y_W$ contains all possible label sequences of the input sequence $W$.

[0051] The best predicted sequence of named entities obtained by the model is:

[0052] $$y^{*} = \arg\max_{\tilde{y} \in Y_W} s(X, \tilde{y})$$
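
The scoring, normalization, and decoding steps of this existing BERT-CRF approach can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the emission matrix `P`, the transition table `T`, the label names, and all numeric values below are made-up toy assumptions, and the entries of `P` are treated as unnormalized scores. The sketch computes the score of a label path, the log of the normalizing sum over all label sequences (forward algorithm), and the best path (Viterbi algorithm).

```python
import math
from itertools import product

START, END = "<start>", "<end>"  # hypothetical names for the start and end labels


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(P, T, labels, y):
    """Score of label path y: transition scores (incl. start/end) plus emission scores."""
    path = [START] + list(y) + [END]
    trans = sum(T[(a, b)] for a, b in zip(path, path[1:]))
    emit = sum(P[i][labels.index(tag)] for i, tag in enumerate(y))
    return trans + emit


def log_partition(P, T, labels):
    """Log of the sum of exp(score) over all label sequences (forward algorithm)."""
    alpha = [T[(START, l)] + P[0][j] for j, l in enumerate(labels)]
    for i in range(1, len(P)):
        alpha = [
            logsumexp([alpha[a] + T[(labels[a], l)] for a in range(len(labels))]) + P[i][j]
            for j, l in enumerate(labels)
        ]
    return logsumexp([alpha[j] + T[(l, END)] for j, l in enumerate(labels)])


def sequence_probability(P, T, labels, y):
    """Probability of path y: exp(score) divided by the normalizing sum."""
    return math.exp(path_score(P, T, labels, y) - log_partition(P, T, labels))


def viterbi(P, T, labels):
    """Highest-scoring label path, found by dynamic programming."""
    best = [T[(START, l)] + P[0][j] for j, l in enumerate(labels)]
    back = []
    for i in range(1, len(P)):
        step, ptr = [], []
        for j, l in enumerate(labels):
            cands = [best[a] + T[(labels[a], l)] for a in range(len(labels))]
            a_best = max(range(len(labels)), key=cands.__getitem__)
            step.append(cands[a_best] + P[i][j])
            ptr.append(a_best)
        best = step
        back.append(ptr)
    final = [best[j] + T[(l, END)] for j, l in enumerate(labels)]
    j = max(range(len(labels)), key=final.__getitem__)
    path = [j]
    for ptr in reversed(back):  # follow back-pointers from the last token to the first
        j = ptr[j]
        path.append(j)
    return [labels[j] for j in reversed(path)]


# Toy example: 3 tokens, 2 labels (all values are illustrative assumptions).
labels = ["O", "B-BRAND"]
P = [[0.2, 1.5], [1.0, 0.1], [0.9, 0.3]]
T = {(a, b): 0.0 for a in [START] + labels for b in labels + [END]}
T[("B-BRAND", "B-BRAND")] = -2.0  # discourage two consecutive entity starts

best_path = viterbi(P, T, labels)
best_prob = sequence_probability(P, T, labels, best_path)
```

For short sequences the result can be cross-checked by enumerating every label sequence with `itertools.product`; the dynamic-programming versions exist because that enumeration grows exponentially with the sequence length.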

[0053] It is evident that existing methods only utilize the semantic information of the token when encoding sequence token features. This invention introduces a distant supervision method to learn the semantic information of the tag, and then uses an attention mechanism to fuse the semantic information of the tag into the token features. In this way, the model can utilize the semantic association between the token and the tag to better identify named entities in the sequence.

[0054] Figure 2 is a flowchart illustrating a named entity recognition method provided in an embodiment of this disclosure. The named entity recognition method of Figure 2 can be executed by the terminal device or server of Figure 1. As shown in Figure 2, the named entity recognition method includes:

[0055] S201, use both the entity names and corresponding entity tags of the original corpus as keywords to search in the original document set;

[0056] S202, retrieve documents that simultaneously match the entity name and corresponding entity tag of the original corpus, forming a set of retrieved documents;

[0057] S203, input the original corpus entity sequence set and the retrieved document set into the attention-based training model;

[0058] S204, obtain the sequence feature vector after incorporating semantic information of entity labels;

[0059] S205, the sequence feature vector after incorporating the semantic information of entity labels is input into the conditional random field;

[0060] S206, after training and learning, obtains the named entity recognition label prediction sequence.

[0061] The method steps are represented by "SXXX", where XXX is a three-digit consecutive number, such as S201, S202, S203, S204, S205, S206. It should be understood that the sequence number of each step in the embodiment does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the disclosed embodiment. The understanding of the method steps in other specific embodiments is the same as in this embodiment, and will not be repeated here.

[0062] Specifically, the entity names and corresponding entity labels of the original corpus are used as keywords to search the original document set D, forming the retrieved document set DE, which provides a set of entity label information. The original corpus entity sequence set W and the retrieved document set DE are then input into an attention-based training model for training. Fusing the semantic information of the labels improves the recognition of similarity characteristics and thereby strengthens entity recognition on the original corpus entity sequence set. Furthermore, the sequence feature vector E, which incorporates the semantic information of entity labels, is input into a Conditional Random Field (CRF). CRF training captures the relationship between the part of speech of the current word and that of the previous one or several words, that of the next one or several words, and any input word, so the part-of-speech probabilities of the output words are expressed more richly and a better predicted output sequence is obtained.

[0063] According to the technical solution provided in the embodiments of this disclosure, named entities in a sequence can be identified well, and a better identification effect is achieved.

[0064] Figure 3 is a flowchart illustrating another named entity recognition method provided in an embodiment of this disclosure. The named entity recognition method of Figure 3 can be executed by the terminal device or server of Figure 1. As shown in Figure 3, the named entity recognition method includes:

[0065] S301, use both the entity names and corresponding entity tags of the original corpus as keywords to search in the original document set D;

[0066] S302, retrieve documents that simultaneously match the entity name and corresponding entity tag of the original corpus, forming the retrieved document set DE;

[0067] S303, replace the entity names in the retrieved document set DE with the entity labels corresponding to the entities to form the replaced document set DR;

[0068] S304, input the original entity sequence set W and the replaced document set DR into the attention-based training model;

[0069] S305, obtain the sequence feature vector E after incorporating the semantic information of entity labels;

[0070] S306, the sequence feature vector E after incorporating the semantic information of entity labels is input into the Conditional Random Field (CRF);

[0071] S307, after training and learning, obtains the named entity recognition label prediction sequence Y.

[0072] Specifically, building on the embodiment of Figure 2, the information incorporating entity labels is further processed: the entity names in the retrieved document set DE are replaced with the entity labels corresponding to the entities to form a replaced document set DR, and the original corpus entity sequence set W and the replaced document set DR are input into the attention-based training model. This further improves the degree of semantic information recognition, and a better predicted output sequence is obtained after the Conditional Random Field (CRF).
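
The replacement step that turns the retrieved document set DE into the replaced document set DR can be sketched as follows. This is only an illustration under stated assumptions: the function name `replace_entities`, the example documents, and the use of exact, case-sensitive string matching are hypothetical choices, since the disclosure does not specify how entity mentions are located in a document.

```python
import re


def replace_entities(documents, entity_labels):
    """Return D_R: every entity name in the documents is replaced by its entity label."""
    # Try longer names first so a long name is not partially matched by a shorter one.
    names = sorted(entity_labels, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return [pattern.sub(lambda m: entity_labels[m.group(0)], doc) for doc in documents]


# Toy data (assumed for illustration): entity name -> entity label.
entity_labels = {"Adidas": "BRAND", "Starbucks": "BRAND", "Uniqlo": "BRAND"}
D_E = [
    "Adidas is a sportswear brand.",
    "The brand Starbucks sells coffee.",
]
D_R = replace_entities(D_E, entity_labels)
```

After replacement, the label token (here `BRAND`) appears in the same contexts where the entity names used to appear, which is what allows a word-embedding model trained on DR to learn a vector for the label itself.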

[0073] Figure 4 is a schematic diagram of a named entity recognition method provided in an embodiment of this disclosure. The named entity recognition method includes:

[0074] S401, use both the entity names and corresponding entity tags of the original corpus as keywords to search in the original document set D;

[0075] S402, retrieve documents that simultaneously match the entity name and corresponding entity tag of the original corpus, forming the retrieved document set DE;

[0076] S403, replace the entity names in the retrieved document set DE with the entity labels corresponding to the entities to form the replaced document set DR;

[0077] S404, input the replaced document set DR into the word2vec algorithm model, and train and learn the embedding representation vectors of each entity label in the word2vec algorithm model;

[0078] S405, outputs the set of embedding representation vectors M for each entity label corresponding to the original corpus;

[0079] S406, the original entity sequence set W and the replaced document set DR are input into the attention-based training model;

[0080] S407, obtain the sequence feature vector E after incorporating semantic information of entity labels;

[0081] S408, the sequence feature vector E after incorporating the semantic information of entity labels is input into the Conditional Random Field (CRF);

[0082] S409, after training and learning, obtains the named entity recognition label prediction sequence Y.

[0083] Specifically, building on the embodiment of Figure 3, the information incorporating entity tags is further processed: the replaced document set DR is input into the word2vec algorithm model, which is trained to learn the embedding representation vector of each entity tag, and the output is the set of embedding representation vectors M of each entity tag corresponding to the original corpus. A word2vec model is trained to reconstruct the linguistic context of words. After training, it maps each word to a vector, and these vectors can be used to represent the relationships between words.
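
To make concrete what a Skip-gram style word2vec model sees when it is trained on the replaced document set DR, the sketch below generates the (center word, context word) training pairs for a toy DR. It only builds the pairs and does not train embeddings; the window size, the regular-expression tokenizer, and the example documents are illustrative assumptions rather than details from the disclosure.

```python
import re
from collections import Counter


def skipgram_pairs(documents, window=2):
    """Yield (center, context) pairs within a symmetric window, as used by Skip-gram."""
    for doc in documents:
        tokens = re.findall(r"\w+", doc)  # simple word tokenizer (assumption)
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, tokens[j]


# Toy replaced document set: entity names have already been replaced by BRAND.
D_R = ["BRAND is a sportswear brand", "The brand BRAND sells coffee"]
pairs = list(skipgram_pairs(D_R, window=2))

# Context words observed around the label token; these drive the label's embedding.
label_contexts = Counter(ctx for center, ctx in pairs if center == "BRAND")
```

Because every brand name was mapped to the single token `BRAND`, the contexts of all brand mentions are pooled into one set of training pairs, so the learned vector for `BRAND` summarizes how the whole entity type is used.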

[0084] According to the technical solutions provided in the embodiments of this disclosure, the semantic learning and recognition level of information integrated with entity tags can be further enhanced.

[0085] Figure 5 is a schematic diagram of a named entity recognition method provided in an embodiment of this disclosure. The named entity recognition method includes:

[0086] S501, use the entity names and corresponding entity tags of the original corpus as keywords to search in the original document set D;

[0087] S502, retrieve documents that simultaneously match the entity name and corresponding entity tag of the original corpus, forming the retrieved document set DE;

[0088] S503, replace the entity names in the retrieved document set DE with the entity labels corresponding to the entities to form the replaced document set DR;

[0089] S504, input the replaced document set DR into the word2vec algorithm model, and train and learn the embedding representation vectors of each entity label in the word2vec algorithm model;

[0090] S505, outputs the set of embedding representation vectors M for each entity label corresponding to the original corpus;

[0091] S506, input the original corpus entity sequence set W into the BERT training language model to obtain the output vector sequence X of bidirectional language representation;

[0092] S507, the output vector sequence X of the bidirectional language representation and the replaced document set DR are input together into the training model based on the attention mechanism;

[0093] S508, obtain the sequence feature vector E after incorporating the semantic information of entity labels;

[0094] S509, the sequence feature vector E after incorporating the semantic information of entity labels is input into the Conditional Random Field (CRF);

[0095] S510, after training, obtains the named entity recognition label prediction sequence Y.

[0096] Specifically, building on the embodiment of Figure 4, the original corpus entity sequence set W is input into the BERT pre-trained language model to obtain the output vector sequence X of bidirectional language representation.

[0097] BERT stands for Bidirectional Encoder Representations from Transformers. It is a pre-trained language representation model built on the Transformer. When processing a word, BERT can take into account the words both before and after it, thereby obtaining the semantics of the context.

[0098] To achieve deep bidirectional representation without letting each word indirectly see itself within the multi-layer context, the BERT model employs a simple strategy: the Masked Language Model (MLM). The MLM strategy randomly masks some input tokens (fixed-dimensional vectors) and then predicts these masked tokens. Specifically, when a sentence is input, some words to be predicted are randomly selected and replaced with a special symbol; the model then learns the words that should fill these positions based on the given labels.
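
The masking step described above can be illustrated with a short sketch. It is a simplification under stated assumptions: the function name `mask_tokens`, the masking probability, the seed, and the example sentence are hypothetical, and every selected token is replaced by the special symbol, whereas the original BERT recipe replaces only part of the selected tokens with the mask symbol.

```python
import random

MASK = "[MASK]"  # special symbol that replaces the words to be predicted


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly mask tokens; return the masked sequence and the prediction targets."""
    rng = random.Random(seed)  # local generator so the example is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # the label the model must recover at position i
        else:
            masked.append(tok)
    return masked, targets


tokens = "Adidas opened a new store in Beijing".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

The dictionary `targets` plays the role of the given labels: during pre-training the loss is computed only at the masked positions, by comparing the model's prediction with the original token stored there.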

[0099] The second strategy adds a sentence-level continuity prediction task to the bidirectional language model, which predicts whether the two texts input to BERT are consecutive. Introducing this task can help the model learn the relationship between consecutive text segments better.

[0100] Specifically, using the BERT model as the natural language training model allows each word segment and its corresponding part-of-speech information to be predicted and trained in the context. It can also learn the continuity between two text segments and learn and predict the words, parts of speech, contextual relationships, and text continuity of the training corpus, thus achieving better results in natural language processing and semantic learning.

[0101] The technical solutions provided in the embodiments of this disclosure can further enhance the semantic learning and recognition level of the original corpus entity sequences.

[0102] Preferably, in the attention-based training model, attention scores are computed pairwise over all input entity labels to obtain the attention weight matrix A.

[0103] Preferably, the set of embedding representation vectors of each entity label, M, which is input to the attention-based training model, is transposed to obtain $M^{T}$. Combined with the attention weight matrix A, this yields the sequence feature vector E incorporating the semantic information of the entity labels, where $E = A M^{T}$.
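
A small numerical sketch may help in reading the fusion of label semantics into token features, that is, E obtained from the attention weight matrix A and the transpose of M. The disclosure does not state how the attention scores are computed, so the dot-product score followed by a row-wise softmax used below is an assumption, as are the shapes (X with one row per token, M with one column per label) and the toy values.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_label_semantics(X, M):
    """Return (A, E): attention weights over labels and label-aware token features."""
    scores = matmul(X, M)                 # n x k: one score per (token, label) pair
    A = [softmax(row) for row in scores]  # each token's weights over the k labels
    E = matmul(A, transpose(M))           # n x d: E = A * M^T
    return A, E


# Toy values: n = 3 tokens, d = 2 dimensions, k = 2 labels.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # token representations, one row per token
M = [[2.0, 0.0], [0.0, 2.0]]              # label embeddings, one column per label
A, E = fuse_label_semantics(X, M)
```

Each row of E is a weighted average of the label embeddings, so a token whose representation is close to one label's embedding receives features dominated by that label.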

[0104] Preferably, after obtaining the sequence feature vector E incorporating entity label semantic information, the sequence feature vector E incorporating entity label semantic information is input into a deep neural network based on a self-attention mechanism: Transformer, to obtain the output embedding representation matrix H; H is input into a conditional random field (CRF), and after training, the named entity recognition label prediction sequence Y is obtained.

[0105] Preferably, after obtaining the output embedding representation matrix H, the output embedding representation matrix H is input into the linear transformation layer to obtain the transformed embedding representation matrix O; the transformed embedding representation matrix O is input into the conditional random field (CRF), and after training, the named entity recognition label prediction sequence Y is obtained.
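
The linear transformation layer that maps the output embedding representation matrix H to the transformed matrix O can be written as a single affine map. The sketch below is illustrative only: the weight matrix `W`, the bias `b`, and all numbers are made-up assumptions, and in practice these parameters would be learned jointly with the rest of the model.

```python
def linear_layer(H, W, b):
    """O = H W + b: map each d-dimensional row of H to k label scores."""
    return [
        [sum(h * w for h, w in zip(row, col)) + bias for col, bias in zip(zip(*W), b)]
        for row in H
    ]


H = [[1.0, 2.0], [0.5, -1.0]]            # n = 2 tokens, d = 2 dimensions
W = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]  # d = 2 rows, k = 3 label columns
b = [0.0, 0.1, 0.0]                      # one bias per label
O = linear_layer(H, W, b)
```

Each row of O holds one score per entity label for the corresponding token, which is the form of input the CRF layer expects.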

[0106] Figure 6 is a schematic diagram of a named entity recognition method provided in an embodiment of this disclosure. The named entity recognition method includes:

[0107] S601, the entity names: Adidas, Starbucks, Uniqlo, as well as the entity labels BRAND corresponding to Adidas, Starbucks, Uniqlo, and the keyword descriptions "brand" and "brand name" corresponding to the entity labels BRAND are all used as keywords to be searched;

[0108] S602, search the original document set D using the entity names, their corresponding entity tags, and the keyword descriptions as keywords;

[0109] S603, after retrieval, a document set DE is formed;

[0110] S604, replace the entity names in the document collection DE with the entity labels corresponding to the entities, and form a document collection DR after replacement;

[0111] S605, input the document set DR formed after replacement into the Skip-gram algorithm model to obtain the embedding representation vector of each entity label;

[0112] S606, obtain the original corpus entity sequence set W;

[0113] S607, Input the original corpus entity sequence set W into the BERT training language model, and output the sequence set X;

[0114] S608, simultaneously input the embedding representation vectors of each entity label and the sequence set X into the training model based on the attention mechanism to obtain the sequence feature vector E after incorporating the semantic information of the entity labels;

[0115] S609, the sequence feature vector E, after incorporating the semantic information of the entity labels, is input into the Transformer layer to obtain the embedding representation matrix H;

[0116] S610, the sequence embedding representation matrix H is passed through a linear transformation layer and projected onto the entity label space to obtain the embedding representation matrix O;

[0117] S611, the embedding representation matrix O is input into the CRF layer to learn the named entity recognition label prediction sequence of the input sequence;

[0118] S612, obtain the named entity recognition label prediction sequence y.

[0119] Specifically:

[0120] Entity node definition: Define the knowledge graph as $G$, where $E$ represents the set of entity nodes in the knowledge graph, $E = \{e_1, e_2, \dots, e_n\}$, $n$ represents the number of entity nodes in the knowledge graph, and $e_i$ represents the $i$-th entity node in the knowledge graph. $L$ represents the set of entity tags in the knowledge graph ontology, $L = \{l_1, l_2, \dots, l_m\}$, $m$ represents the number of entity tags in the knowledge graph ontology, and $l_j$ represents the $j$-th entity tag in the knowledge graph ontology.

[0121] Document collection definition: Document collection $D = \{d_1, d_2, \dots, d_N\}$, where $N$ represents the number of documents in the document collection and $d_k$ represents the $k$-th document in the document collection.

[0122] Each entity tag in the entity tag set of the knowledge graph ontology can be supplemented with keyword description information. For example, the entity tag "BRAND" for a product brand can be supplemented with the keyword descriptions "brand" and "brand name" so that documents related to an entity node can later be retrieved more accurately.

[0123] For each entity node $e$ in the knowledge graph $G$, the name of entity node $e$ and its corresponding entity tag are used as keywords to search document collection $D$. Documents that match both keywords are retrieved to form a document set $D_e = \{d_1^e, d_2^e, \dots, d_{n_e}^e\}$, where $n_e$ represents the number of documents retrieved from document set $D$ for entity $e$, and $d_k^e$ represents the $k$-th document in the collection of documents retrieved for entity $e$. The documents retrieved for all entity nodes form the document collection $D_E = D_{e_1} \cup D_{e_2} \cup \dots \cup D_{e_n}$, where $D_{e_i}$ represents the collection of related documents retrieved by name and alias for the $i$-th entity node $e_i$ in the knowledge graph, and $\cup$ represents set union.
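The retrieval step described above can be sketched as follows. This is a minimal illustration only: it assumes case-insensitive substring matching as the search method (the text does not specify a search engine), and the function names and sample sentences are hypothetical.

```python
from typing import Dict, List, Set


def retrieve_for_entity(docs: List[str], name: str, tag_keywords: List[str]) -> Set[int]:
    """Return indices of documents mentioning the entity name AND at least one tag keyword."""
    hits = set()
    for idx, text in enumerate(docs):
        lowered = text.lower()
        has_name = name.lower() in lowered
        has_keyword = any(k.lower() in lowered for k in tag_keywords)
        if has_name and has_keyword:
            hits.add(idx)
    return hits


def retrieve_all(docs: List[str], entities: Dict[str, str],
                 keywords_by_tag: Dict[str, List[str]]) -> Set[int]:
    """Union of the per-entity results, playing the role of the merged set D_E."""
    merged: Set[int] = set()
    for name, tag in entities.items():
        merged |= retrieve_for_entity(docs, name, keywords_by_tag[tag])
    return merged


# Made-up sample documents, for illustration only.
docs = [
    "Adidas is a sportswear brand.",
    "Starbucks opened a new store downtown.",
    "The brand name Uniqlo is known for casual wear.",
    "The weather was sunny all week.",
]
entities = {"Adidas": "BRAND", "Starbucks": "BRAND", "Uniqlo": "BRAND"}
keywords_by_tag = {"BRAND": ["brand", "brand name"]}

retrieved = sorted(retrieve_all(docs, entities, keywords_by_tag))
print(retrieved)  # [0, 2]
```

The second document is skipped because it mentions an entity name without any tag keyword, which is the "match both" condition in the text.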

[0124] For an entity node $e_i$ in knowledge graph $G$ and its corresponding entity tag $l_j$, where the entity tag of entity node $e_i$ is the $j$-th tag in the entity tag set of the knowledge graph ontology: in each document of the collection $D_{e_i}$ of related documents retrieved by name, the entity name of that entity is replaced with its entity tag.

[0125] For example, suppose a knowledge graph G contains an entity node e named "Adidas" whose corresponding entity tag is "BRAND", and the keywords for "BRAND" include "brand" and "brand name". Using "Adidas" and the keywords "brand" and "brand name" of the entity tag "BRAND" as keywords, a search is performed in document set D. Then, entity tag replacement is applied to the retrieved documents to obtain the document set $D_R = \mathrm{Replace}(D_E)$, where Replace denotes the entity tag replacement operation. Specifically, it replaces instances of "brand" and "brand name" in the retrieved documents with the special entity tag "[BRAND]", as shown in the table below:

[0126]
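The replacement operation can be sketched as below. The strings to replace are a parameter, because step S604 describes replacing entity names while the example in this paragraph replaces the keyword descriptions. The function name, the regular-expression approach, and the sample sentence are assumptions made for illustration.

```python
import re
from typing import List


def replace_with_tag(text: str, surface_forms: List[str], tag: str) -> str:
    """Replace every whole-word occurrence of the given surface forms with "[TAG]"."""
    # Try longer forms first so that "brand name" wins over "brand".
    ordered = sorted(surface_forms, key=len, reverse=True)
    alternatives = "|".join(re.escape(form) for form in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(f"[{tag}]", text)


# Made-up sample sentence, for illustration only.
doc = "Adidas is a sportswear brand, and the brand name is well known."

by_keyword = replace_with_tag(doc, ["brand", "brand name"], "BRAND")
by_name = replace_with_tag(doc, ["Adidas"], "BRAND")

print(by_keyword)  # Adidas is a sportswear [BRAND], and the [BRAND] is well known.
print(by_name)     # [BRAND] is a sportswear brand, and the brand name is well known.
```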

[0127] The relevant documents returned by entity retrieval are used as the corpus for entity label embedding representation learning. The Skip-gram algorithm is used to learn the embedding representation of each entity label in the knowledge graph ontology.
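To illustrate why the replaced corpus lets Skip-gram learn a label embedding, the sketch below generates the (center, context) training pairs that Skip-gram is trained on. It covers only pair generation, not the embedding training itself; the window size, whitespace tokenization, and sample sentence are assumptions.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) pairs within a symmetric context window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# A made-up sentence after tag replacement; "[BRAND]" is treated as one token.
tokens = "[BRAND] is a sportswear company".split()
pairs = skipgram_pairs(tokens, window=1)

# Context words that the label token would be trained to predict.
tag_contexts = [context for center, context in pairs if center == "[BRAND]"]
print(len(pairs))    # 8
print(tag_contexts)  # ['is']
```

Because every entity of the same type is rewritten to the same token, the label token accumulates context pairs from all retrieved documents for that type.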

[0128] For the input sequence W, the output sequence obtained by inputting it into the pre-trained language model BERT is represented by the matrix X:

[0129] $X = \mathrm{BERT}(W)$

[0130] The embedding representation of each entity label and the output sequence matrix X are input into the attention mechanism of the model. The attention mechanism learns the weight score of each token in the sequence with respect to each label, and a weighted sum then yields the label embedding representation corresponding to that token.

[0131]

[0132] Here, the attention scoring function yields the attention weight matrix $A$. Its element $a_{ij}$ represents the attention score of the $j$-th tag in the entity tag set for the $i$-th token in the input sequence; its value is between 0 and 1, and $\sum_j a_{ij} = 1$.
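A minimal sketch of an attention weight matrix with these properties follows. The scoring function is not recoverable from the text, so a dot product followed by a row-wise softmax is assumed here; the function names and toy vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(X: Matrix, label_embeddings: Matrix) -> Matrix:
    """A[i][j]: weight of label j for token i (assumed dot-product scoring)."""
    scores = [
        [sum(x * l for x, l in zip(token, label)) for label in label_embeddings]
        for token in X
    ]
    return [softmax(row) for row in scores]


# Toy values: 3 tokens and 2 labels, all of dimension 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
label_embeddings = [[1.0, 0.0], [0.0, 1.0]]

A = attention_weights(X, label_embeddings)
for row in A:
    print([round(a, 3) for a in row], "sum =", round(sum(row), 3))
```

Each row of A sums to 1 and every entry lies strictly between 0 and 1, matching the constraints stated for the attention scores.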

[0133] The embedding representations of entity labels are fused into the embedding representations of the input sequence tokens using the attention weight score matrix:

[0134] $E = A M^{T}$

[0135] Here, $M^{T}$ is the transpose of the entity tag embedding representation matrix $M$, and $E$ is the input sequence token feature after incorporating the semantic information of the entity labels.
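The fusion step can be illustrated as a plain matrix product of the attention weights with the transposed label embedding matrix, as in claim 5 (A*MT). The sketch assumes the transposed matrix stores one label embedding per row; whether the result is further combined with the token features (for example by addition or concatenation) is not stated in the text, so only the weighted sum is computed. The names and toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def fuse(A: Matrix, M_T: Matrix) -> Matrix:
    """Compute E = A * M^T: each token receives a weighted sum of label embeddings."""
    dim = len(M_T[0])
    return [
        [sum(weight * M_T[j][k] for j, weight in enumerate(row)) for k in range(dim)]
        for row in A
    ]


# Toy values: 2 tokens, 2 labels, embedding dimension 2.
A = [[0.75, 0.25], [0.5, 0.5]]    # attention weights, rows sum to 1
M_T = [[2.0, 0.0], [0.0, 4.0]]    # one label embedding per row

E = fuse(A, M_T)
print(E)  # [[1.5, 1.0], [1.0, 2.0]]
```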

[0136] The token sequence feature E, which incorporates semantic information from the entity tags, is input into a single Transformer layer that performs token context feature fusion and yields the output embedding representation matrix H:

[0137] $H = \mathrm{Transformer}(E)$
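As a rough illustration of token context feature fusion, the sketch below implements only scaled dot-product self-attention, the core operation of a Transformer layer. It is a simplification and not the layer used in the disclosure: learned query, key, and value projections, multiple heads, residual connections, layer normalization, and the feed-forward sublayer are all omitted, and the toy input is made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(E: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity projections (an assumption)."""
    dim = len(E[0])
    output = []
    for query in E:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in E]
        weights = softmax(scores)
        # Context-mixed representation: weighted sum over all token vectors.
        mixed = [sum(weights[j] * E[j][c] for j in range(len(E))) for c in range(dim)]
        output.append(mixed)
    return output


E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, dimension 2
H = self_attention(E)
print(len(H), len(H[0]))  # 3 2
```

The output keeps the shape of the input, with each row now a mixture of all token vectors in the sequence.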

[0138] The sequence embedding representation matrix H, after fusing context features, is then passed through a linear transformation layer and projected onto the entity label space to obtain the embedding representation matrix O:

[0139] $O = \mathrm{Linear}(H)$

[0140] Linear represents a linear transformation layer. Finally, O is input into a CRF layer to learn the named entity recognition label prediction sequence Y of the input sequence.
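The last two steps can be sketched as a linear projection onto the label space followed by Viterbi decoding, the standard way a trained CRF layer selects the highest-scoring label sequence. CRF training is not shown. The BIO label names, the weights, and the transition scores are hypothetical values chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def linear(H: Matrix, W: Matrix, b: List[float]) -> Matrix:
    """O = H W + b, with W of shape (hidden_dim, num_labels)."""
    return [
        [sum(h[k] * W[k][j] for k in range(len(h))) + b[j] for j in range(len(b))]
        for h in H
    ]


def viterbi_decode(emissions: Matrix, transitions: Matrix) -> List[int]:
    """Return the label index sequence with the highest total score."""
    num_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(num_labels):
            # Best previous label for reaching label `cur` at this position.
            prev = max(range(num_labels), key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[prev] + transitions[prev][cur] + emit[cur])
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    best = max(range(num_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


labels = ["O", "B-BRAND", "I-BRAND"]               # hypothetical label set
H = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]           # 3 tokens, hidden dimension 2
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]             # hypothetical projection weights
b = [0.0, 0.0, 0.0]
transitions = [[0.0, 0.0, -10.0],                  # penalize O -> I-BRAND
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]

O = linear(H, W, b)
path = viterbi_decode(O, transitions)
print([labels[i] for i in path])  # ['O', 'B-BRAND', 'O']
```

The transition matrix is what distinguishes CRF decoding from picking the best label per token independently: it lets the model penalize label sequences that are invalid, such as an inside tag directly after an outside tag.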

[0141] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.

[0142] Figure 7 is a schematic diagram of the system 700 provided in an embodiment of this disclosure. As shown in Figure 7, the system 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703 stored in the memory 702 and executable on the processor 701. When the processor 701 executes the computer program 703, it implements the steps in the various method embodiments described above. Alternatively, when the processor 701 executes the computer program 703, it implements the functions of each module / unit in the various device embodiments described above.

[0143] For example, computer program 703 may be divided into one or more modules / units, which are stored in memory 702 and executed by processor 701 to perform the present disclosure. The one or more modules / units may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of computer program 703 in system 700.

[0144] System 700 can be an electronic device such as a desktop computer, laptop, handheld computer, or cloud server. System 700 may include, but is not limited to, a processor 701 and a memory 702. Those skilled in the art will understand that Figure 7 is merely an example of system 700 and does not constitute a limitation on system 700. The system may include more or fewer components than shown, combine certain components, or use different components. For example, the system may also include input / output devices, network access devices, buses, etc.

[0145] The processor 701 can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.

[0146] The memory 702 can be an internal storage unit of the system 700, such as a hard disk or RAM of the system 700. The memory 702 can also be an external storage device of the system 700, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the system 700. Furthermore, the memory 702 can include both internal storage units and external storage devices of the system 700. The memory 702 is used to store computer programs and other programs and data required by electronic devices. The memory 702 can also be used to temporarily store data that has been output or will be output.

[0147] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0148] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0149] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this disclosure.

[0150] In the embodiments provided in this disclosure, it should be understood that the disclosed apparatus / systems and methods can be implemented in other ways. For example, the apparatus / system embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. Multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0151] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0152] Furthermore, the functional units in the various embodiments of this disclosure can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0153] If an integrated module / unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program may include computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. A computer-readable medium may include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in a computer-readable medium may be appropriately added to or subtracted according to the requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0154] The above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit it. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this disclosure, and should all be included within the protection scope of this disclosure.

Claims

1. A named entity recognition method, characterized in that it comprises: using both the entity names and their corresponding entity tags from the original corpus as keywords, performing a search in the original document set to retrieve documents that match both the entity names and their corresponding entity tags from the original corpus, thereby forming a retrieved document set; inputting the original corpus entity sequence set and the retrieved document set together into the attention-based training model to obtain the sequence feature vector incorporating the semantic information of entity labels; and inputting the sequence feature vector incorporating the semantic information of entity labels into a conditional random field, and after training, obtaining a named entity recognition label prediction sequence; wherein, after the retrieved document set is formed, the entity names in the retrieved document set are replaced with the entity labels corresponding to the entities to form a replaced document set, and the original corpus entity sequence set and the replaced document set are input into the attention-based training model to obtain the sequence feature vector incorporating the semantic information of entity labels; and wherein, after the replaced document set is formed, the replaced document set is input into the word2vec algorithm model, the word2vec algorithm model is trained to learn the embedding representation vectors of each entity label, the output is the set of embedding representation vectors of each entity label corresponding to the original corpus, and the original corpus entity sequence set and the set of embedding representation vectors of each entity label are input into the attention-based training model to obtain the sequence feature vector incorporating the semantic information of the entity labels.

2. The named entity recognition method according to claim 1, characterized in that the word2vec algorithm model is the Skip-gram algorithm model.

3. The named entity recognition method according to claim 1, characterized in that the original corpus entity sequence set is input into the pre-trained language model BERT to obtain the output vector sequence of bidirectional language representation; the original corpus entity sequence set and the output vector sequence of bidirectional language representation are input together into the attention-based training model to obtain the sequence feature vector incorporating the semantic information of entity labels.

4. The named entity recognition method according to any one of claims 1-3, characterized in that, in the attention-based training model, for all input entity labels, attention scores are determined between any two labels to obtain the attention weight matrix.

5. The named entity recognition method according to claim 4, characterized in that the retrieved document set, or the replaced document set, or the set of embedding representation vectors of each entity label in the input vector set of the attention-based training model is transposed to obtain $D_E^{T}$, or $D_R^{T}$, or $M^{T}$; combined with the attention weight matrix, the sequence feature vector incorporating the semantic information of the entity labels is obtained, where the sequence feature vector is the result corresponding to $A D_E^{T}$, or $A D_R^{T}$, or $A M^{T}$.

6. The named entity recognition method according to claim 5, characterized in that, after obtaining the sequence feature vector incorporating entity label semantic information, the sequence feature vector incorporating entity label semantic information is input into a deep neural network based on a self-attention mechanism to obtain an output embedding representation matrix; the output embedding representation matrix is input into a conditional random field, and after training, a named entity recognition label prediction sequence is obtained.

7. The named entity recognition method according to claim 6, characterized in that, after obtaining the output embedding representation matrix, the output embedding representation matrix is input into a linear transformation layer to obtain a transformed embedding representation matrix; the transformed embedding representation matrix is input into a conditional random field, and after training, a named entity recognition label prediction sequence is obtained.

8. An apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of any one of claims 1 to 7.