An event extraction method and system based on event semantic enhancement
By training an event type label semantic model with an attention mechanism, the method addresses the low event extraction performance in single-sentence, multi-event scenarios and achieves efficient and accurate recognition of event types and argument roles.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2023-07-07
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies perform poorly at event extraction in single-sentence, multi-event scenarios, where polysemy leads to incorrect event type identification, and they ignore the semantic impact of event type labels.
We adopt an event semantic enhancement approach. An event type label semantic model is trained in which the BERT model learns global event type label semantics and an attention mechanism enhances local event type label semantics. Fusing the two representations allows the model to identify the event types in a sentence and to enhance the semantics of event type labels.
The approach improves the accuracy and efficiency of event extraction, accurately identifies trigger words and argument roles in single-sentence, multi-event scenarios, reduces noise interference, and enhances the semantic representation of event type labels.
Figure CN116991970B_ABST
Description
Technical Field
[0001] This invention relates to the field of natural language processing, and more particularly to an event extraction method and system based on event semantic enhancement.
Background Technology
[0002] Event extraction is an information extraction task for unstructured or semi-structured data. Unlike traditional knowledge graph-based information extraction of entities, relationships, and attributes, event extraction focuses on "events," presenting unstructured data containing event information in a structured form. The core semantics of an event are expressed by trigger words and arguments. The event extraction task includes four parts: event trigger word detection, event trigger word classification, event argument identification, and event argument role classification. Event trigger word detection identifies words in the text that trigger an event; event trigger word classification determines the event type triggered by the current trigger word; event argument identification identifies whether words (entities, values, times, etc.) in the text participate in the event; and event argument role classification determines the specific role (time, place, attacker, victim, etc.) of the event arguments within the event.
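The four subtasks described above jointly produce a structured record for each event. A minimal sketch of such a record follows; the class names, the example sentence, and the type and role labels (`Die`, `Attack`, `Victim`, `Target`, `Place`) are illustrative assumptions and are not taken from this patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    text: str  # argument span, e.g. an entity, value, or time expression
    role: str  # role of the argument within the event


@dataclass
class Event:
    trigger: str     # word that triggers the event (trigger detection)
    event_type: str  # type assigned to the trigger (trigger classification)
    arguments: List[Argument] = field(default_factory=list)


# Hypothetical single sentence that describes two events.
sentence = "Three soldiers died when the convoy was attacked in Baghdad."
events = [
    Event("died", "Die",
          [Argument("Three soldiers", "Victim"), Argument("Baghdad", "Place")]),
    Event("attacked", "Attack",
          [Argument("the convoy", "Target"), Argument("Baghdad", "Place")]),
]

for event in events:
    roles = ", ".join(f"{a.role}={a.text}" for a in event.arguments)
    print(f"{event.event_type} (trigger: {event.trigger}): {roles}")
```

The same argument ("Baghdad") can take part in more than one event of the same sentence, which is the single-sentence, multi-event situation that the rest of the description addresses.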
[0003] Event extraction involves training a specific language model based on the extraction task to identify and classify trigger words and arguments, and classify roles. Existing classifier-based event extraction methods treat the event extraction task as multiple classification tasks. They design an event extraction model consisting of a language model with specific functions and multiple classifiers, train the model using labeled data, and enable the model to learn task features and identify trigger words that trigger a specific event type from the token sequence of the input sample, as well as the elements related to that trigger word that completely describe the occurrence of an event. These methods typically use "hard labels" to describe event type labels, meaning a trigger word corresponds to one or more of all event type labels. These labels only indicate the category and do not possess semantic information. Therefore, during training, the model mainly learns the features of different labels to identify trigger words that match those features. In this process, the model may learn some semantic information about the labels, but since the model's goal is to learn all the features of the trigger words corresponding to the labels rather than the semantics of the event type labels, the semantics of the event type labels learned by the model are weak.
[0004] Existing event extraction methods typically extract events within a single sentence containing only a single event. However, due to the unique nature of language, the number of events described in a single sentence is random. A sentence might not mention any events, might describe only one event, or might describe multiple events. To improve the performance of single-sentence, single-event extraction, some practitioners have proposed extracting events from paragraphs and articles to broaden the extraction scope and improve performance based on more contextual information. However, this approach ignores the fact that people often describe events in a single sentence, meaning a single sentence may contain multiple events. Consequently, the event extraction performance in scenarios where a single sentence contains multiple events is significantly lower than that of single-sentence, single-event extraction.
[0005] In scenarios with multiple events within a single sentence, the problem of polysemy arises. Polysemy can lead to misclassification of event types when a single trigger word may trigger multiple event types. To address this, existing techniques typically enhance the event semantics of trigger words by leveraging context, syntactic structure, or dependencies between multiple events to improve event extraction performance. However, these methods generally enhance trigger word semantics based on local information within the sentence and global information at the document level, increasing the difference in label features between different event types, while neglecting the impact of event type label semantics on event extraction. This results in persistently low extraction performance. Therefore, there is an urgent need for an event extraction method that analyzes from the perspective of event type label semantics, achieving event extraction based on enhanced event semantics, thereby improving the accuracy of event extraction.
Summary of the Invention
[0006] The technical problem to be solved by this invention is: In view of the technical problems existing in the prior art, this invention provides an event extraction method and system based on event semantic enhancement that is simple to implement, has high extraction efficiency and accuracy, and strong anti-interference ability. It can perform semantic enhancement of event type labels based on sentence-level event classification tasks, thereby improving the efficiency and accuracy of event extraction.
[0007] To solve the above-mentioned technical problems, the technical solution proposed by this invention is as follows:
[0008] An event extraction method based on event semantic enhancement includes the following steps:
[0009] Step S1: Event Type Label Semantic Model Training: The event label semantic model is trained using training set samples to identify event types in sentences and learn the semantics of event type labels. The event label semantic model sequentially performs global event type label semantic learning, local event type label semantic enhancement, and information fusion and event type decision on the input sample sequence. The global event type label semantic learning is used to learn the semantics of sentence event type labels to obtain a global event type label semantic vector representation. The local event type label semantic enhancement adopts an attention mechanism, generating an attention weight matrix based on the contribution of different tokens in the input sequence to event type identification, forming a local event type label semantic vector representation. The information fusion and event type decision are used to fuse the global event type label semantic vector representation and the local event type label semantic vector representation to make an event type decision.
[0010] Step S2, Event Extraction: The text to be extracted is input into the trained event label semantic model to obtain sentence event semantic information. The text to be extracted is fused with the sentence event semantic information and then the event type is classified. After identifying the trigger word and the corresponding event type, the argument and argument role type are identified.
[0011] Furthermore, in step S1, the event label semantic model obtains the global event type label semantic vector representation of the input sample by inputting the input sample sequence into the base model BERT (Bidirectional Encoder Representations from Transformers), thereby realizing the global event type label semantic learning of the input sample sequence.
[0012] Further, in step S1, the local event type label semantic enhancement performed by the event label semantic model includes: calculating the weights of different tokens in the input sample sequence to obtain an attention weight matrix based on the contribution of different tokens in the input sample sequence to event type recognition, wherein tokens related to the event have larger attention scores and tokens unrelated to the event have smaller attention scores; and weighting the attention weight matrix with the word vectors of the tokens in the input sequence to obtain the local event type label semantic vector representation of the input sample sequence, wherein when the trigger word in the input sample sequence X triggers k different event types, the vector representation of the local event type semantics corresponding to the input sample sequence X is:
[0013]
[0014] where,
[0015]
[0016]
[0017] $A_i$ represents the weight matrix for the i-th event type, $q_i$ is the word embedding representation of the trigger word, $k_j$ is the word embedding representation of the j-th token in the input sample sequence $X$, $E$ is the set of all event type labels $e_i$ in the dataset, $x_j$ is the j-th element of the input sample sequence $X$, and $d$ is the word embedding dimension of BERT.
[0018] Furthermore, in step S1, the event tag semantic model performs information fusion and event type decision-making, including:
[0019] Step S101. Fuse the global event type label semantic vector representation and the local event type label semantic vector representation to obtain the event type vector representation $l = \{l_1, l_2, \ldots, l_d\}$ contained in the input sentence;
[0020] Step S102. Input the event type vector representation $l$ into the feedforward neural network layer, and calculate the conditional probability distribution vector $P = \{p_1, p_2, \ldots, p_m\}$ of the input sequence belonging to different event type labels, where $m$ is the number of event type labels;
[0021] Step S103. Input the conditional probability distribution vector $P$ of the different event type labels of the input sample into the event decision maker, and quantize all vectors to obtain $P' = \{p'_1, p'_2, \ldots, p'_m\}$ such that each $p'_i \in [0, 1]$; then compare each element $p'_i$ of $P'$ with the first preset threshold $t_1$ in the threshold decision maker. If the i-th element $p'_i$ is greater than $t_1$, the input sample is determined to trigger the i-th event type. After the comparison, the event label $Y'$ of the input sample sequence is obtained.
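The threshold decision of step S103 can be sketched as follows. The quantization function is not reproduced in this text, so the sigmoid used here is an assumption, as are the function names and the default threshold value of 0.5.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Assumed quantization function mapping a real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def decide_event_types(p: List[float], t1: float = 0.5) -> List[int]:
    """Return the indices i whose quantized value p'_i exceeds t1."""
    p_quantized = [sigmoid(v) for v in p]
    return [i for i, v in enumerate(p_quantized) if v > t1]


# Hypothetical outputs of the feedforward layer for m = 4 event type labels.
triggered = decide_event_types([2.0, -1.5, 0.3, -0.2], t1=0.5)
print(triggered)
```

Because every label is compared with the threshold independently, a sentence can be assigned zero, one, or several event types, which matches the multi-event setting.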
[0022] Furthermore, after step S103, the model is further fine-tuned according to the following formula:
[0023]
[0024] where $\mathrm{Loss}(Y, Y')$ represents the loss between the predicted event type $Y'$ and the actual event type $Y$ for this sentence, and $y_i$ represents the actual result of whether the i-th event type has occurred.
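The loss formula itself is not reproduced in this text. For a multi-label decision of this kind, binary cross-entropy is a common choice, and the sketch below assumes it; the function name, the averaging over labels, and the clipping constant `eps` are likewise assumptions rather than the patent's definition.

```python
import math
from typing import List


def bce_loss(y_true: List[int], p_pred: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between labels y_i and probabilities p'_i."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical sentence that triggers event type 0 but not event type 1.
good = bce_loss([1, 0], [0.9, 0.1])
bad = bce_loss([1, 0], [0.1, 0.9])
print(good, bad)
```

Under this assumption the loss is small when the quantized probabilities agree with the actual event types and grows as they diverge.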
[0025] Furthermore, in step S2, fusing the text to be extracted with the sentence event semantic information includes: inputting the text to be extracted $X$ of length $n$ into the event label semantic model to obtain the word vector representation $W = \{w_1, w_2, \ldots, w_n\}$ of all tokens in $X$ and the sentence event semantic vector representation $L = \{l_1, l_2, \ldots, l_d\}$ of $X$, where $w_i = \{w_i^1, w_i^2, \ldots, w_i^d\}$, $n$ is the number of tokens, and $d$ is the word embedding dimension of the BERT model; and fusing the sentence event semantic vector representation $L$ with the word vector of each token in $W$ to obtain the fused word vector representation $W' = \{w'_1, w'_2, \ldots, w'_n\}$ of the sequence to be extracted.
[0026] Furthermore, in step S2, the event type classification includes: inputting the word vector representation $W' = \{w'_1, w'_2, \ldots, w'_n\}$ of the sequence to be extracted into a feedforward neural network layer, where $n$ is the number of tokens, and calculating the probability vector $p_i = \{p_i^1, p_i^2, \ldots, p_i^m\}$ of each token's word vector belonging to different event type labels, where $m$ is the number of event type labels; and inputting the probability vectors $P = \{p_1, p_2, \ldots, p_n\}$ into the activation function to quantize all vectors and calculate the scores $S = \{s_1, s_2, \ldots, s_n\}$ that map the sequence to be extracted to all possible output event types, thereby obtaining the score $s_i = \{s_i^1, s_i^2, \ldots, s_i^m\}$ of each token in the sequence to be extracted on all events in the event type label set.
[0027] Furthermore, in step S2, the step of identifying the trigger word and the corresponding event type includes:
[0028] The optimal model is obtained by fine-tuning the current model according to the following formula:
[0029]
[0030] where $y_i^j \in \{0, 1\}$ indicates whether the i-th token triggers the j-th event type: $y_i^j$ is 1 if the i-th token triggers the j-th event type, and 0 otherwise.
[0031] The event type scores $S = \{s_1, s_2, \ldots, s_n\}$ of all tokens in the sequence to be extracted are input into the event type decision maker and compared with the second preset threshold $t_2$ in the event type decision maker. If the score $s_i^j$ corresponding to the j-th event type is greater than $t_2$, the i-th token is determined to trigger the j-th event type. After the event types of all tokens are obtained, consecutive tokens with the same predicted event type are treated as the span of one trigger word, yielding the list of predicted trigger words of the input sequence, $l'_t = [(t_{s1}, t_{e1}, e_1), \ldots, (t_{sk}, t_{ek}, e_k)]$.
[0032] Furthermore, in step S2, the step of identifying arguments and argument role types includes:
[0033] The word vector representations $W'$ of the sequence to be extracted are input into the head feedforward neural network and the tail feedforward neural network to calculate, for each token word vector, the head position probability vector $P_{si} = \{p_{si}^1, p_{si}^2, \ldots, p_{si}^M\}$ and the tail position probability vector $P_{ei} = \{p_{ei}^1, p_{ei}^2, \ldots, p_{ei}^M\}$ over the different argument role type labels, where $M$ is the number of argument role labels;
[0034] The head position probability vectors $P_{si}$ and the tail position probability vectors $P_{ei}$ are input into the activation function to quantize all vectors, and the head position scores $S_s = \{s_{s1}, s_{s2}, \ldots, s_{sn}\}$ and tail position scores $S_e = \{s_{e1}, s_{e2}, \ldots, s_{en}\}$ that map the sequence to be extracted to all argument role types are calculated, where the head position score of the argument roles of token $i$ is $s_{si} = \{s_{si}^1, s_{si}^2, \ldots, s_{si}^M\}$ and the tail position score is $s_{ei} = \{s_{ei}^1, s_{ei}^2, \ldots, s_{ei}^M\}$;
[0035] The optimal model is obtained by fine-tuning the model according to the following formula:
[0036]
[0037]
[0038] where $y_{si}^j \in \{0, 1\}$ indicates whether the i-th token is the head position of the j-th argument role type: it is 1 if so and 0 otherwise. $Y_s$ and $Y_e$ represent whether all tokens in the input sequence are the head and tail positions of argument roles, respectively. $y_{ei}^j$ indicates whether the i-th token is the tail position of the j-th argument role type: it is 1 if so and 0 otherwise.
[0039] The head position scores $S_s = \{s_{s1}, s_{s2}, \ldots, s_{sn}\}$ and tail position scores $S_e = \{s_{e1}, s_{e2}, \ldots, s_{en}\}$ of all tokens in the sequence to be extracted are input into the argument role type decision maker and compared with the third preset threshold $t_3$ in the argument role type decision maker. If the j-th head position score $s_{si}^j$ is greater than $t_3$, the i-th token is determined to be the head position of argument role type $j$; if the j-th tail position score $s_{ei}^j$ is greater than $t_3$, the i-th token is determined to be the tail position of argument role type $j$. This continues until the list of argument roles corresponding to all event trigger words is obtained, where the list of predicted argument roles obtained for the i-th trigger word is $l'_{rt_i} = [(r_{s1}, r_{e1}, r_1), \ldots, (r_{sk}, r_{ek}, r_k)]$.
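The head and tail decisions above can be turned into argument spans as sketched below. The text does not state how a head position is paired with a tail position, so the rule used here (the nearest tail of the same role at or after the head) is an assumption, as are the function name and the example scores.

```python
from typing import List, Tuple


def decode_arguments(head: List[List[float]], tail: List[List[float]],
                     t3: float = 0.5) -> List[Tuple[int, int, int]]:
    """head[i][j] and tail[i][j] are the scores that token i is the head or
    tail position of argument role j. Returns (start, end, role) triples
    with an inclusive end index."""
    n = len(head)
    num_roles = len(head[0]) if n else 0
    spans = []
    for j in range(num_roles):
        for i in range(n):
            if head[i][j] > t3:
                # Assumed pairing rule: nearest tail at or after the head.
                for e in range(i, n):
                    if tail[e][j] > t3:
                        spans.append((i, e, j))
                        break
    return sorted(spans)


# Hypothetical scores for 4 tokens and M = 2 argument roles.
head_scores = [[0.9, 0.1], [0.1, 0.1], [0.2, 0.8], [0.1, 0.1]]
tail_scores = [[0.1, 0.1], [0.7, 0.1], [0.1, 0.1], [0.1, 0.9]]
arguments = decode_arguments(head_scores, tail_scores, t3=0.5)
print(arguments)
```

Scoring heads and tails separately for every role lets one token belong to arguments of different roles, at the cost of needing an explicit pairing rule such as the one assumed here.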
[0040] An event extraction system based on event semantic enhancement includes a processor and a memory, wherein the memory is used to store a computer program and the processor is used to execute the computer program to perform the method described above.
[0041] Compared with the prior art, the advantages of the present invention are as follows:
[0042] 1. This invention adds a sentence event type classification task during the pre-training stage and uses training set samples to train an event type label semantic model. This enables the model to learn the semantic information of event type labels. While correctly identifying all event types in a sentence, it can also learn the semantics of event type labels, thereby greatly improving the performance of the pre-trained model.
[0043] 2. This invention employs an attention mechanism for local event type label semantic enhancement. Based on the contribution of different tokens in the input sample sequence to event type recognition, the weights of different tokens in the input sequence are calculated. This results in event-related tokens having higher attention scores and event-irrelevant tokens having lower attention scores in the attention weight matrix. This allows the event type label to learn as much event information as possible from event-related tokens, enhancing the semantics of the event type label while minimizing noise interference with event type semantic learning. Consequently, it obtains the most accurate enhanced semantic information describing the event type label, improving the accuracy of event recognition.
[0044] 3. This invention pre-trains the model through a sentence event type classification task, which enables the model to learn the sentence semantic vector representation of the target event type from having no knowledge of the semantics of the target event type label, thereby facilitating the accurate representation of the semantic vector of the trigger word corresponding to the target event type.
[0045] 4. This invention can learn the semantics of a single event type label, the semantics of fusion of multiple event type labels, and the semantics of sentence event type based on the simultaneous occurrence of a single event and multiple events in different samples. When extracting multiple events from a single sentence, by fusing sentence event semantics instead of context semantics into token semantics, it improves the accuracy of token event semantic representation while reducing the impact of noisy token semantics in the context on the semantic representation of trigger words. This helps to determine the specific event type triggered by trigger words that can trigger multiple event types in the current environment, thereby improving the accuracy of single event recognition and the accuracy of multi-event recognition.
Attached Figure Description
[0046] Figure 1 is a schematic diagram illustrating the implementation process of the event extraction method based on event semantic enhancement in this embodiment.
[0047] Figure 2 is a schematic diagram illustrating the implementation principle of the event type label semantic learning model in this embodiment.
[0048] Figure 3 is a schematic diagram illustrating the implementation of event extraction based on event semantic enhancement in this embodiment.
Detailed Implementation
[0049] The present invention will be further described below with reference to the accompanying drawings and specific preferred embodiments, but this does not limit the scope of protection of the present invention.
[0050] As shown in Figure 1, the steps of the event extraction method based on event semantic enhancement in this embodiment include:
[0051] Step S1, Event Type Label Semantic Training Phase: In the pre-training phase, an event label semantic model is trained using training set samples to identify event types present in sentences and learn the semantics of event type labels. The event label semantic model sequentially performs global event type label semantic learning, local event type label semantic enhancement, information fusion, and event type decision on the input sample sequence.
[0052] Global event type label semantic learning is used to learn the semantics of sentence event type labels to obtain a global event type label semantic vector representation. Local event type label semantic enhancement adopts an attention mechanism, generating an attention weight matrix based on the contribution of different tokens in the input sequence to event type recognition, forming a local event type label semantic vector representation. Information fusion and event type decision are used to fuse the global event type label semantic vector representation with the local event type label semantic vector representation to make an event type decision.
[0053] Step S2, Event Extraction Stage: Input the text to be extracted into the trained event label semantic model to obtain sentence event semantic information. After fusing the text to be extracted with the sentence event semantic information, perform event type classification, identify the trigger words and corresponding event types, and then identify the arguments and argument role types.
[0054] As shown in Figures 1 to 3, the detailed steps for implementing event extraction based on event semantic enhancement in this embodiment are as follows:
[0055] Step S1: Event type label semantic training
[0056] Event extraction involves training a language model to identify the token corresponding to the target event type label from a set of tokens; this token is the trigger word. Since the BERT model is a pre-trained language model trained in a general domain, it possesses rich semantic information and can be fine-tuned on a small amount of labeled data to meet the needs of different NLP tasks. Furthermore, it utilizes attention mechanisms to encode words and sentence structures in the context, fully leveraging contextual information to better understand the meaning of sentences and words. While the BERT model has strong universality, accurately identifying event-related trigger words from samples without knowledge of the target event type for the event extraction task is quite challenging. This embodiment uses BERT as the basic pre-trained model and adds a sentence-level event classification task to BERT to train an event type label semantic model containing event type label semantic information. This allows for analysis from the perspective of event type label semantics, forming an event extraction method based on sentence-level event classification task-enhanced event type label semantics, which can significantly improve the performance of the pre-trained model on specific tasks.
[0057] As shown in Figure 2, the event type label semantic training phase in this embodiment specifically includes three parts: global event type label semantic learning, local event type label semantic enhancement, and information fusion and event type decision-making, which are as follows:
[0058] (1) Semantic learning of global event type labels
[0059] In this embodiment, global event type label semantic learning treats the original event extraction task as a sentence event classification task. By identifying the event types contained in the sentence, the semantics of sentence event type labels are learned.
[0060] Specifically, the input sample sequence $X = \{x_1, x_2, \ldots, x_n\}$ is fed into the BERT base model of the event type label semantic training model, and the global event type label semantic vector representation of the input sample sequence is obtained as follows:
[0061]
[0062] Where d is the hidden layer dimension of the BERT model, which is also the number of event type labels.
[0063] (2) Enhanced semantics of local event type labels
[0064] Since samples often contain a large amount of noise unrelated to the event, this embodiment employs an attention mechanism for local event type label semantic enhancement in the event type label semantic model to minimize the impact of noise on event type semantic learning. During local event type label semantic enhancement, information about event-related tokens in the input sample sequence is obtained. Based on the contribution of different tokens in the input sample sequence to event type recognition, the weights of different tokens in the input sequence are calculated to obtain an attention weight matrix. In the final attention weight matrix, event-related tokens receive higher attention scores, while event-unrelated tokens receive lower attention scores. The attention scores are weighted with the word vectors of the tokens in the input sample sequence to obtain the local event type label semantic vector representation of the sentence. This integrates the information of event-related tokens in the input sample sequence into the local event type label semantic vector representation, enabling the event type label to learn the semantics that represent the event type label as accurately as possible and minimizing the impact of noise on event type semantic learning.
[0065] Assuming the input sample sequence is $X$, the word vector representation of the input sample sequence is $W = \{w_1, w_2, \ldots, w_n\}$. The event semantic vector representation $E$ of the sentence is obtained by multiplying $W$ by the weight matrix $A$ of the text sequence and taking the weighted sum. In this embodiment, the weight matrix $A$ is optimized by calculating the weights of different tokens in the input sample sequence based on their contribution to event type recognition. This ensures that the event semantic vector representation $E$ represents the event type semantics contained in the input sample sequence as accurately as possible.
[0066] Assuming that the trigger words in the input sample sequence X trigger k different event types, the vector representation of the local event type semantics of the input sample sequence X can be expressed as:
[0067]
[0068] where,
[0069]
[0070]
[0071] $A_i$ represents the weight matrix for the i-th event type, $q_i$ is the word embedding representation of the trigger word, $k_j$ is the word embedding representation of the j-th token in the input sequence, $E$ is the set of all event type labels $e_i$ in the dataset, and $d$ is the word embedding dimension of the BERT model.
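The formulas for the attention weights are not reproduced in this text. Since the definitions involve a query $q_i$, keys $k_j$, and the embedding dimension $d$, the sketch below assumes the familiar scaled dot-product form, in which the weight of token $j$ is the softmax of $q_i \cdot k_j / \sqrt{d}$. That form, the function names, and the toy embeddings are assumptions, and how the vectors of the $k$ triggered event types are combined is left open.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_label_vector(q: List[float],
                       keys: List[List[float]]) -> Tuple[List[float], List[float]]:
    """q is the trigger word embedding (length d); keys are the token
    embeddings (n rows of length d). Returns the attention weights over the
    tokens and the attention-weighted sum of the token embeddings."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    vec = [sum(w * k[c] for w, k in zip(weights, keys)) for c in range(d)]
    return weights, vec


# Toy embeddings with d = 2: token 0 is aligned with the trigger, the others are not.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, vec = local_label_vector(q, keys)
print(weights, vec)
```

In this toy case the token aligned with the trigger receives the largest weight, which mirrors the stated goal that event-related tokens score higher than event-unrelated ones.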
[0072] (3) Information fusion and event type decision-making
[0073] The specific steps for information fusion and event type decision-making in the event tag semantic model of this embodiment include:
[0074] Step S101. Fuse the global event type label semantic vector representation and the local event type label semantic vector representation, obtained from global event type label semantic learning and local event type label semantic enhancement respectively, to obtain the vector representation $l = \{l_1, l_2, \ldots, l_d\}$ of the event types contained in the current input sample sequence;
[0075] Step S102. Input $l$ into the feed-forward network (FFN) layer, and use $p_i = \mathrm{FFN}(l_i)$ to calculate the conditional probability distribution vector $P = \{p_1, p_2, \ldots, p_m\}$ of the input sample sequence belonging to different event type labels, where $m$ is the number of event type labels;
[0076] Step S103. Input the conditional probability distribution vector $P$ of the different event type labels of the input sample into the event decision maker, and quantize all vectors to obtain $P' = \{p'_1, p'_2, \ldots, p'_m\}$ such that each $p'_i \in [0, 1]$; then compare each element $p'_i$ of $P'$ with the first preset threshold $t_1$ in the threshold decision maker. If the probability $p'_i$ of the i-th event type label is greater than $t_1$, the input sample is determined to trigger the i-th event type. After the $m$ probabilities are compared, the event label $Y'$ of the input sample sequence is obtained.
[0077] In a specific application embodiment, all vectors can be quantized according to the following formula (3) to obtain $P' = \{p'_1, p'_2, \ldots, p'_m\}$ such that each $p'_i \in [0, 1]$. Each $p'_i$ is then compared with the threshold $t$ in the threshold decision maker; if $p'_i$ is greater than $t$, the input sample triggers the i-th event type.
[0078]
[0079] where $s(x)$ represents the quantized value of the input $x$.
[0080] In a specific application embodiment, the model can be fine-tuned using the following formula (4):
[0081]
[0082] where $\mathrm{Loss}(Y, Y')$ represents the loss between the predicted event type $Y'$ and the actual event type $Y$ for this sentence, and $y_i$ represents the actual result of whether the i-th event type has occurred.
[0083] This embodiment can obtain an event type label semantic model containing event type label semantic information through the above steps, which enables the identification of event types in sentences and the learning of event type label semantics. This allows for event extraction with enhanced event type label semantics based on sentence-level event classification tasks.
[0084] Step S2, Event Extraction
[0085] As shown in Figure 3, the event extraction in this embodiment specifically includes five stages: the sentence event semantic information fusion stage, the event type classification stage, the trigger word identification and event type decision stage, the argument role identification stage, and the argument role classification stage, which are as follows:
[0086] (1) Sentence event semantic information fusion stage
[0087] In this embodiment, the sentence event semantic information fusion stage fuses the sentence event semantic information into the word vectors $W = \{w_1, w_2, \ldots, w_n\}$ of the input text to be extracted, $X = \{x_1, x_2, \ldots, x_n\}$, so that the word vectors include the semantic information of the multiple events that the sentence may involve.
[0088] The specific steps for fusing the text to be extracted with the semantic information of the sentence event include:
[0089] First, input the text $X$ of length $n$ to be extracted into the event type label semantic model obtained in step S1, and obtain the word vector representation $W = \{w_1, w_2, \ldots, w_n\}$ of all tokens in $X$ and the sentence event semantic vector representation $L = \{l_1, l_2, \ldots, l_d\}$ of the input sequence, where $w_i = \{w_i^1, w_i^2, \ldots, w_i^d\}$;
[0090] Next, the sentence event semantic vector representation $L$ is fused with the word vector of each token in the input word vector representation $W$;
[0091] Finally, the fused word vector representation $W' = \{w'_1, w'_2, \ldots, w'_n\}$ of the input sequence is obtained.
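The three fusion steps above can be sketched as follows. The text does not name the fusion operator, so element-wise addition of $L$ to every token vector is an assumption (concatenation or a gated sum would be alternatives), and the function name and toy values are illustrative.

```python
from typing import List


def fuse_sentence_semantics(W: List[List[float]], L: List[float]) -> List[List[float]]:
    """Add the sentence event semantic vector L (length d) to each of the
    n token vectors in W (each of length d), giving the fused vectors W'."""
    if any(len(w) != len(L) for w in W):
        raise ValueError("every token vector must have the same dimension as L")
    return [[wi + li for wi, li in zip(w, L)] for w in W]


# Toy example with n = 2 tokens and d = 2 dimensions.
W = [[1.0, 2.0], [3.0, 4.0]]
L = [10.0, 20.0]
W_fused = fuse_sentence_semantics(W, L)
print(W_fused)
```

Whatever operator is used, the point of the step is that every token vector carries the same sentence-level event semantics before per-token classification.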
[0092] (2) Event type classification stage
[0093] In this embodiment, the event type classification stage uses the word vector representation $W' = \{w'_1, w'_2, \ldots, w'_n\}$ of all tokens in the input sequence to be extracted to obtain the score of each token across all event types in the event type label set. The specific steps include:
[0094] The word vector representation $W'$ is input into a feed-forward network (FFN) layer, and $p_i = \mathrm{FFN}(w'_i)$ is used to calculate the probability vector $p_i = \{p_i^1, p_i^2, \ldots, p_i^m\}$ of each token word vector belonging to the different event type labels, where $m$ is the number of event type labels;
[0095] $P = \{p_1, p_2, \ldots, p_n\}$ is input into an activation function to quantize all vectors (e.g., using equation (1)), and the scores $S = \{s_1, s_2, \ldots, s_n\}$ that map the input sequence to all possible output event types are calculated, where the event type score of token $i$ is $s_i = \{s_i^1, s_i^2, \ldots, s_i^m\}$.
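A minimal sketch of this per-token scoring follows. It assumes that the FFN is a single linear layer and that the activation function is the logistic sigmoid; neither is confirmed by this section, and the weights, bias, and function names are illustrative only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ffn(w: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One linear layer (assumed): returns m raw values, one per event type label."""
    return [sum(a * x for a, x in zip(row, w)) + b for row, b in zip(weights, bias)]


def score_tokens(W_fused: Matrix, weights: Matrix, bias: Vector) -> Matrix:
    """Return S, where S[i][j] is the score of token i for event type j."""
    return [[sigmoid(p) for p in ffn(w, weights, bias)] for w in W_fused]


# Toy input: n = 2 tokens, d = 2 dimensions, m = 3 event types.
W_fused = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0, -2.0], [0.0, 0.0], [-1.0, 3.0]]  # hypothetical parameters
bias = [0.0, 0.0, 0.0]
S = score_tokens(W_fused, weights, bias)
```

Because each event type gets its own independent score rather than a share of a softmax, one token can exceed the threshold for several event types at once, which is what the single-sentence multi-event setting requires.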
[0096] (3) Trigger word recognition and event type decision stage
[0097] In this embodiment, the trigger word identification and event type decision stage fine-tunes the model based on the score vectors of all tokens over each event type obtained in the event type classification stage, identifies the trigger word spans, and obtains the event type triggered by each identified span. The specific steps include:
[0098] First, fine-tune the model using the following formula to obtain the optimal model:
[0099]
[0100] Here, $y_i^j \in [0,1]$ indicates whether the i-th token triggers the j-th event type: if the i-th token triggers the j-th event type, $y_i^j$ is 1; otherwise, it is 0.
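The fine-tuning formula itself is not reproduced in this text. As an illustration only, the sketch below uses token-level binary cross-entropy, a common objective when every token carries an independent 0/1 label $y_i^j$ per event type; the choice of loss and the function name `token_event_bce` are assumptions, not the formula of this embodiment.

```python
import math
from typing import List


def token_event_bce(S: List[List[float]], Y: List[List[int]], eps: float = 1e-12) -> float:
    """Sum of binary cross-entropy terms over all tokens i and event types j.

    S[i][j] is the predicted score in (0, 1); Y[i][j] is the 0/1 label y_i^j.
    Assumption: this loss stands in for the formula missing from the text.
    """
    total = 0.0
    for s_i, y_i in zip(S, Y):
        for s, y in zip(s_i, y_i):
            s = min(max(s, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total


# Toy input: n = 2 tokens, m = 2 event types.
S = [[0.9, 0.1], [0.2, 0.8]]
Y = [[1, 0], [0, 1]]
loss = token_event_bce(S, Y)
worse = token_event_bce(S, [[0, 1], [1, 0]])  # labels flipped
```

The loss shrinks as the scores approach the labels and grows when they disagree, so minimizing it pushes $s_i^j$ toward 1 exactly where token $i$ triggers event type $j$.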
[0101] Secondly, the event type scores $S = \{s_1, s_2, \ldots, s_n\}$ of all tokens in the input sequence are input into the event type decision-maker and compared with the second preset threshold $t_2$ in the event type decision-maker. If the score $s_i^j$ corresponding to the j-th event type is greater than the second preset threshold $t_2$, the i-th token is determined to trigger the j-th event type. After the event types of all tokens are obtained, consecutive tokens with the same predicted event type are treated as the span of one trigger word, thus obtaining the list of predicted trigger words of the input sequence, $l_t' = [(t_{s1}, t_{e1}, e_1), \ldots, (t_{sk}, t_{ek}, e_k)]$.
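The thresholding and span-merging rule can be sketched as follows. The function name `decode_trigger_spans`, the 0-based indices, and the inclusive end position are illustrative choices rather than details taken from the text.

```python
from typing import List, Tuple

Span = Tuple[int, int, int]  # (start token, end token inclusive, event type index)


def decode_trigger_spans(S: List[List[float]], t2: float) -> List[Span]:
    """Merge consecutive tokens whose score for the same event type exceeds t2."""
    n = len(S)
    m = len(S[0]) if S else 0
    spans: List[Span] = []
    for j in range(m):
        start = None  # start index of the currently open run for event type j
        for i in range(n):
            fired = S[i][j] > t2
            if fired and start is None:
                start = i
            elif not fired and start is not None:
                spans.append((start, i - 1, j))
                start = None
        if start is not None:  # a run that reaches the end of the sentence
            spans.append((start, n - 1, j))
    return sorted(spans)


# Toy input: n = 4 tokens, m = 2 event types.
S = [[0.1, 0.2], [0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]
trigger_list = decode_trigger_spans(S, t2=0.5)
```

Here tokens 1 and 2 both exceed the threshold for event type 0 and merge into one span, while token 3 alone forms a span for event type 1, giving `[(1, 2, 0), (3, 3, 1)]`.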
[0102] (4) Argument Role Classification Stage
[0103] In this embodiment, the argument role classification stage uses the word vector representation $W' = \{w'_1, w'_2, \ldots, w'_n\}$ of all tokens in the input sequence, obtained in the sentence event semantic information fusion stage, to obtain the score of each token in the input sequence across all argument roles in the argument role type label set. The specific steps include:
[0104] The word vector representation $W'$ is input into two neural network layers, a head feed-forward neural network and a tail feed-forward neural network, and $p_i = \mathrm{FFN}(w'_i)$ is used to calculate, for each token word vector, the head position probability vector $P_{si} = \{p_{si}^1, p_{si}^2, \ldots, p_{si}^M\}$ and the tail position probability vector $P_{ei} = \{p_{ei}^1, p_{ei}^2, \ldots, p_{ei}^M\}$ over the different argument role type labels, where $M$ is the number of argument role labels;
[0105] $P_{si}$ and $P_{ei}$ are input into an activation function to quantize all vectors (e.g., using formula (3)), and the head position scores $S_s = \{s_{s1}, s_{s2}, \ldots, s_{sn}\}$ and tail position scores $S_e = \{s_{e1}, s_{e2}, \ldots, s_{en}\}$ that map the input sequence to all argument role types are calculated, where the head position score of the argument roles of token $i$ is $s_{si} = \{s_{si}^1, s_{si}^2, \ldots, s_{si}^M\}$ and the tail position score is $s_{ei} = \{s_{ei}^1, s_{ei}^2, \ldots, s_{ei}^M\}$.
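The two-network arrangement can be sketched as two independent output layers reading the same fused token vectors. As before, single linear layers and a sigmoid activation are assumptions, and all names and parameter values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
Layer = Tuple[Matrix, Vector]  # (weights of shape M x d, bias of length M)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(w: Vector, layer: Layer) -> Vector:
    weights, bias = layer
    return [sum(a * x for a, x in zip(row, w)) + b for row, b in zip(weights, bias)]


def role_boundary_scores(W_fused: Matrix, head: Layer, tail: Layer) -> Tuple[Matrix, Matrix]:
    """Return (S_s, S_e): head and tail position scores per token and role type."""
    S_s = [[sigmoid(p) for p in linear(w, head)] for w in W_fused]
    S_e = [[sigmoid(p) for p in linear(w, tail)] for w in W_fused]
    return S_s, S_e


# Toy input: n = 2 tokens, d = 2 dimensions, M = 1 argument role type.
W_fused = [[1.0, 0.0], [0.0, 1.0]]
head_layer = ([[3.0, -3.0]], [0.0])  # hypothetical: fires on token 0
tail_layer = ([[-3.0, 3.0]], [0.0])  # hypothetical: fires on token 1
S_s, S_e = role_boundary_scores(W_fused, head_layer, tail_layer)
```

Scoring head and tail positions separately lets a span be recovered from two boundary decisions instead of one label per token, so arguments of different role types may overlap.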
[0106] (5) Argument Role Identification and Classification Stage
[0107] In the argument role recognition and classification stage of this embodiment, the model is fine-tuned based on the score vectors of the head and tail positions of all tokens obtained in the argument role classification stage to identify the span of the argument. Based on the obtained trigger word prediction list, the model determines the role type of different spans in the event types triggered by different trigger word spans. Specific steps include:
[0108] First, the optimal model is obtained by fine-tuning the model according to the following formula:
[0109]
[0110]
[0111] Here, $y_{si}^j \in [0,1]$ indicates whether the i-th token is the head position of the j-th argument role type: if it is, $y_{si}^j$ is 1; otherwise, it is 0. $Y_s$ and $Y_e$ represent whether all tokens in the input sequence are the head and tail positions of argument roles, respectively. $y_{ei}^j$ indicates whether the i-th token is the tail position of the j-th argument role type; if so, it is 1, otherwise it is 0.
[0112] Then, the head position scores $S_s = \{s_{s1}, s_{s2}, \ldots, s_{sn}\}$ and tail position scores $S_e = \{s_{e1}, s_{e2}, \ldots, s_{en}\}$ of all tokens in the input sequence are input into the argument role type decision-maker and compared with the third preset threshold $t_3$ in the argument role type decision-maker. If the head position score $s_{si}^j$ is greater than the third preset threshold $t_3$, the i-th token is determined to be the head position of argument role type $j$; if the tail position score $s_{ei}^j$ is greater than the third preset threshold $t_3$, the i-th token is determined to be the tail position of argument role type $j$. This continues until the list of argument roles corresponding to all event trigger words is obtained, where the list of predicted argument roles obtained for the i-th trigger word is $l_{rti}' = [(r_{s1}, r_{e1}, r_1), \ldots, (r_{sk}, r_{ek}, r_k)]$.
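The threshold comparison above yields head and tail positions per role type, but the text does not say how a head is paired with a tail. The sketch below assumes the nearest tail at or after each head for the same role type; that pairing rule and the function name `decode_argument_spans` are assumptions, and the association of spans with individual trigger words is not modeled.

```python
from typing import List, Tuple

ArgSpan = Tuple[int, int, int]  # (head token, tail token inclusive, role type index)


def decode_argument_spans(S_s: List[List[float]], S_e: List[List[float]], t3: float) -> List[ArgSpan]:
    """Threshold head/tail scores with t3 and pair each head with the nearest tail."""
    n = len(S_s)
    M = len(S_s[0]) if S_s else 0
    spans: List[ArgSpan] = []
    for j in range(M):
        heads = [i for i in range(n) if S_s[i][j] > t3]
        tails = [i for i in range(n) if S_e[i][j] > t3]
        for h in heads:
            # Assumed pairing rule: first tail position that is not before the head.
            tail = next((t for t in tails if t >= h), None)
            if tail is not None:
                spans.append((h, tail, j))
    return sorted(spans)


# Toy input: n = 4 tokens, M = 2 argument role types.
S_s = [[0.9, 0.1], [0.1, 0.1], [0.2, 0.8], [0.1, 0.2]]
S_e = [[0.1, 0.1], [0.7, 0.2], [0.1, 0.1], [0.3, 0.9]]
argument_list = decode_argument_spans(S_s, S_e, t3=0.5)
```

With these scores, role type 0 spans tokens 0 to 1 and role type 1 spans tokens 2 to 3, so `argument_list` is `[(0, 1, 0), (2, 3, 1)]`. A head with no tail at or after it is dropped rather than guessed.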
[0113] Through the above steps, this embodiment learns the semantics of individual event type labels, the fused semantics of multiple event type labels, and the semantics of sentence event types, based on whether a single event or multiple events occur together in different samples. When multiple events are extracted from a single sentence, sentence event semantics, rather than context semantics, are fused into the token semantics. This improves the accuracy of the token event semantic representation while reducing the influence of noisy token semantics in the context on the semantic representation of trigger words. It also helps determine which specific event type is triggered, in the current context, by a trigger word that can trigger multiple event types, thereby improving the accuracy of both single-event and multi-event recognition.
[0114] This invention adds a sentence event type classification task. First, an event type label semantic model is trained using training set samples. This model can correctly identify the event types that may exist in different sentences, thus completing the sentence event type classification task. During this process, the event semantic model learns the semantic information contained in the event type labels. Furthermore, an attention mechanism is used in the event label semantic model to enhance the semantics of local event type labels, enabling the event type labels to learn the semantics that represent them as accurately as possible, minimizing the impact of noise on the semantic learning process. Simultaneously, during the event type label semantic learning process, based on the occurrence of events in different samples, the semantics of individual event type labels, the fusion semantics of multiple event type labels, and the semantics of sentence event types are learned. When extracting events, sentence event semantics, rather than contextual semantics, is fused into the token semantics. This improves the accuracy of token event semantic representation while reducing the impact of noisy token semantics on the semantic representation of trigger words. This model can accurately identify the event type of polysemous trigger words, thus meeting the requirement for accurate event identification.
[0115] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the invention. Therefore, any simple modifications, equivalent changes, and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention should fall within the protection scope of the present invention.
Claims
1. An event extraction method based on event semantic enhancement, characterized in that the steps include:
Step S1, training the event type label semantic model: training set samples are used to train the event label semantic model to identify the event types present in a sentence and to learn the semantics of the event type labels. The event label semantic model sequentially performs global event type label semantic learning, local event type label semantic enhancement, and information fusion and event type decision on the input sample sequence. The global event type label semantic learning is used to learn the semantics of sentence event type labels to obtain a global event type label semantic vector representation. The local event type label semantic enhancement adopts an attention mechanism, which generates an attention weight matrix based on the contribution of different tokens in the input sequence to event type recognition, forming a local event type label semantic vector representation. The information fusion and event type decision is used to fuse the global event type label semantic vector representation and the local event type label semantic vector representation to make an event type decision.
Step S2, event extraction: the text to be extracted is input into the trained event label semantic model to obtain sentence event semantic information. The text to be extracted is fused with the sentence event semantic information, and the event type is then classified. After the trigger word and the corresponding event type are identified, the arguments and argument role types are identified.
In step S1, the information fusion and event type decision performed by the event label semantic model includes:
Step S101. Merge the global event type label semantic vector representation and the local event type label semantic vector representation to obtain the event type vector representation $L = \{l_1, l_2, \ldots, l_d\}$ contained in the input sample sequence;
Step S102. Input the event type vector representation $L$ into a feedforward neural network layer to calculate the conditional probability distribution vector $P = \{p_1, p_2, \ldots, p_m\}$ of the input sequence belonging to the different event type labels, where $m$ is the number of event type labels;
Step S103. Input the conditional probability distribution vector $P$ of the input sample over the different event type labels into the event decision maker, and quantize all probabilities to obtain $P' = \{p'_1, p'_2, \ldots, p'_m\}$ such that $p'_i \in [0,1]$; compare each element $p'_i$ of the vector $P'$ with the first preset threshold $t_1$ in the threshold decision maker; if the i-th element $p'_i$ is greater than the first preset threshold $t_1$, the input sample is determined to trigger the i-th event type; after all comparisons, the event label $Y'$ of the input sample sequence is finally obtained.
In step S2, fusing the text to be extracted with the sentence event semantic information includes: inputting the text $X$ of length $n$ to be extracted into the event label semantic model to obtain the word vector representation $W = \{w_1, w_2, \ldots, w_n\}$ of all tokens in the sequence $X$ to be extracted and the event type vector representation $L = \{l_1, l_2, \ldots, l_d\}$ of the sequence $X$ to be extracted, where $w_i = \{w_i^1, w_i^2, \ldots, w_i^d\}$, $n$ is the number of tokens, and $d$ is the word embedding dimension of BERT; and fusing the event type vector representation $L$ with the word vector of each token in the word vector representation $W$ to obtain the fused word vector representation $W' = \{w'_1, w'_2, \ldots, w'_n\}$ of the sequence to be extracted.
2. The event extraction method based on event semantic enhancement according to claim 1, characterized in that, in step S1, the event label semantic model obtains the global event type label semantic vector representation of the input sample by inputting the input sample sequence into the base model BERT, so as to realize the global event type label semantic learning of the input sample sequence.
3. The event extraction method based on event semantic enhancement according to claim 1, characterized in that, in step S1, the local event type label semantic enhancement performed by the event label semantic model includes: calculating the weights of different tokens in the input sample sequence, based on their contribution to event type recognition, to obtain an attention weight matrix, wherein tokens related to the event have higher attention scores and tokens unrelated to the event have lower attention scores; and weighting the word vectors of the tokens in the input sequence with the attention weight matrix to obtain the local event type label semantic vector representation of the input sample sequence, wherein when the trigger words in the input sample sequence $X$ trigger $k$ different event types, the local event type semantic vector representation corresponding to the input sample sequence $X$ is as follows: where $A_i$ denotes the weight matrix of the i-th event type, $q_i$ is the word embedding representation of the sentence event semantics, $k_j$ is the word embedding representation of the j-th token in the input sample sequence $X$, $E$ is the set of all event type labels $e_i$ in the dataset, $x_j$ denotes the j-th element of the input sample sequence $X$, and $d$ denotes the word embedding dimension of the BERT model.
4. The event extraction method based on event semantic enhancement according to claim 1, characterized in that, following step S103, the model is further fine-tuned according to the following formula: where $Loss(Y, Y')$ represents the loss between the predicted event type $Y'$ and the actual event type $Y$ of the sentence, and $y_i$ indicates the actual result of whether the i-th event type has occurred.
5. The event extraction method based on event semantic enhancement according to any one of claims 1 to 4, characterized in that, in step S2, the event type classification includes: inputting the word vector representation $W' = \{w'_1, w'_2, \ldots, w'_n\}$ of the sequence to be extracted into the feedforward neural network layer, where $n$ is the number of tokens, and calculating the probability vector $p_i = \{p_i^1, p_i^2, \ldots, p_i^m\}$ of each token's word vector belonging to the different event type labels, where $m$ is the number of event type labels; and inputting the probability vectors $P = \{p_1, p_2, \ldots, p_n\}$ into the activation function to quantize all vectors and calculating the scores $S = \{s_1, s_2, \ldots, s_n\}$ that map the sequence to be extracted to all possible output event types, thereby obtaining the score $s_i = \{s_i^1, s_i^2, \ldots, s_i^m\}$ of each token in the sequence to be extracted on all events in the event type label set.
6. The event extraction method based on event semantic enhancement according to claim 5, characterized in that, in step S2, identifying the trigger word and the corresponding event type includes: fine-tuning the current model according to the following formula to obtain the optimal model: where $y_i^j \in [0,1]$ indicates whether the i-th token triggers the j-th event type; if the i-th token triggers the j-th event type, $y_i^j$ is 1, and otherwise it is 0; inputting the event type scores $S = \{s_1, s_2, \ldots, s_n\}$ of all tokens in the sequence to be extracted into the event type decision maker and comparing them with the second preset threshold $t_2$ in the event type decision maker; if the score $s_i^j$ of the j-th event type is greater than the second preset threshold $t_2$, determining that the i-th token triggers the j-th event type; and, after obtaining the event types of all tokens, treating consecutive tokens with the same predicted event type as the span of one trigger word, thus obtaining the list of predicted trigger words of the input sequence, $l_t' = [(t_{s1}, t_{e1}, e_1), \ldots, (t_{sk}, t_{ek}, e_k)]$.
7. The event extraction method based on event semantic enhancement according to any one of claims 1 to 4, characterized in that, in step S2, identifying the arguments and argument role types includes:
inputting the word vector representation $W'$ of the sequence to be extracted into the head feedforward neural network and the tail feedforward neural network to calculate, for each token word vector, the head position probability vector $P_{si} = \{p_{si}^1, p_{si}^2, \ldots, p_{si}^M\}$ and the tail position probability vector $P_{ei} = \{p_{ei}^1, p_{ei}^2, \ldots, p_{ei}^M\}$ over the different argument role type labels, where $M$ is the number of argument role labels;
inputting the head position probability vector $P_{si}$ and the tail position probability vector $P_{ei}$ into the activation function to quantize all vectors, and calculating the head position scores $S_s = \{s_{s1}, s_{s2}, \ldots, s_{sn}\}$ and tail position scores $S_e = \{s_{e1}, s_{e2}, \ldots, s_{en}\}$ that map the sequence to be extracted to all argument role types, where for token $i$ the head position score of the argument roles is $s_{si} = \{s_{si}^1, s_{si}^2, \ldots, s_{si}^M\}$ and the tail position score is $s_{ei} = \{s_{ei}^1, s_{ei}^2, \ldots, s_{ei}^M\}$;
fine-tuning the model according to the following formula to obtain the optimal model: where $y_{si}^j \in [0,1]$ indicates whether the i-th token is the head position of the j-th argument role type; if it is, $y_{si}^j$ is 1, and otherwise it is 0; $Y_s$ and $Y_e$ represent whether all tokens in the input sequence are the head and tail positions of argument roles, respectively; and $y_{ei}^j$ indicates whether the i-th token is the tail position of the j-th argument role type, being 1 if so and 0 otherwise;
inputting the head position scores $S_s = \{s_{s1}, s_{s2}, \ldots, s_{sn}\}$ and tail position scores $S_e = \{s_{e1}, s_{e2}, \ldots, s_{en}\}$ of all tokens in the sequence to be extracted into the argument role type decision maker and comparing them with the third preset threshold $t_3$ in the argument role type decision maker; if the head position score $s_{si}^j$ is greater than the third preset threshold $t_3$, determining that the i-th token is the head position of argument role type $j$; if the tail position score $s_{ei}^j$ is greater than the third preset threshold $t_3$, determining that the i-th token is the tail position of argument role type $j$; and continuing until a list of argument roles corresponding to all event trigger words is obtained, where the list of predicted argument roles obtained for the i-th trigger word is $l_{rti}' = [(r_{s1}, r_{e1}, r_1), \ldots, (r_{sk}, r_{ek}, r_k)]$.
8. An event extraction system based on event semantic enhancement, comprising a processor and a memory, wherein the memory is used to store computer programs, characterized in that, The processor is used to execute the computer program to perform the method as described in any one of claims 1 to 7.