A network security event extraction method and system

By collecting and processing cybersecurity incident data, defining incident types and argument roles, and utilizing pre-trained language models and information extraction frameworks, the problem of argument role sharing and overlap in cybersecurity incident extraction was solved, achieving higher extraction accuracy.

CN119628851B · Active · Publication Date: 2026-06-19 · CHINA TELECOM CLOUD TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA TELECOM CLOUD TECH CO LTD
Filing Date: 2024-06-28
Publication Date: 2026-06-19

Application Information


IPC: H04L9/40; G06F18/213; G06F18/22; G06F16/951; G06F16/953

AI Tagging

Application Domain

Web data indexing; Securing communication




AI Technical Summary

Technical Problem

In existing technologies, the event type and arguments are not effectively associated in network security event extraction tasks, resulting in problems such as shared and overlapping argument roles and multiple values for the same argument, which affects the accuracy of the extraction method.

Method used

By collecting data based on the Internet, defining a dataset and an event matching table, utilizing a pre-trained language model encoding layer module and an information extraction framework, identifying event trigger words and argument roles, establishing an event-argument embedding matrix, and optimizing the training objective function to solve the problem of argument role sharing and overlap.

Benefits of technology

It improves the accuracy of network security event extraction by effectively solving problems such as shared and overlapping argument roles and multiple values for the same argument.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to a method and system for extracting network security events, belonging to the technical field of network security event extraction. The method includes: defining event types of network security events based on a dataset and an event matching table; and determining the key information to be extracted from network security events based on the event types and the argument roles of the information extraction framework. Determining key information in this way deeply associates event types with argument roles, which effectively solves problems such as argument role sharing, argument role overlap, and multiple values corresponding to the same argument in network security event extraction tasks, thus ensuring the accuracy of the extraction method.


Description

Technical Field

[0001] This invention belongs to the field of network security event extraction technology, and particularly relates to a method and system for extracting network security events.

Background Technology

[0002] With the development of technology, a wealth of information about cybersecurity incidents is contained within internet text media. By employing event extraction techniques to extract specific core arguments of cybersecurity incidents from unstructured cybersecurity texts, effective data support can be provided for subsequent tasks such as network threat monitoring and network risk trend analysis. This helps security teams to promptly detect security incidents in the network, enabling them to quickly deploy defensive measures and mitigate the risks posed by cybersecurity threats.

[0003] In existing technologies, event type identification and event argument extraction are treated as separate tasks. Event types and arguments are not associated, and the dependency relationship between trigger words and event arguments is not considered. This can easily lead to problems such as argument role sharing in network security event extraction tasks, which affects the accuracy of network security event extraction methods.

Summary of the Invention

[0004] In view of the shortcomings of the prior art, the purpose of this invention is to provide a method and system for extracting network security events. The method defines event types of network security events based on a dataset and an event matching table, and then uses an information extraction framework associated with these event types to determine the key information extracted from the network security events, based on the event types and argument roles of the information extraction framework. This allows for deep association between event types and argument roles, effectively solving problems such as shared argument roles, overlapping argument roles, and multiple values for the same argument in network security event extraction tasks, thus ensuring the accuracy of the network security event extraction method.

[0005] In a first aspect, the present invention proposes a method for extracting network security events, which is applied to network security event extraction scenarios.

[0006] The method for extracting network security incidents includes:

[0007] Collect data associated with cybersecurity incidents via the Internet;

[0008] Based on data associated with cybersecurity incidents, corresponding role factors are extracted, and a dataset is defined based on the combination of multiple role factors;

[0009] Define the event type of a network security event based on the dataset and the event matching table;

[0010] The key information extracted from cybersecurity incidents is determined based on the incident type and the argument roles in the information extraction framework.

[0011] Furthermore, the data collected based on the Internet and associated with cybersecurity incidents includes:

[0012] Connect to the internet and web scraping tools;

[0013] Web crawling tools are used to crawl data on cybersecurity incidents on the Internet in order to collect data related to cybersecurity incidents;

[0014] Data preprocessing is triggered based on data associated with cybersecurity incidents. In this case, data preprocessing includes data filtering and text correction.

[0015] Furthermore, the step of extracting corresponding role factors based on data associated with cybersecurity incidents and defining a data set according to a combination of multiple role factors includes:

[0016] Obtain data associated with cybersecurity incidents;

[0017] Define the event role table;

[0018] Extract corresponding role factors based on data associated with cybersecurity incidents;

[0019] Collect multiple role factors and define a dataset based on the combination of these role factors.

[0020] Furthermore, the step of extracting corresponding role factors based on data associated with cybersecurity incidents and defining a data set according to a combination of multiple role factors also includes:

[0021] The event role table is specified based on data extracted from previous cybersecurity incidents.

[0022] Furthermore, defining the event type of a network security event based on the data set and the event matching table includes:

[0023] Collect the dataset;

[0024] Associate the dataset with the event matching table;

[0025] Define the event type of a network security event based on the dataset and the event matching table.

[0026] Furthermore, the step of determining the key information extracted from a cybersecurity incident based on its event type and the argument roles within the information extraction framework includes:

[0027] Obtain input corpus from the event types of cybersecurity incidents;

[0028] Word vectors are generated based on the processing of the input corpus;

[0029] The word vectors, block vectors, and position vectors are used as inputs to the encoding layer modules of the pre-trained language model;

[0030] Decoding is performed based on the encoding layer module of a pre-trained language model, and event trigger words are identified.

[0031] The key information extracted from cybersecurity incidents is determined based on trigger words and argument roles in the information extraction framework.

[0032] Furthermore, the key information extracted from cybersecurity incidents based on the incident type and the argument roles in the information extraction framework also includes:

[0033] An event extraction framework is used to collect data, and the input corpus, argument clouds, and argument roles of network security events are associated with the event types input by the event extraction framework.

[0034] Furthermore, the step of determining the key information extracted from a cybersecurity incident based on its event type and the argument roles within the information extraction framework also includes:

[0035] In the encoding layer module of the pre-trained language model, the start and end positions of the trigger word are predicted as:

[0036] $$p_i^{start} = \sigma(W_{start} x_i + b_{start}), \qquad p_i^{end} = \sigma(W_{end} x_i + b_{end})$$

[0037] Where $x_i$ is the encoded representation of the i-th character in the input sequence, $W_{start}$ and $W_{end}$ are learnable weight matrices, $b_{start}$ and $b_{end}$ are the biases, and $\sigma$ is the sigmoid activation function. $p_i^{start}$ and $p_i^{end}$ represent the probability that the i-th character is the start position and the probability that it is the end position of the trigger word, respectively.

[0038] Furthermore, the step of determining the key information extracted from a cybersecurity incident based on its event type and the argument roles within the information extraction framework also includes:

[0039] The information extraction framework optimizes the training objective function: based on a predefined set of event types T, a set of event argument roles R, and a sentence x, the overall learning objective is to predict all argument roles for each event type corresponding to the sentence. The maximum likelihood function is:

[0040] $$\prod_{x \in \mathrm{CBSEE}} \Big[ \prod_{(T,\, tri,\, R_T) \in \varepsilon_x} p\big((T, tri, R_T) \mid x\big) \Big] = \prod_{x \in \mathrm{CBSEE}} \Big[ \prod_{tri \in \alpha_x} p(tri \mid x) \prod_{T \in \beta_{x,tri}} p(T \mid x, tri) \prod_{R_T \in \gamma_{x,tri,T}} p(R_T \mid x, tri, T) \Big]$$

[0041] Where $T$ is the event type, $tri$ is the trigger word corresponding to the event type, $R_T$ is an argument role under a specific event type, and $\varepsilon_x$ represents the set of event types, trigger words, and event arguments in sentence $x$;

[0042] CBSEE is a network security text event extraction dataset constructed in this invention; $x$ is a single sentence in the dataset; $\alpha_x$ is the set of trigger words in sentence $x$; $\beta_{x,tri}$ is the set of event types corresponding to the trigger word $tri$; $\gamma_{x,tri,T}$ is the set of event arguments corresponding to event type $T$ and trigger word $tri$; $p(tri \mid x)$ is a trigger word decoder used to detect the trigger words in the sentence; $p(T \mid x, tri)$ is an event type extractor used to extract the event type of trigger word $tri$; and $p(R_T \mid x, tri, T)$ is an argument role extractor used to extract the argument roles corresponding to event type $T$ and trigger word $tri$.

[0043] A second aspect of the present invention provides a network security incident extraction system comprising:

[0044] The data acquisition module is used to collect data related to cybersecurity incidents via the Internet.

[0045] The dataset module is used to extract corresponding role factors based on data associated with cybersecurity incidents, and to define the dataset based on the combination of multiple role factors;

[0046] The event type module is used to define the event type of a network security event based on the dataset and the event matching table;

[0047] The information module is used to determine the key information extracted from cybersecurity incidents based on the incident type and the argument roles of the information extraction framework.

[0048] The beneficial effects of this invention are as follows:

[0049] The method and system of this invention collect data associated with cybersecurity incidents via the Internet; extract corresponding role factors from that data and define a dataset based on the combination of multiple role factors; define the event type of the cybersecurity incident based on the dataset and an event matching table; and determine the key information extracted from the cybersecurity incident based on its event type and the argument roles of the information extraction framework. Defining the event type from the dataset and the event matching table ties the event type to the information extraction framework, so that event types and argument roles are deeply associated. This effectively solves the problems of argument role sharing, argument role overlap, and multiple values corresponding to the same argument in the cybersecurity incident extraction task, ensuring the accuracy of the extraction method.

Attached Figure Description

[0050] The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Throughout the drawings, the same reference numerals denote the same parts. It is obvious that the drawings described below are merely some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings.

[0051] Figure 1 is a flowchart illustrating the network security event extraction method in an embodiment of the present invention.

[0052] Figure 2 is a flowchart illustrating step S11 of the network security event extraction method in this embodiment of the invention.

[0053] Figure 3 is a flowchart illustrating step S12 of the network security event extraction method in this embodiment of the invention.

[0054] Figure 4 is a flowchart illustrating step S13 of the network security event extraction method in this embodiment of the invention.

[0055] Figure 5 is a flowchart illustrating step S14 of the network security event extraction method in this embodiment of the invention.

[0056] Figure 6 is a partial schematic diagram of the network security event extraction method in an embodiment of the present invention.

[0057] Figure 7 is a schematic diagram of the structural composition of the network security event extraction system in an embodiment of the present invention.

[0058] Figure 8 is a hardware diagram of an electronic device according to an exemplary embodiment.

Detailed Implementation

[0059] To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0060] Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concepts disclosed in this invention.

[0061] In the description of this invention, it should be noted that, unless otherwise explicitly specified and limited, the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer," etc., indicating orientation or positional relationships based on the orientation or positional relationships shown in the accompanying drawings, are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance. The terms "installed," "connected," and "linked" should be interpreted broadly; for example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal communication of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.

[0062] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of methods and systems consistent with some aspects of the invention as detailed in the appended claims.

[0063] This invention proposes a method and system for extracting network security events. It defines event types based on a dataset and an event matching table to facilitate event type definition. Then, based on the defined event types, it establishes an information extraction framework to determine the key information extracted from the network security events, considering both the event types and the argument roles within the extraction framework. This allows for deep association between event types and argument roles, effectively solving problems such as shared argument roles, overlapping argument roles, and multiple values for the same argument in network security event extraction tasks, thus ensuring the accuracy of the network security event extraction method.

[0064] Method Implementation Examples

[0065] Referring to Figures 1 to 8, a method for extracting network security events is applied to network security event extraction scenarios. The method includes:

[0066] Step S11: Collect data related to network security incidents via the Internet;

[0067] Step S12: Extract corresponding role factors based on data associated with cybersecurity incidents, and define a dataset based on the combination of multiple role factors;

[0068] Step S13: Define the event type of network security event based on the dataset and event matching table;

[0069] Step S14: Determine the key information to be extracted from the cybersecurity incident based on the incident type and the argument roles of the information extraction framework.

[0070] In this embodiment of the invention, the method defines the event type of a network security event based on a data set and an event matching table. This facilitates the definition of the event type and, based on the event type association information extraction framework, determines the key information extracted from the network security event for the event type and the argument roles of the information extraction framework. This allows for deep association between the event type and argument roles, effectively solving problems such as argument role sharing, argument role overlap, and multiple values corresponding to the same argument in the network security event extraction task, thus ensuring the accuracy of the network security event extraction method.

[0071] In step S11, data related to network security incidents is collected via the Internet;

[0072] In the specific implementation of this invention, the specific steps can be as follows:

[0073] S111: Connecting to the internet and web crawling tools;

[0074] S112: Using web crawler tools to crawl network security incident data on the Internet in order to collect data related to network security incidents;

[0075] S113: Data preprocessing is triggered based on data associated with a cybersecurity incident. In this case, data preprocessing includes data filtering and text correction.

[0076] In the specific implementation of this invention, web crawling tools crawl network security event data on the Internet to collect data associated with network security events, which delimits that data and keeps it under control.

[0077] Meanwhile, data preprocessing is triggered based on data associated with cybersecurity incidents. This preprocessing includes data filtering and text correction, ensuring the integrity and accuracy of the data related to cybersecurity incidents.
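The filtering and text correction of step S113 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify the filtering criteria or the correction rules, so the keyword list, the minimum length, and the helper names `correct_text`, `is_relevant`, and `preprocess` are all hypothetical.

```python
import re
import unicodedata

# Hypothetical keyword list used to keep only security-related texts.
SECURITY_KEYWORDS = ("attack", "vulnerability", "malware", "data breach")


def correct_text(text: str) -> str:
    """Light text correction: Unicode normalization, tag and whitespace cleanup."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"<[^>]+>", " ", text)      # drop leftover HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text


def is_relevant(text: str, min_chars: int = 20) -> bool:
    """Data filtering: keep texts that are long enough and mention a keyword."""
    lowered = text.lower()
    return len(text) >= min_chars and any(k in lowered for k in SECURITY_KEYWORDS)


def preprocess(raw_docs):
    """Correct every crawled text, then drop irrelevant and duplicate ones."""
    seen = set()
    cleaned = []
    for raw in raw_docs:
        text = correct_text(raw)
        if is_relevant(text) and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Correcting before filtering means that duplicates differing only in markup or spacing collapse to the same string and are kept once.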

[0078] In step S12, corresponding role factors are extracted based on data associated with network security incidents, and a data set is defined based on the combination of multiple role factors;

[0079] In the specific implementation of this invention, the specific steps can be as follows:

[0080] S121: Obtain data associated with cybersecurity incidents;

[0081] S122: Define the event role table;

[0082] S123: Extract corresponding role factors based on data associated with cybersecurity incidents;

[0083] S124: Collect multiple role factors and define a dataset based on the combination of multiple role factors.

[0084] In the embodiments of this application, data associated with network security events is acquired and further processed to define an event role table, thereby introducing an event role table and defining various role factors and corresponding mapping relationships in the event role table.

[0085] At this point, corresponding role factors are extracted based on data associated with cybersecurity incidents. Multiple role factors are collected, and a dataset is defined based on the combination of these factors. This dataset presents the combination of multiple role factors in a dataset-like format, ensuring control over these factors and thus improving their overall structure. Simultaneously, the event role table is specified based on data extracted from previous cybersecurity incidents. The event role table is as follows:

[0086]

[0087] In step S13, the event type of the network security event is defined based on the data set and the event matching table;

[0088] In the specific implementation of this invention, the specific steps can be as follows:

[0089] S131: Collect the dataset;

[0090] S132: Associate the dataset with the event matching table;

[0091] S133: Define the event type of a network security event based on the dataset and the event matching table.

[0092] In embodiments of this application, a data set is collected to facilitate association between the data set and an event matching table. This allows for control based on the roles and mapping relationships in the event matching table, and further, the event types of network security events are defined according to the data set and the event matching table. The table defining the event types of network security events is as follows:

[0093]

[0094]
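One plausible reading of steps S131 to S133 is a lookup from the role factors found in a dataset record to the event types that admit those roles. The sketch below assumes that reading; the table entries and the function name `match_event_types` are hypothetical and are not the patent's actual event matching table.

```python
# Hypothetical event matching table: event type -> argument roles it allows.
# The entries are illustrative only and are not the patent's actual table.
EVENT_MATCHING_TABLE = {
    "network attack": {"time", "attack target", "consequence"},
    "data breach": {"time", "victim", "leaked data"},
}


def match_event_types(role_factors):
    """Return event types whose role set covers every role factor in the record."""
    roles = set(role_factors)  # for a dict this is the set of role names
    return sorted(
        event_type
        for event_type, allowed in EVENT_MATCHING_TABLE.items()
        if roles and roles <= allowed
    )
```

A record that only carries a shared role such as "time" matches more than one event type, which is the kind of role sharing the later extraction stages have to resolve.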

[0095] S14: Determine the key information extracted from cybersecurity incidents based on the incident type and the argument roles in the information extraction framework;

[0096] In the specific implementation of this invention, the specific steps can be as follows:

[0097] S141: Obtain input corpus from the event types of network security incidents;

[0098] S142: Word vectors are generated based on the processing of the input corpus;

[0099] S143: Use word vectors, block vectors, and position vectors as inputs to the encoding layer module of the pre-trained language model;

[0100] S144: Decode based on the encoding layer module of the pre-trained language model and identify the event trigger words;

[0101] S145: Determine the key information extracted from cybersecurity incidents based on trigger words and argument roles in the information extraction framework.

[0102] In the embodiments of this application, the event types of network security events are introduced and controlled, and word vectors are formed by processing the input corpus obtained from them.

[0103] At this point, word vectors, block vectors, and position vectors are used as inputs to the encoding layer module of the pre-trained language model. Decoding is performed based on the encoding layer module of the pre-trained language model, and event trigger words are identified. The key information extracted from the cybersecurity event is determined based on the trigger words and the argument roles of the information extraction framework, so as to deeply associate the event type and argument roles. This effectively solves the problems of argument role sharing, argument role overlap, and multiple values corresponding to the same argument in the cybersecurity event extraction task, and ensures the accuracy of the cybersecurity event extraction method.
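How the three input vectors are combined is not stated in the text. A minimal sketch, assuming a BERT-style encoder in which the word, block (segment), and position vectors of each token are added element-wise, is shown below; the function name `combine_input_vectors` is hypothetical.

```python
def combine_input_vectors(word_vecs, block_vecs, position_vecs):
    """Element-wise sum of word, block (segment), and position vectors per token.

    Assumption: BERT-style input representation; the patent does not name
    the pre-trained language model or the combination rule.
    """
    if not (len(word_vecs) == len(block_vecs) == len(position_vecs)):
        raise ValueError("all three inputs need one vector per token")
    return [
        [w + b + p for w, b, p in zip(wv, bv, pv)]
        for wv, bv, pv in zip(word_vecs, block_vecs, position_vecs)
    ]
```

The resulting per-token vectors are what the encoding layer would consume to produce the hidden representations used by the tagging modules.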

[0104] At this point, an event extraction framework is collected, and the input corpus, argument cloud, and argument roles are associated with the event types of network security events based on the event extraction framework.

[0105] Furthermore, the event extraction task can be modeled as a function $f_{role\_type}(trigger) \rightarrow Arguments$ that maps event types to event arguments, instead of treating event arguments as discrete labels on entity pairs.

[0106] Within the information extraction framework, the event extraction task consists of two steps:

[0107] (1) Identify all trigger words in the corpus.

[0108] (2) Based on each candidate trigger word, find the event type corresponding to the trigger word and extract the argument roles under the event type.
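The two steps above form a cascade, which can be sketched as a small driver function. The three callables stand in for the trained modules and are hypothetical placeholders, as is the name `extract_events`; only the control flow is taken from the text.

```python
from typing import Callable, Dict, List


def extract_events(
    sentence: str,
    find_triggers: Callable[[str], List[str]],
    find_event_types: Callable[[str, str], List[str]],
    find_arguments: Callable[[str, str, str], Dict[str, List[str]]],
) -> List[dict]:
    """Cascade: triggers first, then event types and argument roles per trigger."""
    events = []
    for trigger in find_triggers(sentence):                     # step (1)
        for event_type in find_event_types(sentence, trigger):  # step (2)
            events.append({
                "event_type": event_type,
                "trigger": trigger,
                # One list per role, so a role may carry several values.
                "arguments": find_arguments(sentence, trigger, event_type),
            })
    return events
```

Because arguments are looked up separately for every (trigger, event type) pair, the same text span can serve as an argument in more than one event.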

[0109] The information extraction framework designed in this invention consists of three modules: a pre-trained language model encoding layer module, a trigger word tagging module, and an argument role tagging module for specific types.

[0110] The pre-trained language model encoding layer module encodes the input corpus using the pre-trained language model, obtaining the hidden layer representation of each word. The trigger word tagging module extracts all potential trigger words from the input corpus. During decoding, this module uses two binary classifiers to predict the start and end index positions of the trigger words in the input corpus.

[0111] The calculation formula is shown below. In the input corpus "Yesterday, the Caesar website was hit by a network attack, and hackers damaged the cloud service and stole data", "network" is marked as "start" and "attack" is marked as "end", so the trigger word tagging module identifies "network attack" as a trigger word.

[0112] $$p_i^{start} = \sigma(W_{start} x_i + b_{start}), \qquad p_i^{end} = \sigma(W_{end} x_i + b_{end})$$

[0113] Where $x_i$ is the encoded representation of the i-th character in the input sequence, $W_{start}$ and $W_{end}$ are learnable weight matrices, $b_{start}$ and $b_{end}$ are the biases, and $\sigma$ is the sigmoid activation function. $p_i^{start}$ and $p_i^{end}$ represent the probability that the i-th character is the start position and the probability that it is the end position of the trigger word, respectively.
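The two binary classifiers and the span decoding can be sketched in plain Python. The 0.5 threshold and the rule of pairing each start with the nearest following end are assumptions commonly used with this kind of tagger and are not stated in the patent; the weights are reduced to a single vector per classifier for brevity, and all function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def position_probs(hidden, weights, bias):
    """sigma(w . x_i + b) for each token vector x_i (one binary classifier)."""
    return [
        sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias)
        for vec in hidden
    ]


def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair each start position with the nearest end position at or after it."""
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start <= threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] > threshold:
                spans.append((i, j))
                break
    return spans
```

Calling `position_probs` once with the start parameters and once with the end parameters gives the two probability sequences that `decode_spans` turns into (start, end) index pairs.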

[0114] Unlike the trigger word tagging module, the argument role tagging module for specific types considers not only the hidden layer vectors of the pre-trained language model encoder during the decoding process, but also the features of the currently identified trigger words and the current event type features.

[0115] This module first establishes an R×h event-argument embedding matrix to store all event-argument embeddings. The initial input to this matrix is the predefined cybersecurity event types and their corresponding role information, and all event-argument embedding parameters are updated during training. R is the number of predefined event categories, and h is the dimension of the encoding layer of the pre-trained language model. Given the event-argument embedding matrix, the embedding for a specific event type is obtained through its event type ID. The calculation formula is as follows, where event_type_k denotes the event type with ID k, whose embedding is the k-th vector in the event-argument embedding matrix.

[0116]
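The lookup described above amounts to selecting a row of the R×h matrix by event type ID. A minimal sketch follows; the random initialization range and the function names are assumptions, and in a real model the matrix would be a trainable parameter updated during training.

```python
import random


def build_event_argument_embeddings(num_event_types: int, hidden_size: int, seed: int = 0):
    """R x h matrix with small random initial values (trainable in a real model)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        for _ in range(num_event_types)
    ]


def lookup_event_embedding(matrix, event_type_id: int):
    """Return the k-th row, i.e. the embedding of the event type with ID k."""
    if not 0 <= event_type_id < len(matrix):
        raise IndexError(f"unknown event type id: {event_type_id}")
    return matrix[event_type_id]
```

Setting `hidden_size` to the encoder dimension h keeps the looked-up vector compatible with the hidden representations it is later combined with.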

[0117] After establishing the event-argument embedding matrix, the argument role labeling module for specific types uses a binary classifier to predict the start and end index positions of various argument roles in the text.

[0118] The calculation formula is shown below.

[0119]

[0120] Where $x_i$ is the encoded representation of the i-th character in the input sequence, $W_{start}$ and $W_{end}$ are learnable weight matrices, $b_{start}$ and $b_{end}$ are the biases, and $\sigma$ is the sigmoid activation function. $p_i^{start}$ and $p_i^{end}$ represent the probability that the i-th character is the start position and the probability that it is the end position of the argument role, respectively.

[0121] In the example sentence, the time argument starts and ends within "yesterday"; the attack target argument starts at "Caesar" and ends at "website"; and the impact (consequence) argument starts at "damaged" and ends at "data". The event element information finally extracted through the framework of this invention is: [Event type: network attack, trigger word: network attack, argument list: [Time: yesterday, attack target: Caesar website, consequences: cloud service was damaged and data was stolen]].
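Assembling the final event record from predicted spans can be sketched as follows. The patent tags individual characters of the original text; the example below uses English word tokens purely for illustration, and the `EventRecord` class and helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EventRecord:
    event_type: str
    trigger: str
    # Each role maps to a list so the same role can hold several values.
    arguments: Dict[str, List[str]] = field(default_factory=dict)


def span_text(tokens, span):
    """Join the tokens covered by an inclusive (start, end) index pair."""
    start, end = span
    return " ".join(tokens[start:end + 1])


def build_event(tokens, event_type, trigger_span, role_spans):
    """Turn a trigger span and (role, span) pairs into one event record."""
    record = EventRecord(event_type, span_text(tokens, trigger_span))
    for role, span in role_spans:
        record.arguments.setdefault(role, []).append(span_text(tokens, span))
    return record
```

Storing a list per role is one simple way to represent multiple values for the same argument without overwriting earlier ones.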

[0122] The method of determining the key information extracted from a cybersecurity incident based on its event type and the argument roles in the information extraction framework also includes:

[0123] The information extraction framework optimizes the training objective function: based on a predefined set of event types T, a set of event argument roles R, and a sentence x, the overall learning objective is to predict all argument roles for each event type corresponding to the sentence. The maximum likelihood function is:

[0124]

[0125] Where T is the event type, tri is the trigger word corresponding to the event type, R_T is an argument role under a specific event type, and ε_x represents the set of event types, trigger words, and event arguments in sentence x;

[0126] CBSEE is the network security text event extraction dataset constructed in this invention; x is a single sentence in the dataset; α_x is the set of trigger words in sentence x; β_{x,tri} is the set of event types corresponding to trigger word tri; γ_{x,tri,T} is the set of event arguments corresponding to event type T and trigger word tri; p(tri|x) is a trigger word decoder used to detect the trigger words in the sentence; p(T|x,tri) is an event type extractor used to extract the event type of trigger word tri; and p(R_T|x,tri,T) is an argument role extractor used to extract the argument roles corresponding to event type T and trigger word tri.
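Read together, the definitions in [0125] and [0126] suggest a chain-rule factorization of the likelihood over the three decoders. The expression below is a hedged reconstruction assembled only from those definitions; the exact form used in the specification may differ.

```latex
\prod_{x \in \mathrm{CBSEE}} \Bigg[ \prod_{(tri,\,T,\,R_T) \in \varepsilon_x} p(tri, T, R_T \mid x) \Bigg]
= \prod_{x \in \mathrm{CBSEE}} \Bigg[ \prod_{tri \in \alpha_x} p(tri \mid x)
  \prod_{T \in \beta_{x,tri}} p(T \mid x, tri)
  \prod_{R_T \in \gamma_{x,tri,T}} p(R_T \mid x, tri, T) \Bigg]
```

Under this reading, each factor corresponds to one module: trigger word detection, event type extraction conditioned on the trigger word, and argument role extraction conditioned on both.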

[0127] At this point, in the task of network security event extraction, there are problems such as shared argument roles, overlapping argument roles, and multiple values corresponding to the same argument. To address these issues, this invention optimizes the training objective function of the proposed framework: based on a predefined set of event types T, a set of event argument roles R, and a sentence x, the overall learning objective is to predict all argument roles for the sentence under all event types.

[0128] To verify the effectiveness of the framework proposed in this invention, the inventors used the model structure designed in this invention, along with three other common event extraction models—PLMEE, Bert-CRF, and CasEE—to conduct modeling research on the event extraction task on the constructed cybersecurity text event extraction dataset CBSEE and the control dataset DuEE1.0. The experimental results of each model on the CBSEE dataset and the control dataset DuEE1.0 are shown in the following table.

[0129]

[0130] The experimental results show that the event extraction framework proposed in this invention uses a joint extraction method to alleviate the problem of error accumulation, and further improves the event extraction effect by introducing an event-argument embedding matrix.

[0131] The specific steps are as follows:

[0132] Step 1: Taking the input text as an example, the input text is first segmented using the tokenizer of a pre-trained language model, and [CLS] and [SEP] tags are added to the beginning and end of each segment, respectively. The input sequence after segmentation has the form: "[CLS] ... [SEP]"

[0133] After the input corpus has been preprocessed, each segmented character is converted into a token embedding of the pre-trained language model, and this token embedding is added to the segment embedding and the position embedding to form the input to the encoding layer module of the pre-trained language model.
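The input construction in Step 1 (token embedding plus segment embedding plus position embedding) can be sketched as follows. The vocabulary, the 2-dimensional embedding tables, and the helper names are hypothetical toy values chosen for illustration; a real pre-trained language model supplies its own tokenizer and learned tables.

```python
def add_vectors(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]

def build_encoder_input(token_ids, token_table, segment_table,
                        position_table, segment_id=0):
    """Sum token, segment, and position embeddings for each position."""
    return [
        add_vectors(token_table[tok], segment_table[segment_id],
                    position_table[pos])
        for pos, tok in enumerate(token_ids)
    ]

# Toy vocabulary and 2-dimensional embedding tables (hypothetical values).
vocab = {"[CLS]": 0, "[SEP]": 1, "a": 2, "b": 3}
token_table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.25, 0.75]]
segment_table = [[0.1, 0.1]]
position_table = [[0.0, 0.0], [0.01, 0.01], [0.02, 0.02], [0.03, 0.03]]

# A segmented sequence wrapped in [CLS] and [SEP].
tokens = ["[CLS]", "a", "b", "[SEP]"]
token_ids = [vocab[t] for t in tokens]
encoder_input = build_encoder_input(
    token_ids, token_table, segment_table, position_table
)
print(len(encoder_input), len(encoder_input[0]))  # 4 2
```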

[0134] Step 2: During decoding, the trigger word tagging module uses two binary classifiers to predict the start and end index positions of potential trigger words in the input corpus. Each token receives a 0 or 1 from the start classifier and a 0 or 1 from the end classifier; a 1 indicates that the token is the start or end position of a trigger word, respectively. In the example input text of this embodiment, the trigger word tagging module marks one token as the start and one token as the end, and the span they delimit is identified as a trigger word.
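The 0/1 marking in Step 2 still has to be turned into trigger word spans. The sketch below shows one common way to do that; the 0.5 threshold, the rule of pairing each start with the nearest end at or after it, and the placeholder tokens `t1` to `t4` are assumptions for illustration, since the specification does not state the pairing rule.

```python
def mark(probabilities, threshold=0.5):
    """Turn per-token probabilities into 0/1 marks."""
    return [1 if p > threshold else 0 for p in probabilities]

def decode_spans(start_marks, end_marks):
    """Pair each start mark with the nearest end mark at or after it."""
    spans = []
    for i, is_start in enumerate(start_marks):
        if is_start != 1:
            continue
        for j in range(i, len(end_marks)):
            if end_marks[j] == 1:
                spans.append((i, j))
                break
    return spans

# Placeholder tokens and hypothetical classifier outputs.
tokens = ["[CLS]", "t1", "t2", "t3", "t4", "[SEP]"]
start_probs = [0.01, 0.05, 0.92, 0.10, 0.03, 0.01]
end_probs = [0.01, 0.02, 0.08, 0.88, 0.04, 0.01]

spans = decode_spans(mark(start_probs), mark(end_probs))
trigger_words = [tokens[s:e + 1] for s, e in spans]
print(spans)          # [(2, 3)]
print(trigger_words)  # [['t2', 't3']]
```

Keeping start and end predictions independent is what allows several spans, including overlapping ones, to be decoded from the same sentence.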

[0135] Step 3: After the event trigger word has been identified, the argument roles are labeled as follows.

[0136] The event extraction framework of this invention adds together three vectors: the initial hidden vector of the example text from the pre-trained language model encoder, the vector of the currently identified trigger word from the pre-trained language model encoder, and the vector corresponding to the attack event in the event-argument embedding matrix. Their sum is the input to the argument role labeling module under a specific type.

[0137] The argument role labeling module for specific types uses multiple binary classifiers to predict the start and end positions of different argument roles.

[0138] Based on the two output probabilities, the module determines whether the i-th character is the start or end position of the j-th argument role.
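Step 3 can be sketched as a vector sum followed by one binary start classifier per argument role. Everything numeric here is hypothetical (the toy hidden vectors, the weights `w_start`, the biases `b_start`, and the 0.5 threshold), and only the start classifiers are shown; the end classifiers would follow the same pattern with their own parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fuse(hidden, trigger_vec, event_vec):
    """Add the trigger vector and the event-argument embedding to every token vector."""
    return [
        [h + t + e for h, t, e in zip(row, trigger_vec, event_vec)]
        for row in hidden
    ]

def role_probabilities(fused, weights, biases):
    """probs[i][j] = sigmoid(w_j . x_i + b_j) for token i and role j."""
    return [
        [sigmoid(dot(w, x) + b) for w, b in zip(weights, biases)]
        for x in fused
    ]

roles = ["time", "victim"]                      # two of the roles named in claim 1
hidden = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]   # toy encoder output for 3 tokens
trigger_vec = [0.5, 0.5]                        # toy trigger word vector
event_vec = [0.5, 0.5]                          # toy event-argument embedding
fused = fuse(hidden, trigger_vec, event_vec)

# Hypothetical start-classifier parameters: one weight row and bias per role.
w_start = [[3.0, -3.0], [-3.0, 3.0]]
b_start = [-3.0, -3.0]

start_probs = role_probabilities(fused, w_start, b_start)
start_marks = [[1 if p > 0.5 else 0 for p in row] for row in start_probs]
print(start_marks)  # [[0, 0], [1, 0], [0, 1]]
```

Since every role has its own classifier, the same token can be marked as the start of more than one role, which is how overlapping argument roles can be represented.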

[0139] Finally, in the example input text of this embodiment, the argument role labeling module under specific types identifies "8" as the start and "day" as the end of a time argument, "Dan" as the start and "CloudNordic" as the end of a victim argument, "Le" as the start and "attack" as the end of an attack tool argument, "Le" as the start and "secret" as the end of an attack mode argument, and "what" as the start and the quotation mark (") as the end of an impact argument.

[0140] The event element information finally extracted through the framework of this invention is: [Event type: attack, trigger word: software, argument list: [Time: August 28, attack tool: software, attack mode: software encryption, victim: CloudNordic, a large Danish cloud service provider]].
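The extracted record in [0140] maps naturally onto a small data structure. The class and field names below (`Argument`, `ExtractedEvent`, `values_for`) are hypothetical and only illustrate one possible representation; the argument list is kept as a list of role/text pairs so that one role can hold several values.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Argument:
    role: str
    text: str

@dataclass
class ExtractedEvent:
    event_type: str
    trigger: str
    arguments: list[Argument] = field(default_factory=list)

    def values_for(self, role: str) -> list[str]:
        """All values for a role; one role may carry several values."""
        return [a.text for a in self.arguments if a.role == role]

# The record from [0140], transcribed as-is.
event = ExtractedEvent(
    event_type="attack",
    trigger="software",
    arguments=[
        Argument("time", "August 28"),
        Argument("attack tool", "software"),
        Argument("attack mode", "software encryption"),
        Argument("victim", "CloudNordic, a large Danish cloud service provider"),
    ],
)
print(event.values_for("victim"))
print(asdict(event)["trigger"])  # software
```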

[0141] In this embodiment of the invention, the method defines the event type of a network security event based on a dataset and an event matching table, which simplifies event type definition. The information extraction framework is associated with the event type, and the key information extracted from the network security event is determined from the event type and the argument roles of the framework. This deep association between event types and argument roles addresses argument role sharing, argument role overlap, and multiple values corresponding to the same argument in the network security event extraction task, thereby improving the accuracy of the network security event extraction method.

[0142] System Implementation Examples

[0143] Please see Figure 7, which is a schematic diagram of the structural composition of the network security event extraction system in an embodiment of the present invention.

[0144] As shown in Figure 7, a network security event extraction system includes:

[0145] Data collection module 21 is used to collect data related to network security incidents via the Internet;

[0146] Data set 22 is used to extract corresponding role factors based on data associated with cybersecurity incidents, and to define the data set according to the combination of multiple role factors;

[0147] Event Type Module 23 is used to define the event type of a network security event based on the dataset and the event matching table;

[0148] Information Model 24 is used to determine the key information extracted from cybersecurity events based on the event type of the cybersecurity event and the argument roles of the information extraction framework.

[0149] An electronic device 40 according to this embodiment of the present invention is described below with reference to Figure 8. The electronic device 40 shown in Figure 8 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present invention.

[0150] As shown in Figure 8, the electronic device 40 takes the form of a general-purpose computing device. The components of the electronic device 40 may include, but are not limited to: at least one processing unit 41, at least one storage unit 42, and a bus 43 connecting different system components (including storage unit 42 and processing unit 41).

[0151] The storage unit stores program code, which can be executed by the processing unit 41 to perform the steps described in the "Embodiment Methods" section of this specification according to various exemplary embodiments of the present invention.

[0152] Storage unit 42 may include a readable medium in the form of a volatile storage unit, such as random access memory (RAM) 421 and / or cache memory 422, and may further include a read-only memory (ROM) 423.

[0153] Storage unit 42 may also include a program / utility 424 having a set (at least one) of program modules 425, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of these examples may include an implementation of a network environment.

[0154] Bus 43 can represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

[0155] Electronic device 40 can also communicate with one or more external devices (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with electronic device 40, and/or with any device (e.g., router, modem, etc.) that enables electronic device 40 to communicate with one or more other computing devices. This communication can be performed through input/output (I/O) interface 44. Furthermore, electronic device 40 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 45. As shown in Figure 8, network adapter 45 communicates with other modules of electronic device 40 via bus 43. It should be understood that, although not shown in Figure 8, other hardware and/or software modules may be used in conjunction with electronic device 40, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup planning systems.

[0156] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the methods according to the embodiments of this disclosure.

[0157] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware. This program can be stored in a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk, etc. Furthermore, it stores computer program instructions, which, when executed by a computer, cause the computer to perform the methods described above.

[0158] Furthermore, the above description of the network security event extraction method and system provided by the embodiments of the present invention has been detailed. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A method for extracting network security events, characterized in that the method is applied to scenarios involving the extraction of network security events, and the method includes: collecting, via the Internet, data associated with cybersecurity incidents; extracting corresponding role factors based on the data associated with cybersecurity incidents, and defining a dataset based on the combination of multiple role factors; defining the event type of a network security event based on the dataset and the event matching table; and determining the key information extracted from the cybersecurity incident based on its event type and the argument roles in the information extraction framework, which includes: acquiring input corpus within the event type; generating word vectors based on the input corpus; using word vectors, block vectors, and position vectors as input to the encoding layer module of a pre-trained language model; decoding based on the encoding layer module of the pre-trained language model and identifying the event trigger words; determining the key information extracted from the cybersecurity incident based on the trigger words and the argument roles in the information extraction framework; and collecting the event extraction framework and associating the input corpus, arguments, and argument roles for the event type of the cybersecurity incident based on the event extraction framework. Determining the key information extracted from a cybersecurity incident based on its event type and the argument roles in the information extraction framework also includes: in the encoding layer module of the pre-trained language model, X_i is the encoded representation of the i-th character in the input sequence, W_start and W_end are learnable weight matrices, b_start and b_end are biases, and σ is the sigmoid activation function; the two outputs represent, respectively, the probability that the i-th character is the starting position and the probability that it is the ending position of the trigger word. The information extraction framework optimizes the training objective function: based on a predefined set of event types T, a set of event argument roles R, and a sentence x, the overall learning objective is to predict all argument roles for each event type corresponding to the sentence. The maximum likelihood function is: wherein T is an event type, tri is a trigger word corresponding to the event type, R_T is an argument role under the specific event type, and ε_x denotes the set of event types, trigger words, and event arguments in sentence x; CBSEE is a constructed dataset for extracting cybersecurity text events; x is a single sentence in the dataset; α_x is the set of trigger words in sentence x; β_{x,tri} is the set of event types corresponding to trigger word tri; γ_{x,tri,T} is the set of event arguments corresponding to event type T and trigger word tri; p(tri|x) is a trigger word decoder used to detect the trigger words in a sentence; p(T|x,tri) is an event type extractor used to extract the event type of trigger word tri; p(R_T|x,tri,T) is an argument role extractor used to extract the argument roles corresponding to event type T and trigger word tri. The argument roles include: time, attack tool, attacker, vulnerability type, victim, propagation method, data type, impact, attack motive, persistence, and attack pattern; the event types include: ransomware attack, data breach, phishing, patch vulnerability, disclosure vulnerability, malware attack, cyber attack, IoT attack, and social engineering attack.

2. The method for extracting network security events according to claim 1, characterized in that collecting, via the Internet, data associated with cybersecurity incidents includes: connecting to the Internet and to web crawling tools; using the web crawling tools to crawl data on cybersecurity incidents on the Internet in order to collect data associated with cybersecurity incidents; and triggering data preprocessing based on the data associated with cybersecurity incidents, wherein the data preprocessing includes data filtering and text correction.

3. The network security event extraction method of claim 1, wherein extracting corresponding role factors based on data associated with cybersecurity incidents, and defining a dataset based on the combination of multiple role factors, includes: obtaining data associated with cybersecurity incidents; defining the event role table; extracting corresponding role factors based on the data associated with cybersecurity incidents; and collecting multiple role factors and defining a dataset based on the combination of these role factors.

4. The network security event extraction method of claim 3, wherein extracting corresponding role factors based on data associated with cybersecurity incidents, and defining a dataset based on the combination of multiple role factors, also includes: specifying the event role table based on data extracted from previous cybersecurity incidents.

5. The method of claim 1, wherein defining the event type of a network security event based on the dataset and the event matching table includes: collecting the dataset; associating the dataset with the event matching table; and defining the event type of a network security event based on the dataset and the event matching table.

6. A system for extracting network security events, characterized in that the network security event extraction system is applied to the network security event extraction method as described in any one of claims 1-5, and the network security event extraction system includes: a data acquisition module, used to collect data associated with cybersecurity incidents via the Internet; a dataset, used to extract corresponding role factors based on the data associated with cybersecurity incidents, and to define the dataset based on the combination of multiple role factors; an event type module, used to define the event type of a network security event based on the dataset and the event matching table; and an information model, used to determine the key information extracted from cybersecurity events based on the event type of the cybersecurity event and the argument roles of the information extraction framework, which includes: acquiring input corpus within the event type of the cybersecurity event; generating word vectors based on the processed input corpus; using word vectors, block vectors, and position vectors as input to the encoding layer module of a pre-trained language model; decoding based on the encoding layer module of the pre-trained language model and identifying event trigger words; determining the key information extracted from the cybersecurity event based on the trigger words and the argument roles of the information extraction framework; and collecting the event extraction framework and associating the input corpus, arguments, and argument roles of the event type of the cybersecurity event based on the event extraction framework.