Network intrusion detection system based on large language model and parameter efficient fine-tuning

A network intrusion detection system based on a large language model and parameter-efficient fine-tuning addresses two problems of traditional systems: insufficient detection capability for unknown attacks and high consumption of computing resources. It targets high-precision network attack detection and complex context parsing with a low false alarm rate, and is adaptable to various network environments.

CN122247679A · Pending · Publication Date: 2026-06-19 · 黄昌龙

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
黄昌龙
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional network intrusion detection systems have limited ability to detect unknown attacks when processing high-dimensional network traffic data, resulting in a high false positive rate and a lack of semantic understanding of complex attack contexts. At the same time, existing deep learning solutions have high computational overhead and are difficult to deploy efficiently in resource-constrained environments.

Method used

A network intrusion detection system based on a large language model and efficient parameter fine-tuning is adopted, including data preprocessing, feature selection, text representation, large language model analysis and classification response modules. The Llama3 model is trained by a parameter-efficient fine-tuning method combining QLoRA and PEFT to achieve accurate identification of network attacks and low computational resource consumption.

Benefits of technology

It achieves high-precision, low-false-positive-rate network attack detection, can parse complex attack contexts, and can be efficiently deployed in resource-constrained environments, adapting to various network environments.



Abstract

This invention relates to the field of network security technology and discloses a network intrusion detection system based on a large language model and efficient parameter fine-tuning. The system includes a data preprocessing and feature selection module, a text representation module, a large language model analysis engine, and a classification and response module. The data preprocessing and feature selection module collects network data and preprocesses it to obtain preprocessed data. This network intrusion detection system based on a large language model and efficient parameter fine-tuning converts structured network data into text data in natural language instruction format. Leveraging the strong semantic understanding capabilities of the Llama3 large language model, it achieves deep analysis of the network attack context, accurately identifying complex and covert network threats such as zero-day attacks and multi-step attacks. Furthermore, it incorporates a feature selection strategy based on Pearson correlation coefficients to improve data effectiveness and significantly reduce the false positive rate.

Description

Technical Field

[0001] This invention relates to the field of network security technology, specifically to a network intrusion detection system based on a large language model and efficient parameter fine-tuning.

Background Technology

[0002] As a core component of the network security protection system, possessing both real-time monitoring and proactive early warning capabilities, the network intrusion detection system is a crucial line of defense for safeguarding cyberspace security and resisting malicious network attacks. It is responsible for collecting and deeply analyzing various traffic data and system operation logs in network links around the clock, accurately identifying various malicious network behaviors such as scanning and probing, malicious intrusion, and traffic attacks, and promptly discovering security vulnerabilities and abnormal risks in the network system. This provides key evidence for the formulation of network security protection strategies and the rapid handling of attack behaviors, ultimately playing a critical role in ensuring the stable operation of various network infrastructures, business systems, and the security of data assets.

[0003] However, traditional Network Intrusion Detection Systems (NIDS) have several limitations in handling high-dimensional network traffic data and adapting to new attack patterns. First, they rely on predefined rules or traditional machine learning models, resulting in limited detection capabilities for unknown attacks (such as zero-day attacks and hybrid DDoS attacks). Second, they have a high false positive rate in high-dimensional feature environments and lack the ability to understand the semantic context of complex attacks. Third, while existing deep learning-based solutions have some generalization capabilities, they have high computational overhead and are difficult to deploy efficiently in resource-constrained environments.

Summary of the Invention

[0004] (I) Technical Problem to Be Solved: To address the shortcomings of existing technologies, this invention provides a network intrusion detection system based on a large language model and efficient parameter fine-tuning. It has the advantages of high detection accuracy, low false alarm rate, low computational resource consumption, strong generalization ability, and accurate parsing of complex attack contexts. It addresses three problems: traditional network intrusion detection systems have limited detection capabilities for unknown attacks such as zero-day attacks and hybrid DDoS attacks; they have a high false alarm rate in high-dimensional feature environments and lack semantic understanding of complex attack contexts; and existing deep learning detection schemes have high computational overhead and are difficult to deploy efficiently in resource-constrained environments.

[0005] (II) Technical Solution: To achieve the aforementioned goals of high detection accuracy, low false alarm rate, low computational resource consumption, strong generalization ability, and accurate parsing of complex attack contexts, this invention provides the following technical solution: a network intrusion detection system based on a large language model and efficient parameter fine-tuning, comprising a data preprocessing and feature selection module, a text representation module, a large language model analysis engine, and a classification and response module. The data preprocessing and feature selection module collects network data, preprocesses the network data to obtain preprocessed data, and uses the Pearson correlation coefficient to filter the features of the preprocessed data, removing redundant features to obtain a target feature set. The text representation module, connected to the data preprocessing and feature selection module, converts the target feature set into text data in natural language instruction format, wherein the text data includes instruction-response pairs. The large language model analysis engine, connected to the text representation module, includes the Llama3 base model, which is fine-tuned using a parameter-efficient fine-tuning method combining QLoRA and PEFT to obtain a target detection model. The target detection model analyzes the text data and outputs the classification result of the network data, which is either normal network data or abnormal network data. The classification and response module, connected to the large language model analysis engine, performs an alarm operation if the classification result is abnormal network data and a traffic allow operation if the classification result is normal network data.

[0006] Preferably, the network data includes at least one of network traffic data and system log data; the preprocessing includes at least one of data cleaning and data normalization, wherein data cleaning removes missing values, outliers, and duplicate values, and data normalization maps feature values to the [0,1] interval to eliminate the influence of units.

[0007] Preferably, the text representation module converts the target feature set into text data in natural language instruction format. Specifically, it converts the feature-value pairs in the target feature set into natural language description statements according to a preset instruction template. The instruction is a statement asking whether the network data is associated with an intrusion and, if so, the type of intrusion; the answer is a statement labeling the network data status and the specific attack type.

[0008] Preferably, the parameter-efficient fine-tuning of the Llama3 base model using a combination of QLoRA and PEFT includes: constructing a training dataset that includes multiple textualized network intrusion samples; based on the QLoRA technique, injecting low-rank matrices into the attention layers and fully connected layers of the Llama3 base model, and training and updating only the parameters of the low-rank matrices without changing the original parameters of the base model; and, in combination with PEFT technology, using an optimizer to iteratively optimize the low-rank matrix parameters, thereby obtaining a target detection model with network intrusion detection capabilities.

[0009] Preferably, the training dataset covers sample data of at least one network attack type among DDoS attacks, U2R attacks, and R2L attacks, and the training dataset is derived from at least one publicly available network intrusion detection dataset among CIC-DDoS2019, CIC-IDS2017, CICIoT2023, and InSDN.

[0010] Preferably, when the classification and response module performs the alarm operation, it is specifically used to send alarm information to the network security management platform. The alarm information includes the network attack type, the attack source IP address, and the attack time. When the classification and response module detects abnormal data of DDoS attack type, it can simultaneously perform network traffic blocking operation.

[0011] The operation steps of the network intrusion detection method based on large language models and efficient parameter fine-tuning are as follows:
S1: Collect network data, preprocess the network data to obtain preprocessed data, and use the Pearson correlation coefficient to filter the features of the preprocessed data, removing redundant features to obtain the target feature set;
S2: Convert the target feature set into text data in natural language instruction format;
S3: Input the text data into the target detection model, analyze the text data through the target detection model, and output the classification result of the network data;
S4: Based on the classification result, execute the corresponding alarm or traffic blocking operation.

[0012] Preferably, when using the Pearson correlation coefficient for feature selection in step S1, the correlation coefficient threshold is set to 0.9. If the Pearson correlation coefficient between two features is greater than 0.9, then one of the redundant features is removed.

[0013] Preferably, in step S3, the analysis results of the target detection model on the text data, in addition to the basic classification of normal network data and abnormal network data, also include the specific network attack type corresponding to the abnormal network data, specifically covering at least one of DDoS attack, U2R attack, and R2L attack.

[0014] (III) Beneficial Effects: Compared with existing technologies, this invention provides a network intrusion detection system based on a large language model and efficient parameter fine-tuning, which has the following advantages:
1. This network intrusion detection system converts structured network data into text data in natural language instruction format. Relying on the strong semantic understanding capabilities of the Llama3 large language model, it achieves in-depth analysis of the network attack context, accurately identifying complex and covert network threats such as zero-day attacks and multi-step attacks. At the same time, it combines a feature selection strategy based on the Pearson correlation coefficient to improve data effectiveness and significantly reduce the false alarm rate.

[0015] 2. This network intrusion detection system uses a parameter-efficient fine-tuning method combining QLoRA and PEFT to train the Llama3 model. It updates only the parameters of the injected low-rank matrices without changing the base parameters of the model, which greatly reduces the number of trainable parameters and the consumption of computing resources, and significantly shortens the model training time. At the same time, the modular architecture design is adaptable to various network environments such as IoT, SDN, and traditional IT, allowing the system to be deployed efficiently on resource-constrained devices, combining computational efficiency and scenario generalization ability.

Attached Figure Description

[0016] Figure 1 is a schematic block diagram of the system structure in this invention; Figure 2 is a diagram of the overall system architecture in this invention; Figure 3 is a schematic diagram of the text representation algorithm in this invention; Figure 4 is a graph showing the experimental data of the present invention.

Detailed Implementation

[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] Referring to Figures 1-4, a network intrusion detection system based on a large language model and efficient parameter fine-tuning includes a data preprocessing and feature selection module, a text representation module, a large language model analysis engine, and a classification and response module. The data preprocessing and feature selection module collects network data, preprocesses the network data to obtain preprocessed data, and uses the Pearson correlation coefficient to filter the features of the preprocessed data, removing redundant features to obtain the target feature set. The text representation module, connected to the data preprocessing and feature selection module, converts the target feature set into text data in natural language instruction format, which includes instruction-response pairs. The large language model analysis engine, connected to the text representation module, includes the Llama3 base model, which is fine-tuned using a parameter-efficient fine-tuning method combining QLoRA and PEFT to obtain the target detection model. The target detection model analyzes the text data and outputs the classification result of the network data, which is either normal or abnormal network data. The classification and response module, connected to the large language model analysis engine, performs an alarm operation if the classification result is abnormal network data and a traffic allow operation if the classification result is normal network data. The modules communicate with each other via data interaction interfaces to achieve end-to-end streaming data transmission, ensuring real-time performance from data acquisition to detection response. Specifically, the output of the data preprocessing and feature selection module communicates bidirectionally with the input of the text representation module through a data interface; the output of the text representation module is connected to the input layer of the large language model analysis engine; and the output layer of the large language model analysis engine establishes a data transmission link with the input of the classification and response module. Each module performs its operations sequentially along the data processing timeline, forming a closed-loop intrusion detection process.

[0019] In this embodiment, the network data includes at least one of network traffic data and system log data, and the preprocessing includes at least one of data cleaning and data normalization. Data cleaning removes missing values, outliers, and duplicate values; data normalization maps feature values to the [0,1] interval to eliminate the influence of units. Differentiated preprocessing strategies are adopted for numerical and categorical features. For numerical features such as packet count, byte count, and duration in network traffic data, outliers are identified and removed using the 3σ criterion, and missing values are filled with the mean. For categorical features such as transport protocol and network connection status, missing values are filled with the mode, and duplicate records are deleted. All numerical features are mapped to the [0,1] interval using min-max normalization, while categorical features retain their original values without normalization.
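The cleaning and normalization rules in [0019] can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the function names are hypothetical, missing values are represented as `None`, outliers are dropped before the mean is computed for filling, and min-max normalization is taken to be x' = (x - min) / (max - min).

```python
from collections import Counter
from statistics import mean, pstdev


def clean_numeric(values):
    """Drop 3-sigma outliers, then fill missing values (None) with the mean."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), pstdev(present)
    # Keep missing entries for now; drop values farther than 3 sigma from the mean.
    kept = [v for v in values
            if v is None or sigma == 0 or abs(v - mu) <= 3 * sigma]
    fill = mean(v for v in kept if v is not None)
    return [fill if v is None else v for v in kept]


def fill_categorical(values):
    """Fill missing categorical values (None) with the most frequent value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def drop_duplicates(rows):
    """Remove duplicate records (dicts) while preserving first-seen order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def min_max(values):
    """Map numerical values to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]
```

Note that the 3σ rule only removes a value when the column is long enough for a single point to sit more than three standard deviations from the mean, so very short columns pass through unchanged.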

[0020] In this embodiment, the text representation module converts the target feature set into text data in natural language instruction format. Specifically, according to the preset instruction template, the feature-value pairs in the target feature set are converted into natural language description statements. The instruction is a statement asking whether the network data is associated with an intrusion and, if so, the type of intrusion. The answer is a statement marking the status of the network data and the specific attack type. The preset instruction template is encapsulated with the [INST] instruction header and [/INST] instruction tail adapted to the Llama3 model, forming a standardized large language model input format. Specifically, the instruction part starts with [INST] and is expressed as "The network flow has [feature 1] of [feature value 1], [feature 2] of [feature value 2],...Is this network dataflow associated with any network attacks?[/INST]", and the answer part is directly expressed as "This data flow can be categorized as [normal/Benign/specific attack type]", completing the conversion of feature-value pairs into natural language instruction-answer pairs.
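The template in [0020] can be illustrated with a short Python sketch. The function names and the dictionary keys `instruction` and `response` are hypothetical, and the period placed between the feature list and the question is an assumption, since the template in the text elides that part with "...". The [INST] wrapper is reproduced as the text gives it.

```python
def build_instruction(features):
    """Render feature-value pairs into the [INST] ... [/INST] instruction."""
    described = ", ".join(f"{name} of {value}" for name, value in features.items())
    return (
        f"[INST] The network flow has {described}. "
        "Is this network dataflow associated with any network attacks? [/INST]"
    )


def build_answer(label):
    """Render the answer part for a label such as 'Benign' or 'DDoS'."""
    return f"This data flow can be categorized as {label}"


def build_pair(features, label):
    """Combine instruction and answer into one training sample."""
    return {
        "instruction": build_instruction(features),
        "response": build_answer(label),
    }
```

For example, `build_pair({"packet_count": 15, "byte_count": 1200}, "DDoS")` yields one instruction-answer pair of the kind used for fine-tuning.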

[0021] In this embodiment, the Llama3 base model is fine-tuned using a parameter-efficient fine-tuning method combining QLoRA and PEFT, which includes: constructing a training dataset that includes multiple textualized network intrusion samples; based on the QLoRA technique, injecting low-rank matrices into the attention layers and fully connected layers of the Llama3 base model, and training and updating only the parameters of the low-rank matrices without changing the original parameters of the base model; and, in combination with PEFT technology, using an optimizer to iteratively optimize the low-rank matrix parameters, thereby obtaining a target detection model with network intrusion detection capabilities. In the fine-tuning process, the Llama3 model is quantized, and the low-rank matrix rank and optimizer hyperparameters are chosen to balance training efficiency and detection performance. Specifically, the Llama3 base model is first quantized to 4-bit precision to reduce memory usage. Low-rank matrices with a rank of 8 are injected into the model's attention layers and fully connected layers. AdamW is selected as the optimizer, with a learning rate of 2e-4, 50 training epochs, and a batch size of 32. During training, only the parameters of the low-rank matrices are updated, while the parameters of the base model are frozen and do not participate in training. After training, the low-rank matrix parameters are merged with the base model to obtain the target detection model.
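To make the parameter savings in [0021] concrete, the sketch below counts the trainable parameters a rank-8 adapter adds to one weight matrix. It relies on the standard LoRA formulation, in which a frozen weight W of shape d_out × d_in is augmented by the product B·A, with B of shape d_out × r and A of shape r × d_in. The 4096 × 4096 layer size is an illustrative assumption, not a figure from the text; the hyperparameter dictionary restates the values given in the paragraph.

```python
# Hyperparameters as stated in the text (the key names are hypothetical).
HYPERPARAMS = {
    "quantization_bits": 4,
    "lora_rank": 8,
    "optimizer": "AdamW",
    "learning_rate": 2e-4,
    "epochs": 50,
    "batch_size": 32,
}


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in, d_out, rank):
    """Adapter parameters relative to the frozen d_out x d_in weight matrix."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


d = 4096  # assumed square projection size, for illustration only
print(lora_param_count(d, d, HYPERPARAMS["lora_rank"]))    # 65536
print(trainable_fraction(d, d, HYPERPARAMS["lora_rank"]))  # 0.00390625
```

Under these assumptions the adapter trains about 0.4% as many parameters as the matrix it modifies, which is the mechanism behind the reduced training cost the paragraph describes.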

[0022] In this embodiment, the training dataset covers sample data of at least one network attack type among DDoS attacks, U2R attacks, and R2L attacks, and the training dataset comes from at least one public network intrusion detection dataset among CIC-DDoS2019, CIC-IDS2017, CICIoT2023, and InSDN. The publicly available datasets are preprocessed and textualized, then divided into training, validation, and test sets according to a predefined ratio to ensure the effectiveness and generalization of model training. Specifically, core network flow features are extracted from each publicly available dataset and cleaned, filtered, and converted into text. The processed sample data is then divided into training, validation, and test sets in an 8:1:1 ratio. The training set is used for updating model parameters, the validation set is used to adjust hyperparameters and prevent overfitting during training, and the test set is used to finally verify the detection accuracy and generalization ability of the target detection model.
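The 8:1:1 division in [0022] can be sketched as below. The function name, the fixed seed, and the choice to shuffle before splitting are assumptions made for illustration; the text specifies only the ratio.

```python
import random


def split_811(samples, seed=0):
    """Shuffle samples and split them into train/validation/test at 8:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test
```

Integer arithmetic is used for the cut points so that every sample lands in exactly one subset, with any rounding remainder assigned to the test set.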

[0023] In this embodiment, when the classification and response module performs an alarm operation, it sends alarm information to the network security management platform. The alarm information includes the network attack type, the attack source IP address, and the attack time. When the classification and response module detects abnormal data of the DDoS attack type, it can simultaneously perform a network traffic blocking operation. The classification and response module incorporates an attack type matching library and a network control interface, enabling standardized pushing of alarm information and rapid traffic blocking. Specifically, the attack type matching library works in conjunction with the output of the target detection model to identify the specific attack type and extract core information such as the attack source IP address and the time of the attack. Standardized alarm information is then pushed to the network security management platform via the TCP/IP protocol. Simultaneously, by calling the control interface of network devices, the module performs port blocking or traffic rate limiting on detected DDoS attack traffic based on the attack source IP address, quickly curbing the attack.
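The response logic in [0023] might look like the following sketch. The field names, the `actions` list, and the JSON encoding are assumptions; the text names only the three alarm fields and the TCP/IP transport. The actual push to the management platform and the call to the network device control interface are left out because the text does not specify them.

```python
import json
from datetime import datetime, timezone


def build_alarm(attack_type, source_ip, attack_time):
    """Assemble the three alarm fields named in the text."""
    return {
        "attack_type": attack_type,
        "source_ip": source_ip,
        "attack_time": attack_time.isoformat(),
    }


def respond(classification, source_ip, attack_time):
    """Decide the actions for one classified flow."""
    if classification.lower() in {"normal", "benign"}:
        return {"actions": ["allow"], "alarm": None}
    actions = ["alarm"]
    if "ddos" in classification.lower():
        actions.append("block")  # DDoS traffic is blocked as well as reported
    return {
        "actions": actions,
        "alarm": build_alarm(classification, source_ip, attack_time),
    }


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-range address used as a placeholder.
    when = datetime(2026, 1, 1, tzinfo=timezone.utc)
    print(json.dumps(respond("DDoS", "203.0.113.7", when)))
```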

[0024] The operation steps of the network intrusion detection method based on large language models and efficient parameter fine-tuning are as follows:
S1: Collect network data, preprocess the network data to obtain preprocessed data, and use the Pearson correlation coefficient to filter the features of the preprocessed data, removing redundant features to obtain the target feature set;
S2: Convert the target feature set into text data in natural language instruction format;
S3: Input the text data into the target detection model, analyze the text data through the target detection model, and output the classification result of the network data;
S4: Based on the classification result, execute the corresponding alarm or traffic blocking operation.
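Steps S1 to S4 form a straight pipeline, which the sketch below expresses by passing each stage in as a callable. The function and parameter names are hypothetical, and the stages themselves are supplied by the caller, so this shows only the order of operations rather than any particular model or preprocessing routine.

```python
def run_pipeline(raw_record, preprocess, select_features, to_text, classify, respond):
    """Run one record through S1-S4 and return the intermediate results."""
    features = select_features(preprocess(raw_record))  # S1: clean and filter
    prompt = to_text(features)                          # S2: textualize
    classification = classify(prompt)                   # S3: model inference
    action = respond(classification)                    # S4: alarm or block
    return {"prompt": prompt, "classification": classification, "action": action}
```

Keeping the stages injectable means the classifier can be replaced by a stub during testing, without loading a language model.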

[0025] In this embodiment, when using the Pearson correlation coefficient for feature selection in step S1, the correlation coefficient threshold is set to 0.9. If the Pearson correlation coefficient between two features is greater than 0.9, one of the redundant features is removed. A feature correlation matrix is constructed by calculating the Pearson correlation coefficient between features, and highly redundant features are identified and removed in batches based on this matrix. Specifically, the Pearson correlation coefficient is calculated for each pair of preprocessed features to form an n×n feature correlation matrix (n is the number of features). All correlation coefficient values in the matrix are traversed. If the absolute value of the correlation coefficient between any two features is greater than 0.9, the feature with the lower information contribution rate is removed. This operation is repeated until no highly redundant features remain. The remaining features constitute the target feature set.
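The filtering loop in [0025] can be sketched in plain Python as follows. The text does not define how the "information contribution rate" is computed, so the sketch takes it as a caller-supplied score per feature; that parameter, the function names, and the tie-breaking rule are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / (sx * sy)


def select_features(columns, contribution, threshold=0.9):
    """Repeatedly drop the lower-scoring feature of any pair with |r| > threshold."""
    kept = list(columns)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                a, b = kept[i], kept[j]
                if abs(pearson(columns[a], columns[b])) > threshold:
                    # On a tie in contribution score, the second feature is dropped.
                    kept.remove(a if contribution[a] < contribution[b] else b)
                    changed = True
                    break
            if changed:
                break  # restart the scan after each removal
    return kept
```

Restarting the scan after each removal mirrors the "repeated until no highly redundant features remain" wording, at the cost of recomputing some coefficients.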

[0026] In this embodiment, the analysis results of the target detection model on the text data in step S3, in addition to the basic classification into normal network data and abnormal network data, also include the specific network attack type corresponding to the abnormal network data, covering at least one of DDoS attack, U2R attack, and R2L attack. The target detection model uses Llama3's semantic understanding capabilities to extract features and parse context from the text data, and the sample annotations used during fine-tuning enable it to classify specific attack types. Specifically, the target detection model tokenizes and encodes the input natural language instruction text, then captures the correlation between network data features through the model's attention mechanism. Combined with the network attack feature patterns learned during fine-tuning, it first determines whether the network data is normal or abnormal. If it is abnormal, it further matches the specific attack type label and finally outputs a classification result of "normal/Benign" or "abnormal + specific attack type".
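Because the model answers in free text, the downstream module needs to turn a sentence such as "This data flow can be categorized as DDoS" into a structured result. One possible parser is sketched below; the function name, the returned keys, and the handling of unrecognized output are assumptions, while the three attack names come from the text.

```python
KNOWN_ATTACKS = ("DDoS", "U2R", "R2L")


def parse_classification(model_output):
    """Extract status and attack type from the model's answer sentence."""
    _, marker, tail = model_output.partition("categorized as")
    label = tail.strip().rstrip(".").strip()
    if not marker or not label:
        return {"status": "unknown", "attack_type": None}
    if label.lower() in {"normal", "benign"}:
        return {"status": "normal", "attack_type": None}
    for attack in KNOWN_ATTACKS:
        if attack.lower() == label.lower():
            return {"status": "abnormal", "attack_type": attack}
    # Abnormal, but with a label outside the three types listed in the text.
    return {"status": "abnormal", "attack_type": label}
```

Returning an explicit "unknown" status lets the caller decide how to treat output that does not follow the expected answer template.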

[0027] In this embodiment, take the detection of DDoS attacks as an example:
1. Extract network flow features from the CIC-DDoS2019 dataset;
2. Use the Pearson correlation coefficient (threshold = 0.9) to filter features and remove highly redundant features; the filtered features are then converted into the following instruction format using the text representation algorithm:
   [INST] The network flow has packet_count of 15, byte_count of 1200...Is this network data flow associated with any DDoS attacks? [/INST]
3. Input the instruction into the fine-tuned Llama3 model, which outputs a classification result such as "This data flow can be categorized as DDoS";
4. The system triggers the corresponding response mechanism based on the output, such as an alarm or traffic blocking.

[0028] In summary, this network intrusion detection system based on a large language model and efficient parameter fine-tuning converts structured network data into text data in natural language instruction format. Leveraging the strong semantic understanding capabilities of the Llama3 large language model, it achieves deep analysis of network attack context, accurately identifying complex and covert network threats such as zero-day attacks and multi-step attacks. Furthermore, by incorporating a feature selection strategy based on Pearson correlation coefficients, it enhances data effectiveness and significantly reduces the false positive rate. This system addresses the shortcomings of traditional network intrusion detection systems that rely on predefined rules or traditional machine learning models, resulting in insufficient detection capabilities for unknown attacks, high false positive rates in high-dimensional feature environments, and a lack of semantic understanding of complex attack contexts.

[0029] Furthermore, the Llama3 model is trained using a parameter-efficient fine-tuning method combining QLoRA and PEFT. This method updates only the injected low-rank matrix parameters without altering the model's fundamental parameters, significantly reducing the training parameter scale and computational resource consumption, and substantially shortening the model training time. Simultaneously, the modular architecture design adapts to various network environments such as IoT, SDN, and traditional IT, enabling efficient deployment even in resource-constrained devices. It combines computational efficiency with scenario generalization capabilities, solving the problems of existing deep learning-based intrusion detection solutions, such as high computational overhead, long training times, difficulty in efficient deployment in resource-constrained environments, weak generalization capabilities, and inability to adapt to multiple network scenarios.

[0030] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0031] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A network intrusion detection system based on a large language model and efficient parameter fine-tuning, characterized in that it includes a data preprocessing and feature selection module, a text representation module, a large language model analysis engine, and a classification and response module, wherein: the data preprocessing and feature selection module is used to collect network data, preprocess the network data to obtain preprocessed data, and use the Pearson correlation coefficient to filter the features of the preprocessed data, removing redundant features to obtain a target feature set; the text representation module, connected to the data preprocessing and feature selection module, is used to convert the target feature set into text data in natural language instruction format, wherein the text data includes instruction-response pairs; the large language model analysis engine, connected to the text representation module, includes the Llama3 base model, and the Llama3 base model is fine-tuned using a parameter-efficient fine-tuning method combining QLoRA and PEFT to obtain a target detection model, the target detection model being used to analyze the text data and output the classification result of the network data, which is either normal network data or abnormal network data; and the classification and response module, connected to the large language model analysis engine, is used to perform an alarm operation if the classification result is abnormal network data and a traffic allow operation if the classification result is normal network data.

2. The network intrusion detection system based on a large language model and efficient parameter fine-tuning according to claim 1, characterized in that: the network data includes at least one of network traffic data and system log data; the preprocessing includes at least one of data cleaning and data normalization, wherein data cleaning removes missing values, outliers, and duplicate values, and data normalization maps feature values to the [0,1] interval to eliminate the influence of units.

3. The network intrusion detection system based on a large language model and efficient parameter fine-tuning according to claim 1, characterized in that: the text representation module converts the target feature set into text data in a natural language instruction format by converting the feature-value pairs in the target feature set into natural language description statements according to a preset instruction template, wherein the instruction is a statement asking whether the network data has been intruded upon and, if so, the type of intrusion, and the response is a statement labeling the network data status and the specific attack type.
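As an illustration of the template-based conversion in claim 3, the following sketch turns feature-value pairs into an instruction-response pair. The template wording, the feature names, and the helper names `describe_features` and `build_pair` are all assumptions; the claim does not disclose the actual preset template.

```python
# Hypothetical instruction template; the claim's preset template is not disclosed.
INSTRUCTION = (
    "Analyze the following network record and state whether it is an "
    "intrusion and, if so, the attack type."
)


def describe_features(features):
    """Render feature-value pairs as a natural language description."""
    return ", ".join(f"{name} is {value}" for name, value in features.items())


def build_pair(features, label=None):
    """Build an instruction-response pair; label=None yields an inference prompt."""
    instruction = f"{INSTRUCTION}\nRecord: {describe_features(features)}."
    if label is None:
        response = None  # no ground truth available at inference time
    elif label == "normal":
        response = "Status: normal."
    else:
        response = f"Status: abnormal. Attack type: {label}."
    return {"instruction": instruction, "response": response}


pair = build_pair({"protocol": "TCP", "flow_duration": 0.42}, label="DDoS")
```

Labeled pairs of this kind would serve as fine-tuning samples, while unlabeled prompts would be sent to the fine-tuned model at detection time.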

4. The network intrusion detection system based on a large language model and efficient parameter fine-tuning according to claim 1, characterized in that the method for fine-tuning the Llama3 base model using a combination of QLoRA and PEFT includes: constructing a training dataset, which includes multiple textualized network intrusion samples; based on the QLoRA technique, injecting low-rank matrices into the attention layers and fully connected layers of the Llama3 base model, and training and updating only the parameters of the low-rank matrices without changing the original parameters of the base model; and, in combination with the PEFT technique, using an optimizer to iteratively optimize the low-rank matrix parameters, thereby obtaining a target detection model with network intrusion detection capability.
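The low-rank injection described in claim 4 can be sketched in plain Python for a single linear layer. This is only an illustration of the low-rank adapter idea: the 4-bit quantization of the frozen weights that QLoRA adds is omitted, no training loop is shown, and the class name `LoRALinear` and its parameters `rank` and `alpha` are assumptions rather than details taken from the claims.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # original parameters, never updated
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the layer output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameters(self):
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)

    def frozen_parameters(self):
        return sum(len(r) for r in self.weight)


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
```

For a 4x4 weight and rank 1, the adapter trains 8 parameters while 16 stay frozen; the saving grows with layer size, which is the source of the reduced resource consumption the claims refer to.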

5. The network intrusion detection system based on a large language model and efficient parameter fine-tuning according to claim 1, characterized in that: The training dataset covers sample data of at least one network attack type, including DDoS attacks, U2R attacks, and R2L attacks, and the training dataset is derived from at least one publicly available network intrusion detection dataset, including CIC-DDoS2019, CIC-IDS2017, CICIoT2023, and InSDN.

6. The network intrusion detection system based on a large language model and efficient parameter fine-tuning according to claim 1, characterized in that: when the classification and response module performs an alarm operation, it sends alarm information to the network security management platform, the alarm information including the network attack type, the attack source IP address, and the attack time; and when the classification and response module detects abnormal data of the DDoS attack type, it can simultaneously perform a network traffic blocking operation.
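The response logic of claim 6 can be sketched as follows. The `Alarm` record, the `respond` function, and the action labels `"allow"`, `"alarm"`, and `"block"` are hypothetical names; how the alarm is delivered to the network security management platform and how traffic is actually blocked are outside the scope of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Alarm:
    attack_type: str
    source_ip: str
    attack_time: str  # ISO 8601 timestamp


def respond(classification, attack_type, source_ip, now=None):
    """Return (actions, alarm) for one classified network record."""
    if classification == "normal":
        return ["allow"], None
    now = now or datetime.now(timezone.utc)
    alarm = Alarm(attack_type, source_ip, now.isoformat())
    actions = ["alarm"]
    if attack_type == "DDoS":
        actions.append("block")  # DDoS traffic may be blocked alongside the alarm
    return actions, alarm


actions, alarm = respond(
    "abnormal", "DDoS", "203.0.113.7",
    now=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```

Returning the actions as data rather than executing them keeps the decision logic separate from the platform-specific delivery and blocking mechanisms.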

7. A network intrusion detection method based on a large language model and efficient parameter fine-tuning, using the system according to any one of claims 1-6, characterized in that the operation steps are as follows: S1: collect network data, preprocess the network data to obtain preprocessed data, and filter the features of the preprocessed data using the Pearson correlation coefficient so as to remove redundant features and obtain a target feature set; S2: convert the target feature set into text data in a natural language instruction format; S3: input the text data into the target detection model, analyze the text data with the target detection model, and output the classification result of the network data; S4: based on the classification result, execute the corresponding alarm or traffic blocking operation.

8. The network intrusion detection method based on a large language model and efficient parameter fine-tuning according to claim 7, characterized in that: in step S1, when using the Pearson correlation coefficient for feature selection, the correlation coefficient threshold is set to 0.9; if the Pearson correlation coefficient between two features is greater than 0.9, one of the redundant features is removed.
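The correlation-based filtering with a 0.9 threshold can be sketched as below. Two choices here are assumptions not stated in the claims: the absolute value of the coefficient is compared against the threshold, so strongly negative correlations also count as redundant, and of two redundant features the one appearing later is dropped. The function names `pearson` and `select_features` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / sqrt(var_x * var_y)


def select_features(columns, threshold=0.9):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "packets": [1, 2, 3, 4, 5],
    "bytes": [10, 20, 30, 40, 50],   # perfectly correlated with "packets"
    "duration": [5, 1, 4, 2, 3],
}
target_features = select_features(columns)
```

Here `bytes` is dropped because it is a linear multiple of `packets`, while `duration` (correlation of -0.3 with `packets`) is retained.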

9. The network intrusion detection method based on a large language model and efficient parameter fine-tuning according to claim 7, characterized in that: In step S3, the analysis results of the target detection model on the text data, in addition to the basic classification of normal network data and abnormal network data, also include the specific network attack type corresponding to the abnormal network data, specifically covering at least one of DDoS attack, U2R attack, and R2L attack.
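Because the model's output in step S3 is text, the classification and the attack type have to be extracted from it before step S4 can act. The sketch below assumes a response wording such as "Status: abnormal. Attack type: DDoS."; that wording and the name `parse_response` are hypothetical, and only the three attack types named in the claim are recognized.

```python
import re

KNOWN_ATTACKS = ("DDoS", "U2R", "R2L")


def parse_response(text):
    """Extract the status and, if abnormal, the attack type from model output."""
    lowered = text.lower()
    # "abnormal" contains "normal", so it must be checked first.
    if "abnormal" not in lowered:
        status = "normal" if "normal" in lowered else "unknown"
        return {"status": status, "attack_type": None}
    for attack in KNOWN_ATTACKS:
        if re.search(rf"\b{re.escape(attack)}\b", text, flags=re.IGNORECASE):
            return {"status": "abnormal", "attack_type": attack}
    return {"status": "abnormal", "attack_type": None}


result = parse_response("Status: abnormal. Attack type: DDoS.")
```

An `"unknown"` status is returned when the output matches neither class, so that malformed generations can be handled explicitly instead of being silently treated as normal traffic.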