Real-time detection method for DNS tunneling attacks using LSTM-Transformer hybrid architecture
By extracting deep temporal patterns and global dependency features from DNS query sequences using an LSTM-Transformer hybrid architecture, the blind spot problem in DNS tunnel attack detection in existing technologies is solved, achieving efficient and real-time DNS tunnel attack detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JIANHENG XINAN (TIANJIN) NETWORK SECURITY TECHNOLOGY CO LTD
- Filing Date
- 2026-01-27
- Publication Date
- 2026-06-30
AI Technical Summary
Existing DNS tunneling attack detection technologies rely on static rules, which make it difficult to effectively identify new types of combined attacks where unrelated protocol fragments are disguised as legitimate DNS data. This results in detection blind spots and consumes a lot of computing resources, making it impossible to complete protocol review in a short time.
The system employs an LSTM-Transformer hybrid architecture, using a temporal feature extraction module and a global dependency modeling module to extract deep temporal patterns and global dependency features from DNS query sequences, generating a joint representation vector, which is then combined with a classification network model for real-time detection.
It achieves millisecond-level real-time detection in high-throughput network environments, significantly improving the detection accuracy and robustness of DNS tunnel attack detection. It can identify abnormal protocol combinations that violate the inherent characteristics of nodes and block attempts to evade detection through protocol spoofing.
Figure CN121585472B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of Internet technology, and more specifically, to a real-time detection method for DNS tunneling attacks based on an LSTM-Transformer hybrid architecture.
Background Technology
[0002] The content in this section provides only background information related to this application and may not constitute prior art.
[0003] As a core infrastructure of the Internet, the Domain Name System (DNS) typically allows its traffic to pass through perimeter security devices such as firewalls, forming a "green channel" for network communication. Attackers maliciously exploit this characteristic to encapsulate non-DNS protocol data (such as stolen information or control commands) within DNS queries (e.g., constructing excessively long subdomains) or responses (e.g., abusing TXT records), establishing covert DNS tunnels. This attack method is highly stealthy and can effectively circumvent traditional port-, protocol-, or fixed-signature-based security protections (such as firewalls, IDS / IPS), posing a serious threat to the confidentiality, integrity, and availability of network data.
[0004] Existing DNS tunneling attack detection technologies primarily involve rigorous format verification of data transmitted through the DNS tunnel (to determine if it belongs to the DNS protocol). This method requires significant computing resources, and since various devices within the Domain Name System do not contain all protocol information, or rather, contain an excessive amount of protocol information, most firewalls cannot complete protocol verification in a short time. To circumvent this protocol verification, some malicious users combine seemingly unrelated DNS protocol data into Trojan commands, thereby bypassing firewalls.
[0005] Thus, existing detection technologies primarily rely on passive review mechanisms of DNS data formats. This mechanism essentially depends on predefined static rules to identify protocol anomalies, making it difficult to effectively counter novel combined attacks where attackers dynamically combine multiple unrelated protocol fragments and disguise them as legitimate DNS data, resulting in significant blind spots in detection.
Summary of the Invention
[0006] The summary section of this application is intended to provide a brief overview of the concepts, which will be described in detail in the detailed description section below. This summary section is not intended to identify key or essential features of the claimed technical solutions, nor is it intended to limit the scope of the claimed technical solutions.
[0007] Some embodiments of this application propose a real-time DNS tunneling attack detection method based on an LSTM-Transformer hybrid architecture to address the technical problems mentioned in the background section above.
[0008] As a first aspect of this application, some embodiments of this application provide a real-time detection method for DNS tunneling attacks based on an LSTM-Transformer hybrid architecture, comprising the following steps:
[0009] Step 1: Collect continuous DNS data streams from the network and extract the original DNS query sequences as input features;
[0010] Step 2: Construct a hybrid neural network architecture consisting of a temporal feature extraction module and a global dependency modeling module;
[0011] The temporal feature extraction module is used to learn the temporal contextual dependencies from the DNS query sequence and output a temporal feature vector.
[0012] The global dependency modeling module is used to perform global correlation analysis across sequence positions on the time-series feature vectors to capture long-distance dependency feature vectors;
[0013] Step 3: The information fusion module fuses the temporal feature vector and the long-distance dependency feature vector to generate a joint representation vector;
[0014] Step 4: The classification network model classifies attack behaviors based on the joint representation vector and outputs the detection results of DNS tunnel attacks in real time;
[0015] The classification network model is trained using the labeled joint representation vectors of the corresponding nodes as samples.
[0016] This method collaboratively extracts deep temporal patterns and global dependency features from DNS query sequences using an LSTM-Transformer hybrid architecture, enabling accurate modeling of the inherent DNS protocol behavior patterns of network nodes. Since normal nodes' DNS data exhibits high contextual consistency (stable protocol type, query structure, and domain name distribution), while tunneling attacks require the forced injection of anomalous protocol combinations, the classification network model, through joint representation vectors, can keenly identify such anomalous protocol combinations that violate the inherent characteristics of nodes (such as Trojan commands spliced from irrelevant protocols), effectively blocking attempts to bypass detection using protocol spoofing. Simultaneously, through joint optimization of module computational efficiency and accuracy, the architecture meets the millisecond-level real-time detection requirements in high-throughput network environments while ensuring high detection accuracy, significantly improving proactive defense and security.
[0017] Furthermore, the original DNS query sequence includes the query domain name field and the string sequence of the query type.
[0018] This solution uses the joint extraction of query domain name (QNAME) string sequences and their query type (QTYPE) as input features, enabling the LSTM-Transformer hybrid model to deeply mine and learn the highly consistent association rules between "domain content" and "query intent (type)" inherent in the DNS behavior of network nodes. The model exhibits extremely high sensitivity to anomalous domain name-type combinations that attackers forcibly inject to disrupt this consistency (such as abusing TXT / NULL types for random garbled domain names, or launching unconventional type queries on regular business domain names). It can accurately identify such protocol-content mismatch features that violate node behavior patterns, effectively blocking covert tunnel attacks that use seemingly normal domain names but abuse query types for disguise, significantly improving detection accuracy and robustness while maintaining high throughput and low latency real-time detection performance.
[0019] Furthermore, the temporal feature extraction module includes multiple recurrent neural network units, which control the transmission and forgetting of historical state information through a gating mechanism; the temporal feature vector includes at least the temporal relationship of the query domain name field and the temporal relationship of the query type.
[0020] The temporal feature extraction module uses the gating mechanism of recurrent neural network units (such as LSTM / GRU) to dynamically learn and model the evolution of the query domain name (QNAME) string in the DNS query sequence (such as the order of subdomain appearance and label conversion patterns) and the temporal dependency of query type (QTYPE) (such as type switching frequency and specific type sequence patterns). It effectively captures the inherent and coherent DNS behavior patterns of nodes in the time dimension (such as the phased nature of domain name resolution and the periodicity or correlation of type requests). This allows for the accurate identification of abnormal evolution of domain name structure (such as random subdomain disorder mutations) or disordered / abuse patterns of query type sequences (such as dense and irregular TXT type requests) caused by the forced injection of malicious payloads in attack traffic.
[0021] Furthermore, the global dependency modeling module includes:
[0022] The feature decoupling layer is used to decompose the temporal feature vector into a first sequence related to the domain name and a second sequence related to the query type;
[0023] A sinusoidal position encoder, which obtains a cosine sequence based on a dual-channel input first sequence and a second sequence;
[0024] Multi-head self-attention layer: calculates the correlation between elements at each position in the first and second sequences;
[0025] The output layer outputs long-distance dependency feature vectors based on the relationships between features.
[0026] In this scheme, the self-attention mechanism unbiasedly accesses any position in the sequence, accurately quantifying the implicit correlations between remote queries (such as the synergy of distributed tunneling instructions); the sinusoidal position encoder retains key sequence information, avoiding the loss of temporal features; the combination of the two can identify non-local anomaly patterns that are difficult for LSTM to capture (such as tunnel load fragmentation and reassembly across multi-hop queries), significantly improving the ability to analyze complex tunneling strategies.
[0027] Furthermore, the information fusion module concatenates the temporal feature vector output by the temporal feature extraction module with the long-distance dependency feature vector output by the global dependency modeling module along the feature dimension to generate a joint representation vector containing the two types of feature information.
[0028] In this scheme, by splicing feature dimensions, the local temporal patterns extracted by LSTM and the global interaction patterns mined by Transformer are fully preserved, avoiding information loss caused by early fusion; the generated multi-view joint representation has both time sensitivity and structural completeness, providing the classifier with more comprehensive and complementary discrimination criteria, and significantly reducing the false negative rate.
[0029] Furthermore, the joint representation vector is input into the classification network model, and the probability of attack behavior is calculated through a nonlinear activation function; the probability value is binarized based on a preset decision threshold, and a binary detection identifier for DNS tunnel attacks is generated and output in real time.
[0030] The classification network model accurately calculates the probability value of the current DNS query sequence representing an attack behavior by nonlinearly mapping the joint representation vector containing deep temporal patterns and global dependencies. Based on a preset decision threshold, it performs real-time binary decision-making to generate an intuitive binary detection identifier (0 / 1). This design achieves an efficient and stable conversion from high-dimensional complex features to clear attack judgment. While ensuring millisecond-level response speed, it flexibly balances false positives and false negatives through an adjustable decision threshold. Finally, it outputs real-time detection results that can directly drive the defense system (such as blocking or alarming), completing the end-to-end automated and accurate interception of DNS tunnel attacks.
[0031] Furthermore, the recurrent neural network unit includes:
[0032] Input gate: controls the weight of the current input information in updating the cell state;
[0033] Forget gate: controls the retention weight of historical cell state information;
[0034] Output gate: Controls the contribution weight of the cell state to the output of the current hidden layer;
[0035] Cell state: a memory carrier that transmits long-range dependent information across time steps.
[0036] The Recurrent Neural Network (LSTM) unit precisely regulates information flow through a triple dynamic gating mechanism consisting of an input gate, a forget gate, and an output gate: the forget gate actively weakens irrelevant historical information (such as the influence of regular domain name caching), the input gate strengthens the abnormal features of the current input (such as sudden random subdomains or abnormal query types), and the output gate extracts and transmits state information crucial for detection to the next time step. Simultaneously, the cell state, as a low-loss, long-distance information channel, effectively maintains coherent memory of key features of long-period, low-rate DNS tunneling attacks (such as slow data leakage). This mechanism significantly improves the model's ability to capture long-range correlations of attack signals and its robustness against background traffic interference in complex, high-noise network environments, laying a solid foundation for accurately identifying highly concealed and dispersed tunneling attacks.
[0037] Furthermore, the method for generating the first and second sequences using the feature decoupling layer is as follows:
[0038] Step 1: Define the domain name projection matrix and the query type domain name projection matrix;
[0039] Step 2: Perform dual-channel projection on the hidden state $h_t$ of the temporal feature vector at each time step $t$ to generate decoupled domain name feature sequences and query type feature sequences.
[0040] Furthermore, classification network models include:
[0041] Feature abstraction layer: contains at least one fully connected layer, used to extract high-order discriminative features;
[0042] Probability output layer: A single-neuron layer containing a Sigmoid activation function, which generates probability values for attack behavior.
[0043] In this scheme, the joint representation is mapped to a higher-dimensional separable space through multi-layer nonlinear transformation, and the decision boundary between tunnels and legitimate traffic is learned (such as distinguishing between pseudo-random domains and real CDN domains). At the same time, the abstract features are compressed to the [0,1] interval to provide quantifiable threat confidence, support flexible threshold adjustment to balance false positives / false negatives, and adapt to different security strategy scenarios.
Attached Figure Description
[0044] Figure 1 is a flowchart of a real-time DNS tunneling attack detection method based on an LSTM-Transformer hybrid architecture.
Detailed Implementation
[0045] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments. The same reference numerals in the accompanying drawings represent the same components. It should be noted that the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the described embodiments of this application without creative effort are within the scope of protection of this application.
[0046] Compared to the embodiments shown in the accompanying drawings, feasible embodiments within the scope of this application may have fewer components, other components not shown in the drawings, different components, differently arranged components, or components with different connections, etc. Furthermore, two or more components in the drawings may be implemented in a single component, or a single component shown in the drawings may be implemented as multiple separate components.
[0047] Unless otherwise defined, the technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application pertains. The terms “first,” “second,” and similar terms used in this specification and claims do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Similarly, the terms “an” or “a” and similar terms do not necessarily indicate a quantity limitation. Terms such as “upper” and “lower” are used only to indicate relative positional relationships, and these relative positional relationships may change accordingly when the absolute position of the described object changes.
[0048] This solution can be deployed on critical network traffic paths at boundary protection nodes (such as enterprise firewalls and IDPS systems), core infrastructure nodes (such as recursive DNS servers and cloud security gateways), and endpoint security agents. By analyzing the DNS query sequences flowing through the nodes in real time, it can accurately identify and block DNS tunneling attacks launched by malware, APT attacks, or internal threats. For each boundary protection node, a classification network model needs to be trained specifically to adapt to different network environments.
[0049] Referring to Figure 1, Example 1 provides a real-time detection method for DNS tunnel attacks based on an LSTM-Transformer hybrid architecture, comprising the following steps:
[0050] Step 1: Collect continuous DNS data streams from the network and extract the raw DNS query sequence as input features. The raw DNS query sequence includes the query domain name field and the string sequence of query type.
[0051] The continuous DNS data stream collected in this step refers to the DNS protocol communication traffic (usually based on UDP port 53 or TCP port 53) captured in real time and transmitted continuously by network devices (such as gateways and DNS servers), which includes DNS query requests issued by all nodes in the network and their corresponding response data packets.
[0052] The extracted raw DNS query sequence specifically refers to the core fields of unprocessed DNS query request packets arranged in chronological order. Each query entry contains a specific query domain name field (QNAME, i.e., the complete domain name string requested for resolution) and a query type field (QTYPE, representing the requested record type, such as a string identifier like A, AAAA, MX, TXT, etc.). These fields are sorted by the packet arrival timestamp to form a sequence, which serves as the input features for subsequent detection models.
[0053] In this scheme, the original DNS query sequence is transformed into input features using a pre-set encoder.
[0054] The input features are actually a sequence of strings consisting of query domain name fields and query types. The query domain name fields and query types are converted into corresponding codes by the encoder.
[0055] For example, if a query domain name field is A, the corresponding code is 001.
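The encoder is not specified beyond the example above. A minimal sketch, assuming a hypothetical character-level code table for QNAME and a small lookup table for QTYPE (the names `QTYPE_CODES`, `CHAR_CODES`, `encode_query`, `encode_sequence` and all code values are illustrative assumptions, not taken from this application), might look as follows:

```python
import string

# Hypothetical code tables; the application does not specify the actual encoder.
QTYPE_CODES = {"A": 1, "AAAA": 2, "MX": 3, "TXT": 4}
CHAR_CODES = {
    ch: i + 1
    for i, ch in enumerate(string.ascii_lowercase + string.digits + "-._")
}
UNKNOWN = 0  # code reserved for anything outside the tables


def encode_query(qname, qtype):
    """Map one (QNAME, QTYPE) pair to integer codes."""
    name_codes = [CHAR_CODES.get(ch, UNKNOWN) for ch in qname.lower()]
    return name_codes, QTYPE_CODES.get(qtype.upper(), UNKNOWN)


def encode_sequence(queries):
    """Encode (timestamp, qname, qtype) records in packet arrival order."""
    ordered = sorted(queries, key=lambda q: q[0])
    return [encode_query(qname, qtype) for _, qname, qtype in ordered]


queries = [
    (2.0, "a1b2.example.com", "TXT"),
    (1.0, "www.example.com", "A"),
]
encoded = encode_sequence(queries)
```

Sorting by timestamp mirrors the requirement that query entries be arranged by packet arrival time before they are fed to the model.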
[0056] Step 2: Construct a hybrid neural network architecture consisting of a temporal feature extraction module and a global dependency modeling module;
[0057] The temporal feature extraction module is used to learn the temporal contextual dependencies from the DNS query sequence and output a temporal feature vector.
[0058] The global dependency modeling module is used to perform global correlation analysis across sequence positions on the time-series feature vectors to capture long-distance dependency feature vectors.
[0059] Hybrid neural network architectures are primarily used to collaboratively fuse temporal local features and global dependencies. They capture the dynamic evolution of query behavior over time and identify potential, complex, long-distance relationships between elements at any position in a sequence.
[0060] The temporal feature extraction module includes multiple recurrent neural network units, which control the transmission and forgetting of historical state information through a gating mechanism; the temporal feature vector includes at least the temporal relationship of the query domain name field and the temporal relationship of the query type.
[0061] Recurrent neural network units include:
[0062] Input gate: controls the weight of the current input information in updating the cell state;
[0063] Forget gate: controls the retention weight of historical cell state information;
[0064] Output gate: Controls the contribution weight of the cell state to the output of the current hidden layer;
[0065] Cell state: a memory carrier that transmits long-range dependent information across time steps.
[0066] Among them, the input features are $X = \{x_1, x_2, \dots, x_T\}$, $x_t \in \mathbb{R}^{n}$;
[0067] where $n$ represents the total dimension of the input features;
[0068] In the input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$;
[0069] In the forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$;
[0070] In the output gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$;
[0071] In the cell state: $\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$;
[0072] $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$;
[0073] $h_t = o_t \odot \tanh(C_t)$.
[0074] The time series feature vector is $H = \{h_1, h_2, \dots, h_T\}$;
[0075] Where $t$ represents the index of the time step (label index) in the input features; $x_t$ represents the input vector at time $t$; $h_{t-1}$ represents the hidden state at time $t-1$; $[h_{t-1}, x_t]$ represents the concatenated vector of $h_{t-1}$ and $x_t$; $b_f$, $b_i$, $b_o$, $b_c$ represent the forget gate bias vector, input gate bias vector, output gate bias vector, and candidate state bias vector, respectively; $\sigma$ represents the Sigmoid activation function; $f_t$, $i_t$, $o_t$, $\tilde{C}_t$, $C_t$ represent the forget gate output vector, input gate output vector, output gate vector, candidate cell state, and cell state at time $t$, respectively; $W_f$, $W_i$, $W_o$, $W_c$ are the forget gate weight matrix, input gate weight matrix, output gate weight matrix, and candidate state weight matrix, respectively; $\tanh$ represents the hyperbolic tangent activation function; and $\odot$ indicates element-wise multiplication.
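The gating described above corresponds to the standard LSTM recurrence. A minimal pure-Python sketch of one time step and of a pass over a short sequence is given below; the helper names (`affine`, `lstm_step`, `init_params`, `run_lstm`), the randomly initialised weights, and the toy input are assumptions made for illustration, and a deployed model would use trained parameters.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: gates, candidate state, cell state, hidden state."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    c_hat = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights (illustrative only; not trained values)."""
    rng = random.Random(seed)
    params = {}
    for gate in "fioc":
        params["W_" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim + input_dim)]
            for _ in range(hidden_dim)
        ]
        params["b_" + gate] = [0.0] * hidden_dim
    return params


def run_lstm(xs, hidden_dim, params):
    """Return the hidden states h_1..h_T for an input sequence xs."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        states.append(h)
    return states


xs = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5]]
params = init_params(input_dim=3, hidden_dim=4)
hidden_states = run_lstm(xs, hidden_dim=4, params=params)
```

The list of hidden states plays the role of the time series feature vector that the later modules consume.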
[0076] The global dependency modeling module calculates the temporal relationship of the query domain name field in the temporal feature vector in parallel, as well as the association weight between any elements in the temporal relationship of the query type, and integrates the positional information of the temporal feature vector to output an enhanced long-distance dependency feature vector.
[0077] The global dependency modeling module includes:
[0078] The feature decoupling layer is used to deconstruct the temporal feature vector into a first sequence related to the domain name and a second sequence related to the query type.
[0079] The reason this application requires deconstructing the temporal feature vector is primarily to better extract the correlation between domain names and query types. Information flows differ across network nodes, and extracting temporal features solely from domain name or query type sequences fails to uncover clues about the correlation between them. The resulting correlation weights are too fragmented, easily burying attack information within the complex DNS sequence.
[0080] In this scheme, the time-series features are obtained first and then deconstructed. Because each hidden state already mixes domain name and query type information, neither the first nor the second sequence obtained by deconstruction is single-sequence information, so the association weights generated from them are more accurate.
[0081] In other words, the information projected onto the domain name projection matrix retains the domain name structure characteristics, while the information projected onto the query type domain name projection matrix retains the query type characteristics.
[0082] $Z^{dom} = \{z^{dom}_1, z^{dom}_2, \dots, z^{dom}_T\}$, $z^{dom}_t = W_{dom} h_t$;
[0083] $Z^{typ} = \{z^{typ}_1, z^{typ}_2, \dots, z^{typ}_T\}$, $z^{typ}_t = W_{typ} h_t$;
[0084] where $Z^{dom}$ represents the domain name feature sequence; $Z^{typ}$ represents the query type feature sequence; $T$ represents the total number of elements in the domain name feature sequence; $z^{dom}_t$ represents the projection result of the temporal feature vector onto the domain name projection matrix $W_{dom}$; and $z^{typ}_t$ represents the projection result of the time-series feature vector onto the query type domain name projection matrix $W_{typ}$;
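The dual-channel projection performed by the feature decoupling layer can be sketched as two matrix-vector products per time step. The matrices `W_d` and `W_q`, their sizes, and the toy hidden states below are hypothetical values chosen for illustration; in practice both matrices would be learned.

```python
def matvec(W, v):
    """Multiply a list-of-rows matrix W by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def decouple(hidden_states, W_d, W_q):
    """Project each hidden state h_t through a domain channel and a type channel."""
    domain_seq = [matvec(W_d, h_t) for h_t in hidden_states]
    qtype_seq = [matvec(W_q, h_t) for h_t in hidden_states]
    return domain_seq, qtype_seq


# Toy 2x3 projection matrices (assumed values, not learned ones).
W_d = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_q = [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
hidden_states = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
domain_seq, qtype_seq = decouple(hidden_states, W_d, W_q)
```

Both output sequences keep one element per time step, so they stay aligned position by position for the later stages.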
[0085] A sinusoidal position encoder, which obtains a cosine sequence based on a dual-channel input first sequence and a second sequence;
[0086] A sinusoidal position encoder is a cosine function with a fixed frequency. By alternately inputting the elements of the first sequence and the second sequence into the cosine function, a cosine sequence can be obtained.
[0087] Although the cosine sequence is a single sequence, it is composed of alternating elements from the first and second sequences. Therefore, it actually merges the two sequences together without removing the order relationship between the first and second sequences. The subsequent multi-head self-attention layer can naturally extract the correlation between the elements at each position in the first and second sequences.
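Read literally, the encoder interleaves the two sequences and passes each element through a cosine of fixed frequency. The sketch below follows that literal reading only; the frequency `omega`, the function names, and the toy values are assumptions. Note that this differs from the positional encoding commonly used with Transformers, which adds sine and cosine functions of the position index rather than of the element values.

```python
import math


def interleave(first_seq, second_seq):
    """Alternate elements of the two sequences: d_1, q_1, d_2, q_2, ..."""
    merged = []
    for d_t, q_t in zip(first_seq, second_seq):
        merged.append(d_t)
        merged.append(q_t)
    return merged


def cosine_sequence(first_seq, second_seq, omega=1.0):
    """Apply cos(omega * value) element-wise to the interleaved sequence."""
    return [
        [math.cos(omega * value) for value in vec]
        for vec in interleave(first_seq, second_seq)
    ]


first_seq = [[0.0, 0.5], [1.0, 1.5]]
second_seq = [[2.0, 2.5], [3.0, 3.5]]
cos_seq = cosine_sequence(first_seq, second_seq)
```

The result is a single sequence of length 2T in which domain-related and type-related elements alternate, which is what lets the attention layer relate positions across both channels.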
[0088] Multi-head self-attention layer: calculates the correlation between elements at each position in the first and second sequences.
[0089] The multi-head self-attention layer is a core component of the Transformer architecture, specifically designed to model global dependencies between any elements in a sequence. The specific model structure is based on existing technology and will not be elaborated further here.
[0090] The output layer outputs long-distance dependency feature vectors based on the relationships between features.
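For readers unfamiliar with the attention layer, the correlation computation can be sketched with scaled dot-product attention. This is a simplified illustration under stated assumptions: the learned query, key, and value projections and the output projection of a full Transformer layer are replaced by the identity, and the input values are invented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns outputs and attention weights."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w_j * v[d] for w_j, v in zip(w, values))
             for d in range(len(values[0]))]
        )
    return outputs, weights


def multi_head_self_attention(seq, num_heads):
    """Split features into heads, attend per head, concatenate the results.

    Learned projections are omitted (identity) to keep the sketch short.
    """
    dim = len(seq[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = dim // num_heads
    out = [[] for _ in seq]
    for h in range(num_heads):
        part = [vec[h * head_dim:(h + 1) * head_dim] for vec in seq]
        head_out, _ = attention(part, part, part)
        for row, piece in zip(out, head_out):
            row.extend(piece)
    return out


seq = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
attended = multi_head_self_attention(seq, num_heads=2)
```

Each row of attention weights sums to 1 and spans every position in the sequence, which is how a query at one position can be related to a query arbitrarily far away.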
[0091] Step 3: The information fusion module fuses the temporal feature vector and the long-distance dependency feature vector to generate a joint representation vector.
[0092] The information fusion module concatenates the temporal feature vector output by the temporal feature extraction module with the long-distance dependency feature vector output by the global dependency modeling module along the feature dimension to generate a joint representation vector containing the two types of feature information.
[0093] Feature concatenation directly connects two feature vectors along their feature dimensions to generate a higher-dimensional joint vector, thus preserving the complete information of the original features and expanding the representational power.
[0094] For example, if the time series feature vector is $[a_1, \dots, a_m]$ and the long-distance dependency feature vector is $[b_1, \dots, b_n]$, the splicing result is $[a_1, \dots, a_m, b_1, \dots, b_n]$.
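As a concrete illustration with invented numbers, splicing a 3-dimensional temporal vector with a 2-dimensional dependency vector yields a 5-dimensional joint vector (the function name `fuse` is an assumption):

```python
def fuse(temporal_vec, dependency_vec):
    """Concatenate two feature vectors along the feature dimension."""
    return list(temporal_vec) + list(dependency_vec)


joint = fuse([0.2, -0.1, 0.7], [0.5, 0.3])
```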
[0095] Step 4: The classification network model classifies attack behaviors based on the joint representation vector and outputs the detection results of DNS tunnel attacks in real time;
[0096] The classification network model is trained using the labeled joint representation vectors of the corresponding nodes as samples.
[0097] The joint representation vector is input into the classification network model, and the probability of attack behavior is calculated through a nonlinear activation function. The probability value is binarized based on a preset decision threshold, and a binary detection identifier for DNS tunneling attack is generated and output in real time.
[0098] Classification network models include:
[0099] Feature abstraction layer: contains at least one fully connected layer for extracting high-order discriminative features;
[0100] Probability output layer: A single-neuron layer containing a Sigmoid activation function, which generates probability values for attack behavior.
[0101] The feature abstraction layer is essentially a 5-layer fully connected network that extracts latent information from the joint representation vector. The probability output layer calculates the probability of attack behavior through the activation function, binarizes the probability value against a preset decision threshold, and outputs a binary detection identifier for DNS tunneling attacks in real time. Each identifier indicates whether an attack has occurred, where 0 indicates no attack and 1 indicates an attack; a run of successive results might read, for example, 01001.
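The forward pass of such a classifier can be sketched as a stack of fully connected layers followed by one sigmoid neuron and a threshold comparison. The layer widths, the ReLU activation in the hidden layers, the random untrained weights, and the default threshold of 0.5 are all assumptions made for illustration.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b, activation):
    """One fully connected layer: activation(W @ x + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def init_layer(in_dim, out_dim, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim


def build_classifier(in_dim, hidden_dims, seed=0):
    """Fully connected layers followed by a single sigmoid neuron (untrained)."""
    rng = random.Random(seed)
    layers, prev = [], in_dim
    for width in hidden_dims:
        layers.append(init_layer(prev, width, rng))
        prev = width
    return layers, init_layer(prev, 1, rng)


def predict(joint_vec, model, threshold=0.5):
    """Return (attack probability, binary detection identifier)."""
    hidden_layers, (W_out, b_out) = model
    x = joint_vec
    for W, b in hidden_layers:
        x = dense(x, W, b, relu)
    prob = dense(x, W_out, b_out, sigmoid)[0]
    return prob, int(prob >= threshold)


model = build_classifier(in_dim=5, hidden_dims=[8, 8, 8, 8, 8])
prob, flag = predict([0.2, -0.1, 0.7, 0.5, 0.3], model)
```

Raising `threshold` trades false positives for false negatives, which is the adjustment knob the text refers to.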
[0102] Training Process: This scheme involves two independent networks: a hybrid neural network architecture and a classification network model. Both require separate training. Specifically, the classification network model needs to be trained separately for each boundary node. The hybrid neural network architecture, being a feature extraction model, only needs to be trained once.
[0103] The training samples for the classification network model consist of joint representation vectors and attack labels: the joint representation vector serves as the input, and the attack label serves as the label. The classification network model calculates the predicted output from the input (the joint representation vector derived from a DNS query sequence) through forward propagation and uses a cross-entropy loss function to quantify the error between the predicted result and the true label. Based on this error, backpropagation calculates the gradient of the parameters of each layer of the network, and a gradient-based optimizer (Adam) updates the weights.
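The combination of cross-entropy loss and Adam updates can be sketched on the smallest possible case, a single sigmoid output neuron. The toy samples, labels, learning rate, and epoch count below are invented for illustration; a full classifier would backpropagate through the fully connected layers as well.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one sample, clipped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def train(samples, labels, epochs=200, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit one sigmoid neuron with Adam; returns (weights, bias, loss history)."""
    dim = len(samples[0])
    theta = [0.0] * (dim + 1)  # weights followed by bias
    m = [0.0] * (dim + 1)      # first-moment estimates
    v = [0.0] * (dim + 1)      # second-moment estimates
    history = []
    for step in range(1, epochs + 1):
        grad = [0.0] * (dim + 1)
        loss = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(theta, x)) + theta[-1])
            loss += bce(p, y)
            err = p - y  # gradient of the loss with respect to the logit
            for j, xi in enumerate(x):
                grad[j] += err * xi
            grad[-1] += err
        n = len(samples)
        for j in range(dim + 1):
            g = grad[j] / n
            m[j] = beta1 * m[j] + (1 - beta1) * g
            v[j] = beta2 * v[j] + (1 - beta2) * g * g
            m_hat = m[j] / (1 - beta1 ** step)  # bias-corrected moments
            v_hat = v[j] / (1 - beta2 ** step)
            theta[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(loss / n)
    return theta[:-1], theta[-1], history


# Toy joint representation vectors: label 1 marks an attack sample.
samples = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [0, 0, 1, 1]
weights, bias, history = train(samples, labels)
```

Because the classifier is trained per boundary node, a loop of this kind would be repeated with each node's own labeled joint representation vectors.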
[0104] During training, the weights of each layer in a hybrid neural network architecture are updated in the same way as in a classification network model. The training of a hybrid neural network model is unsupervised.
[0105] The main difference between training hybrid neural network models and classification network models lies in the different samples and loss functions.
[0106] When training a hybrid neural network model, multiple positive and negative samples are generated. Positive samples refer to data where the temporal feature vector and long-range dependency feature vector are derived from real sampled data. Negative samples refer to data where the temporal feature vector and long-range dependency feature vector are randomly paired. By using positive and negative samples to replace the original sample labels, unsupervised training is achieved.
[0107] The loss function is :
[0108] ;
[0109] Where $T$ represents the temporal feature vector, $G$ represents the long-distance dependency feature vector, $G^{-}$ represents the long-distance dependency feature vector of negative samples, $F$ represents the similarity calculation function (cosine similarity), $\phi$ represents the feature projection space, and $\lambda$ represents the loss balance parameter. The three terms of the loss function are the positive sample alignment term, the negative sample separation term, and the higher-order feature constraint term.
[0110] The loss function in this scheme guides the establishment of a deep intrinsic correlation between temporal feature vectors and long-distance dependent feature vectors through a triple collaborative mechanism. The positive sample alignment term forces homogeneous feature pairs to be similar in the projection space (maximizing similarity), making temporal context patterns and global dependency patterns mutually attractive in the latent space; the negative sample separation term excludes non-matching feature pairs, generating negative samples through batch rearrangement to construct the decision boundary and enhance feature discriminability; the higher-order feature constraint term directly minimizes feature distance in the projection space, maintaining geometric consistency even when similarity judgment fails. This three-part mechanism of "explicit association modeling + implicit spatial constraints + adversarial negative sample screening" deeply couples local temporal patterns (such as DNS query frequency fluctuations) and global dependency patterns (such as cross-sequence domain name associations) through statistical co-occurrence and geometric alignment. Furthermore, this similarity calculation and discrimination method can find the actual correspondence between temporal feature vectors and long-distance dependent feature vectors in the real world, increasing the accuracy of temporal feature vector and long-distance dependent feature vector extraction, while also increasing the model's generalization ability and exhibiting good model performance in different environments.
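One way to read the three terms described above is sketched below. This is purely an assumption, since only the roles of the terms are described: the alignment term is taken as one minus cosine similarity, the separation term as the clipped cosine similarity to a mismatched vector, and the constraint term as a squared distance weighted by `lam`. The projection is taken as the identity, both vectors are assumed to have the same dimension, and negatives are produced by rotating the batch by one position as a simple form of batch rearrangement.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairing_loss(T_batch, G_batch, lam=0.1):
    """Assumed three-term loss averaged over a batch of (T, G) pairs."""
    G_neg = G_batch[1:] + G_batch[:1]  # batch rearrangement: mismatched pairs
    total = 0.0
    for t, g, g_neg in zip(T_batch, G_batch, G_neg):
        align = 1.0 - cosine(t, g)             # positive sample alignment
        separate = max(0.0, cosine(t, g_neg))  # negative sample separation
        constraint = sq_dist(t, g)             # distance in the projection space
        total += align + separate + lam * constraint
    return total / len(T_batch)


T_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = pairing_loss(T_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped = pairing_loss(T_batch, [[0.0, 1.0], [1.0, 0.0]])
```

With this toy batch, correctly paired vectors give a loss of zero while swapped pairs are penalised by all three terms, which matches the intent of pulling true pairs together and pushing random pairings apart.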
[0111] The above are merely preferred embodiments of this application and are not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A real-time detection method for DNS tunneling attacks based on an LSTM-Transformer hybrid architecture, characterized in that it comprises the following steps:
Step 1: collecting continuous DNS data streams from the network and extracting the raw DNS query sequence as the input feature;
Step 2: constructing a hybrid neural network architecture consisting of a temporal feature extraction module and a global dependency modeling module, wherein the temporal feature extraction module learns the temporal contextual dependencies from the DNS query sequence and outputs a temporal feature vector, and the global dependency modeling module performs global correlation analysis across sequence positions on the temporal feature vector to capture a long-distance dependency feature vector;
Step 3: fusing, by an information fusion module, the temporal feature vector and the long-distance dependency feature vector to generate a joint representation vector;
Step 4: classifying, by a classification network model, attack behaviors based on the joint representation vector and outputting the detection result for DNS tunneling attacks in real time;
wherein the classification network model is trained using the labeled joint representation vectors of the corresponding nodes as samples;
the raw DNS query sequence includes a query domain name field and a query type string sequence; the raw DNS query sequence specifically refers to the core fields of unprocessed DNS query request packets arranged in chronological order, each query entry containing the specific query domain name field;
the temporal feature vector includes at least the temporal relationship of the query domain name field and the temporal relationship of the query type;
the global dependency modeling module includes:
a feature decoupling layer, used to decompose the temporal feature vector into a first sequence related to the domain name and a second sequence related to the query type;
a sinusoidal position encoder, which obtains a cosine sequence from the dual-channel input of the first sequence and the second sequence, the sinusoidal position encoder being a cosine function with a fixed frequency, and the cosine sequence being obtained by alternately inputting the elements of the first sequence and the second sequence into the cosine function;
a multi-head self-attention layer, which calculates the correlation between the elements at each position in the first sequence and the second sequence;
an output layer, which outputs the long-distance dependency feature vector based on the correlations between the features;
the feature decoupling layer generates the first sequence and the second sequence as follows: a domain name projection matrix and a query type projection matrix are defined; a dual-channel projection is applied to the hidden state h_t of the temporal feature vector at each time step t to generate a decoupled domain name feature sequence and a decoupled query type feature sequence, wherein T denotes the total number of elements in the domain name feature sequence, each element of the domain name feature sequence is the projection of the temporal feature vector onto the domain name projection matrix, and each element of the query type feature sequence is the projection of the temporal feature vector onto the query type projection matrix;
the information fusion module concatenates, along the feature dimension, the temporal feature vector output by the temporal feature extraction module with the long-distance dependency feature vector output by the global dependency modeling module to generate a joint representation vector containing both types of feature information;
the joint representation vector is input into the classification network model, the probability of attack behavior is calculated through a nonlinear activation function, the probability value is binarized based on a preset decision threshold, and a binary detection identifier for DNS tunneling attacks is generated and output in real time.
2. The real-time detection method for DNS tunneling attacks based on an LSTM-Transformer hybrid architecture according to claim 1, characterized in that the temporal feature extraction module comprises a plurality of recurrent neural network units, which control the transmission and forgetting of historical state information through a gating mechanism.
3. The real-time detection method for DNS tunneling attacks based on an LSTM-Transformer hybrid architecture according to claim 1, characterized in that the recurrent neural network units comprise: an input gate, which controls the weight of the current input information in updating the cell state; a forget gate, which controls the retention weight of the historical cell state information; an output gate, which controls the contribution weight of the cell state to the current hidden-layer output; and a cell state, which is the memory carrier that transmits long-distance dependency information across time steps.
4. The real-time detection method for DNS tunneling attacks based on an LSTM-Transformer hybrid architecture according to claim 1, characterized in that the classification network model comprises: a feature abstraction layer, containing at least one fully connected layer, used to extract high-order discriminative features; and a probability output layer, being a single-neuron layer with a Sigmoid activation function, which generates the probability value of attack behavior.
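For illustration only, and not as part of the claims, the fusion and classification steps recited above (concatenation along the feature dimension, a fully connected feature abstraction layer, a single Sigmoid neuron, and thresholding) might be sketched as follows. The ReLU activation in the hidden layer, the default threshold of 0.5, and all names and weights are assumptions; the source specifies only a fully connected layer, a Sigmoid output, and a preset decision threshold.

```python
import math

def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def fuse(temporal_vec, dependency_vec):
    """Information fusion: concatenate along the feature dimension."""
    return list(temporal_vec) + list(dependency_vec)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(joint, w_hidden, b_hidden, w_out, b_out, threshold=0.5):
    """Return (attack probability, binary detection identifier).

    w_hidden/b_hidden: one fully connected feature abstraction layer
    (ReLU is an assumption). w_out/b_out: the single output neuron.
    """
    hidden = [max(0.0, s + b) for s, b in zip(matvec(w_hidden, joint), b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    prob = sigmoid(logit)
    return prob, int(prob >= threshold)

if __name__ == "__main__":
    # Hypothetical feature vectors and weights, chosen only for the demo.
    joint = fuse([1.0, 2.0], [3.0])
    prob, flag = classify(joint,
                          w_hidden=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                          b_hidden=[0.0, -10.0],
                          w_out=[2.0, 1.0], b_out=-2.0)
    print(prob, flag)
```

In a deployment, the weights would come from training on the labeled joint representation vectors, and the threshold would be tuned to the acceptable false-positive rate.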