An email-oriented APT attack detection method and system
By using multimodal entropy calculation and semantic reasoning of the LLM model, the problem of low detection accuracy caused by isolated analysis of multimodal features in existing technologies is solved, and efficient identification and interpretable detection results for email APT attacks are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU YINGSHI SECURITY TECHNOLOGY CO LTD
- Filing Date
- 2026-04-14
- Publication Date
- 2026-06-23
AI Technical Summary
Existing APT attack detection technologies targeting email suffer from low detection accuracy because isolated analysis of multimodal features cannot effectively identify cross-modal logical conflicts.
By acquiring emails with attachments to be detected, multimodal entropy calculation is performed to construct a multidimensional entropy vector. This vector is then combined with email garbled character recognition and a pre-trained LLM model for semantic reasoning to generate APT attack intent detection results.
It achieves effective identification of cross-modal logical conflicts, improves the accuracy of APT attack detection, and outputs interpretable judgment results through the semantic understanding capability of LLM.
Figure CN122268652A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of APT attack detection technology, and in particular to an APT attack detection method and system for email.
Background Technology
[0002] Email, as a core medium for daily communication and business dealings in modern government and enterprise organizations, is also a favorite initial intrusion route for Advanced Persistent Threat (APT) attackers. Attackers often embed malicious payloads into email attachments or the email body, using social engineering techniques to induce recipients to interact, thereby breaching network boundaries and achieving long-term infiltration and data theft. Therefore, APT attack detection targeting email has become a crucial component of network security defense systems.
[0003] Existing APT attack detection technologies targeting email mainly include: static detection techniques based on signatures, which involve extracting hash values or byte sequences from known malicious samples and performing bit-by-bit matching; dynamic behavior analysis techniques based on sandboxes, which induce samples to run in an isolated environment and monitor their API call behavior; anomaly detection techniques based on statistical features, which analyze the randomness of domain name strings using Markov chains to identify DGA domains; and detection techniques based on deep learning, which use attention mechanisms to perform sequence modeling of payloads or scripts. However, all of the above technologies share a common drawback in practical applications: the detection modules operate in isolation, like silos. For example, DGA detection only focuses on the randomness of the URL string and ignores the source reputation of the email header, while sandboxes only focus on the runtime behavior of attachments and ignore semantic anomalies in static text. Because these technologies lack cross-modal correlation analysis of logical conflicts between features of different dimensions, attackers can construct emails with normal header features but obfuscated attachment payloads or embedded high-entropy domains, so that no independent detection module reaches its own threshold and the overall detection system produces a false negative. Therefore, how to overcome the limitations of isolated analysis of multimodal features and improve the detection accuracy of APT attack emails is a technical problem that urgently needs to be solved.
Summary of the Invention
[0004] This invention provides an APT attack detection method and system for email, which can solve the technical problem of low APT attack detection accuracy caused by the isolated analysis of multimodal features in existing APT attack email detection technologies, which cannot effectively identify cross-modal logical conflicts.
[0005] In a first aspect, embodiments of the present invention provide a method for detecting APT attacks targeting email, comprising: acquiring an email to be detected that carries attachments, and performing multimodal entropy calculation on the email to obtain a multidimensional entropy vector, wherein the multimodal entropy includes header construction entropy, behavioral inertia entropy, attachment structure entropy, and URL feature entropy; performing garbled character recognition on the email to be detected to determine whether it contains garbled characters, wherein the garbled character recognition includes character set analysis, encoding format detection, and natural language feature judgment; and, if it is determined that the email to be detected contains no garbled characters, constructing a prompt word from the multidimensional entropy vector and inputting the prompt word into a pre-trained LLM model, so that the LLM model performs semantic reasoning on the prompt word and generates and outputs the APT attack intent detection result of the email to be detected.
[0006] This invention quantifies email features from different dimensions into comparable numerical values (entropy values), providing a unified metric for subsequent cross-modal logical conflict analysis. It identifies whether an email contains malicious payloads with obfuscated encodings, providing a basis for selecting the subsequent processing branch. By leveraging the semantic understanding capabilities of the LLM to analyze the logical relationships between the entropy values (such as whether low behavioral inertia entropy combined with high URL feature entropy constitutes an anomalous combination), it outputs interpretable judgment results. Compared to existing technologies, this invention solves the technical problem of low APT attack detection accuracy caused by isolated analysis of multimodal features, which fails to effectively identify cross-modal logical conflicts.
[0007] In some preferred embodiments of the first aspect, multimodal entropy is calculated based on the email to be detected to obtain a multidimensional entropy vector, specifically as follows: The email to be inspected is parsed according to the protocol to extract structured metadata; wherein, the structured metadata includes header fields, attachment metadata, Uniform Resource Locator, sender identifier and current sending timestamp; Based on the structured metadata, each multimodal entropy is calculated separately, and the multimodal entropies are integrated to obtain a multidimensional entropy vector.
[0008] This invention converts unstructured raw emails into structured data, providing standardized input for subsequent entropy value calculations; and transforms scattered metadata into a unified multidimensional numerical vector, facilitating the subsequent LLM processing of all modal features at once.
[0009] In some preferred embodiments of the first aspect, the header field includes a sending client field, a mail subject field, and a mail ID field; Specifically, based on the structured metadata, the multimodal entropy is calculated for each mode, including: The sending client field, email subject field, and email ID field are concatenated into a combined string according to a preset order, and the Shannon entropy of the combined string is calculated. Divide the Shannon entropy by a preset header entropy normalization factor to obtain the first intermediate entropy value, and take the minimum value between the first intermediate entropy value and 1 as the header construction entropy.
[0010] This invention selects three typical header fields that are sensitive to forgery. These fields have a fixed pattern in normal emails but often exhibit randomness in forged emails. By merging multiple fields into a single string, the Shannon entropy calculation can reflect the joint randomness between the fields, rather than isolated analysis. The randomness of the header string is quantified by calculating Shannon entropy; normal emails have lower entropy values (fixed vocabulary), while forged emails have higher entropy values (random padding). A normalization factor is used to eliminate the influence of string length on the entropy value, normalizing the entropy value to the [0,1] interval, which facilitates comparison and integration with other entropy values.
[0011] In some preferred embodiments of the first aspect, each multimodal entropy is calculated based on the structured metadata, including: Based on the sender identifier, query the historical sending timestamp sequence of the corresponding sender within the preset historical time window, and based on the historical sending timestamp sequence, count the total number of sending times within the historical time window; Based on the current sending timestamp and the historical sending timestamp sequence, calculate the time interval sequence of adjacent sending behaviors, and based on the total number of sendings and the time interval sequence, calculate the standard deviation or variance of the time interval sequence to obtain the discrete value of sending time. The average sending interval is calculated based on the length of the preset historical time window and the total number of sending messages. The product of the discrete value of sending time, the average sending interval, and the preset smoothing adjustment parameter is calculated to obtain the second intermediate entropy value. The second intermediate entropy value is then normalized using the hyperbolic tangent activation function to obtain the behavioral inertia entropy.
[0012] This invention provides a reference for judging whether current behavior is regular by acquiring the sender's historical behavior data; by calculating the discrete value of the sending time, the regularity of the sending time is quantified: the smaller the discrete value, the more stable the sending time interval is, and the closer it is to the machine's behavior characteristics; by calculating the average sending interval, the sending frequency is reflected, and combined with the discrete value, "low-frequency regularity" and "high-frequency regularity" can be distinguished; by mapping "regularity" to an entropy value close to 0 (low entropy means high risk), the traditional logic of "high frequency is abnormal" is overturned, so that extremely regular low-frequency behavior is identified as abnormal.
[0013] In some preferred embodiments of the first aspect, the attachment metadata includes the attachment text content, the number of nested levels of the compressed package, a macro existence identifier, and an obfuscation score; specifically, calculating each multimodal entropy based on the structured metadata includes: The attachment text content in the attachment metadata is subjected to garbled character recognition to obtain a garbled character flag; the garbled character recognition includes character set analysis, encoding format detection, and natural language feature judgment. If the garbled character flag is true, the attachment structure entropy is assigned a preset high-risk threshold. If the garbled character flag is false, the nesting level of the compressed package, the macro existence identifier, and the obfuscation score are weighted and summed to obtain a weighted sum value, and the minimum of the weighted sum value and 1 is taken as the attachment structure entropy.
[0014] This invention improves the accuracy of quantification by selecting multiple risk-related dimensions in attachments, covering content (text), structure (nesting), function (macro), and naming (obfuscation). It determines whether attachments contain malicious payloads with obfuscated encoding, a common tactic in APT attacks, and directly marks high-entropy garbled text as high-risk, identifying obfuscated payloads without decoding, thus solving the problem that traditional sandboxes cannot handle non-execution-state payloads. For non-garbled attachments, their structural risks are quantified through weighted combination (multi-layer compression, macros, and filename obfuscation are all high-risk features).
[0015] In some preferred embodiments of the first aspect, the Uniform Resource Locator includes a hostname field and a path field; Specifically, based on the structured metadata, the multimodal entropy is calculated for each mode, including: The hostname field and the path field are concatenated into a URL string, and the Shannon entropy of the URL string is calculated. Divide the Shannon entropy by a preset URL entropy normalization factor to obtain a third intermediate entropy value, and take the minimum value between the third intermediate entropy value and 1 as the URL feature entropy.
[0016] This invention combines the core parts of the URL to eliminate interference from fixed prefixes such as protocols in entropy calculation; it quantifies the randomness of domain name strings by calculating Shannon entropy: DGA domains have high entropy values, while normal domains have low entropy values; normalization facilitates comparison with other entropy values; the acquisition of URL feature entropy provides a basis for judging whether a link is suspicious in cross-modal conflict analysis.
[0017] In some preferred embodiments of the first aspect, the LLM model performs semantic reasoning on the prompt words to generate and output the APT attack intent detection result of the email to be detected, specifically as follows: The prompt word is segmented into word units to obtain a word unit sequence, and each word unit in the word unit sequence is mapped to a word unit feature vector to obtain a word unit vector matrix; wherein, the word unit sequence includes each word unit corresponding to each multimodal entropy in the multidimensional entropy value vector; By using a pre-trained attention mechanism, the correlation degree between each word feature vector in the word vector matrix is calculated to obtain a correlation degree matrix. Based on the correlation degree matrix, the word feature vectors in the word vector matrix are weighted and fused to obtain a fused feature vector. The fused feature vector is compared with a pre-trained cybersecurity knowledge graph to generate and output the APT attack intent detection result of the email to be detected.
[0018] By obtaining a word unit vector matrix, this invention transforms natural language prompts into a numerical matrix that a computer can process, preparing for subsequent attention calculations; by calculating the correlation degree matrix, it quantifies the semantic correlation strength between different word units (especially those representing different entropy values), capturing the nearest neighbor relationships in the vector space of word units such as "low behavioral inertia entropy" and "high attachment structure entropy"; by aggregating the information of the entire sequence into a fused feature vector, this vector simultaneously contains representations of the entropy values of each modality and their interrelationships; and through similarity comparison, it matches the feature patterns of the current email with known attack patterns (such as kill chain features) to calculate the probability of it being malicious or benign.
[0019] In some preferred embodiments of the first aspect, the method further comprises: If it is determined that there are garbled characters in the email to be detected, the start and end positions of the garbled characters in the corresponding text are identified, and characters of a preset length are extracted before the start position and after the end position to obtain the context text block; The prompt word is constructed using the multidimensional entropy vector and the context text block, and then input into a pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt word to generate and output the APT attack intent detection result of the email to be detected.
[0020] This invention provides semantic context for LLM by extracting natural language content (such as leading phrases) surrounding garbled text; by allowing LLM to simultaneously analyze entropy features and the garbled text context, it infers the true purpose of the garbled text (such as whether it is masked by leading phrases). This invention enables "demasking" inference of encoded obfuscated payloads—malicious intent can be determined solely through the semantic context surrounding the garbled text (such as social engineering phrases) without decoding, thus solving the problem of traditional techniques failing to detect obfuscated payloads.
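The context extraction described above can be sketched as follows. This is a minimal illustration only: the function name `extract_context_block`, the default window of 40 characters, and the returned dictionary layout are assumptions made for the example and are not specified in the source.

```python
def extract_context_block(text: str, start: int, end: int, window: int = 40) -> dict:
    """Return the natural-language text surrounding a garbled span.

    `start` and `end` are the start and end positions of the garbled
    characters (end exclusive). `window` stands in for the "preset length"
    and its default value is an assumption.
    """
    before = text[max(0, start - window):start]  # clamp at the start of the text
    after = text[end:end + window]               # slicing clamps at the end
    return {"before": before, "garbled": text[start:end], "after": after}
```

Only the `before` and `after` fields would need to be placed into the prompt word alongside the entropy vector, since the point of this branch is to reason about the surrounding phrasing without decoding the payload itself.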
[0021] In some preferred embodiments of the first aspect, the method further comprises: If the APT attack intent detection result shows that there is an APT attack intent in the email to be detected, determining whether the multimodal entropies in the multidimensional entropy vector satisfy a preset entropy combination condition. If they satisfy the preset entropy combination condition, the email interception mechanism is triggered; otherwise, a corresponding natural language explanation is generated based on the multidimensional entropy vector, and an alarm message containing the natural language explanation is output.
[0022] This invention improves decision reliability by adding entropy rule verification to the LLM semantic judgment, forming a dual verification and realizing the fusion decision of "AI semantic dominance + entropy rule verification". It utilizes the deep reasoning capability of LLM, prevents false alarms through explicit rules, and outputs interpretable alarm information.
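The dual verification flow can be sketched as below. The source does not disclose the preset entropy combination condition, so `combination_rule` uses a purely hypothetical one (low behavioral inertia entropy together with high URL feature entropy), and the 0.2 and 0.7 thresholds, the dictionary keys, and the "deliver" outcome for emails the LLM does not flag are all assumptions.

```python
def combination_rule(entropy: dict) -> bool:
    # Hypothetical preset condition: very regular sending plus a random-looking URL.
    return entropy["behavioral_inertia"] <= 0.2 and entropy["url_feature"] >= 0.7


def explain(entropy: dict) -> str:
    # Turn the entropy vector into a short natural-language explanation.
    parts = [f"{name}={value:.2f}" for name, value in sorted(entropy.items())]
    return "LLM flagged APT intent, but the entropy rule was not met: " + ", ".join(parts)


def decide(llm_flags_apt: bool, entropy: dict) -> dict:
    if not llm_flags_apt:
        return {"action": "deliver"}    # assumption: not described in the source
    if combination_rule(entropy):
        return {"action": "intercept"}  # both LLM and entropy rule agree
    return {"action": "alert", "message": explain(entropy)}
```

Keeping the rule as a separate function makes the explicit check easy to audit and to swap for whatever combination condition a deployment actually presets.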
[0023] In a second aspect, embodiments of the present invention provide an APT attack detection system for email, including a multidimensional entropy calculation module, an email garbled character recognition module, and an APT attack detection module, wherein: The multidimensional entropy calculation module is used to acquire an email to be detected that carries attachments, and to perform multimodal entropy calculation on the email to be detected to obtain a multidimensional entropy vector; wherein the multimodal entropy includes header construction entropy, behavioral inertia entropy, attachment structure entropy and URL feature entropy; The email garbled character recognition module is used to perform garbled character recognition on the email to be detected in order to determine whether it contains garbled characters; wherein the garbled character recognition includes character set analysis, encoding format detection and natural language feature judgment; The APT attack detection module is used to construct a prompt word from the multidimensional entropy vector if it is determined that there are no garbled characters in the email to be detected, and to input the prompt word into a pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt word and generates and outputs the APT attack intent detection result of the email to be detected.
[0024] This invention utilizes a multi-dimensional entropy calculation module to quantify email features of different dimensions into comparable numerical values (entropy values), providing a unified metric basis for subsequent cross-modal logical conflict analysis; an email garbled character recognition module identifies whether there are malicious payloads with encoded obfuscation in emails, providing a basis for determining subsequent processing path branches; and an APT attack detection module leverages the semantic understanding capabilities of LLM to analyze the logical relationships between various entropy values (such as whether low behavioral inertia entropy + high URL feature entropy constitute an abnormal combination), outputting interpretable judgment results.
[0025] The above description is merely an overview of the technical solutions of the embodiments of the present invention. In order to better understand the technical means of the embodiments of the present invention and to implement them in accordance with the contents of the specification, and to make the above and other objects, features and advantages of the embodiments of the present invention more apparent and understandable, specific embodiments of the present invention are described below.
Attached Figure Description
[0026] Figure 1 is a schematic diagram of an APT attack detection method for email provided in an embodiment of the present invention; Figure 2 is a schematic diagram illustrating an APT attack detection process for email, as exemplified by an embodiment of the present invention; Figure 3 is a structural diagram of an APT attack detection system for email provided in an embodiment of the present invention.
Detailed Implementation
[0027] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0028] Example 1: As shown in Figure 1, an embodiment of the present invention provides a method for detecting APT attacks targeting email, comprising: S101, obtain the email to be detected with attachments, and perform multimodal entropy calculations on the email to be detected to obtain a multidimensional entropy vector; wherein the multimodal entropy includes header construction entropy, behavioral inertia entropy, attachment structure entropy and URL feature entropy; In this embodiment, multimodal entropy calculations are performed on the email to be detected to obtain a multidimensional entropy vector. Specifically, the email to be detected is parsed according to a protocol to extract structured metadata. The structured metadata includes header fields, attachment metadata, Uniform Resource Locator (URL), sender identifier, and current sending timestamp. Based on the structured metadata, each multimodal entropy is calculated and integrated to obtain a multidimensional entropy vector.
[0029] In one specific embodiment, the email to be detected undergoes protocol parsing to extract structured metadata, including: parsing email data or traffic logs to extract structured information from the header, body, and attachments; for attachments, extracting metadata such as filename, size, type, and nesting level, rather than performing a full scan of the file content; using text extraction techniques such as regular expressions to extract key fields such as URL, sending tool identifier, message identifier, and transmission path timestamp from the body, HTML content, and header; and identifying potential encoding obfuscation payloads through character set analysis, encoding format detection, and natural language feature judgment, and generating corresponding tagging information to provide a basis for subsequent semantic analysis.
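As a rough illustration of this parsing step, the sketch below uses Python's standard `email` package to pull header fields, body URLs, and lightweight attachment metadata out of a raw message. The function name `extract_metadata`, the URL regular expression, and the output keys are assumptions chosen for the example; archive nesting depth and traffic-log parsing are not covered.

```python
import re
from email import message_from_string

# Simplified URL pattern (assumption): stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_metadata(raw: str) -> dict:
    msg = message_from_string(raw)
    body_parts, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename()
        if filename:
            # Metadata only: name, type and size, no content scan.
            attachments.append({
                "filename": filename,
                "type": part.get_content_type(),
                "size": len(payload),
            })
        elif part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))
    body = "\n".join(body_parts)
    return {
        "x_mailer": msg.get("X-Mailer", ""),
        "subject": msg.get("Subject", ""),
        "message_id": msg.get("Message-ID", ""),
        "sender": msg.get("From", ""),
        "date": msg.get("Date", ""),
        "urls": URL_RE.findall(body),
        "attachments": attachments,
    }
```

The returned dictionary plays the role of the structured metadata that the later entropy calculations consume.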
[0030] In this embodiment, the header fields include a sending client field, an email subject field, and an email ID field. The calculation of each multimodal entropy based on the structured metadata includes: concatenating the sending client field, email subject field, and email ID field into a combined string according to a preset order, and calculating the Shannon entropy of the combined string; dividing the Shannon entropy by a preset header entropy normalization factor to obtain a first intermediate entropy value, and taking the minimum value between the first intermediate entropy value and 1 as the header construction entropy.
[0031] In one specific embodiment, the header field is extracted through MIME protocol parsing.
[0032] In one specific embodiment, the sending client field, email subject field, and email ID field are concatenated into a combined string according to a preset order. Specifically, the email sending client (X-Mailer), email subject, and email globally unique identifier (Message-ID) are concatenated in the aforementioned fixed order and then separated by a preset delimiter to form a combined string.
[0033] It should be noted that the headers generated by normal email clients (such as Outlook and Thunderbird) follow strict RFC standards, have a limited vocabulary, and stable and low entropy values. The forgery tools used by APT attackers (such as Cobalt Strike's Phishing Module or the Python smtplib script) often generate headers containing random hash values and randomly padded characters to evade static signature detection.
[0034] It should be noted that for any given discrete random variable X (in this case, the character distribution in the string), its Shannon entropy H(X) is defined as:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
where $p_i$ represents the probability of the i-th distinct character appearing in the string, and $n$ is the number of distinct characters.
[0035] It should be noted that a normalization factor is introduced to eliminate the influence of different field lengths on the entropy value. The normalized entropy value $E_H$ is calculated as:
$$E_H = \min\left(\frac{H(X)}{\alpha_H},\ 1\right)$$
where $\alpha_H$ is the preset header entropy normalization factor. In one specific embodiment, the normalized entropy of the combined string is calculated using the two calculation formulas described above, and this entropy is used as the header construction entropy. If the header construction entropy $E_H$ is significantly higher than the baseline (e.g., >0.75), it suggests that the header may have been generated by an algorithm.
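A minimal sketch of the header construction entropy follows, assuming character-level Shannon entropy in bits. The normalization factor of 6.0 and the `|` delimiter are illustrative assumptions; the source only says that both are preset.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def header_construction_entropy(x_mailer: str, subject: str, message_id: str,
                                norm_factor: float = 6.0, delimiter: str = "|") -> float:
    # Concatenate the three fields in a fixed order, separated by the delimiter.
    combined = delimiter.join([x_mailer, subject, message_id])
    # Divide by the normalization factor and clip at 1.
    return min(shannon_entropy(combined) / norm_factor, 1.0)
```

Because the result is clipped at 1, the choice of normalization factor determines how much of the range is usable: a factor that is too small would push most real headers to 1 and erase the distinction the text relies on.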
[0036] In this embodiment, the multimodal entropy is calculated based on the structured metadata, including: querying the historical sending timestamp sequence of the corresponding sender within a preset historical time window based on the sender identifier, and counting the total number of sending times within the historical time window based on the historical sending timestamp sequence; calculating the time interval sequence of adjacent sending behaviors based on the current sending timestamp and the historical sending timestamp sequence, and calculating the standard deviation or variance of the time interval sequence based on the total number of sending times and the time interval sequence to obtain the sending time discrete value; calculating the average sending interval based on the length of the preset historical time window and the total number of sending times; calculating the product of the sending time discrete value, the average sending interval, and the preset smoothing adjustment parameter to obtain the second intermediate entropy value, and normalizing the second intermediate entropy value through the hyperbolic tangent activation function to obtain the behavioral inertia entropy.
[0037] Preferably, the historical sending timestamps can be obtained through log metadata extraction and time-series aggregation.
[0038] In one specific embodiment, the calculation process of behavioral inertia entropy involves: obtaining the historical sending timestamp sequence $T = \{t_1, t_2, \dots, t_N\}$ of the sender (Sender IP or From Address); calculating the time interval sequence of adjacent sending actions $\Delta T = \{\Delta t_1, \Delta t_2, \dots\}$, where $\Delta t_i = t_{i+1} - t_i$; counting the total number of messages N sent within a unit time window W, and calculating the variance $S^2$ (or standard deviation $\sigma(\Delta T)$) of the time interval sequence to measure the degree of dispersion of behavior over time.
[0039] Furthermore, the coefficient of variation is introduced, and the behavioral inertia entropy $E_S$ is calculated using the following comprehensive formula:
$$E_S = \tanh\left(\lambda \cdot \sigma(\Delta T) \cdot \mu\right)$$
where $\mu = W/N$ is the average sending interval within a unit time window, $\lambda$ is the preset smoothing adjustment parameter, and tanh is the hyperbolic tangent activation function used to normalize the result to the [0, 1) interval.
[0040] It should be noted that the above comprehensive formula reflects the "degree of non-human intervention": when the sending behavior exhibits extremely regular automated characteristics (such as timed heartbeat packets), the standard deviation of the time interval σ(∆T) approaches 0, causing the ES value to approach 0; the closer the value is to 0, the more mechanical the behavior and the higher the risk.
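The calculation can be sketched as follows, taking the product form stated in the method steps (dispersion × average interval × smoothing parameter, passed through tanh). Timestamps are assumed to be plain numbers in hours, the population standard deviation is used for the dispersion, the smoothing default of 0.01 is arbitrary, and requiring at least two historical timestamps is an assumption added to avoid a degenerate result.

```python
import math
from statistics import pstdev


def behavioral_inertia_entropy(history: list, current: float,
                               window: float, smoothing: float = 0.01) -> float:
    """history: past send times inside the window; current: this email's send time."""
    if len(history) < 2:
        raise ValueError("need at least two historical timestamps")
    stamps = sorted(history) + [current]
    # Intervals between adjacent sends, including the current one.
    intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    dispersion = pstdev(intervals)          # sending-time discrete value
    mean_interval = window / len(history)   # W / N
    return math.tanh(smoothing * dispersion * mean_interval)
```

A sender that emits a message exactly every 24 hours yields zero dispersion and therefore a value of 0, which matches the "mechanical behavior, higher risk" reading given above.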
[0041] In this embodiment, the attachment metadata includes attachment text content, the number of nested levels of the compressed package, macro presence identifiers, and obfuscation scores. Specifically, based on the structured metadata, each multimodal entropy is calculated, including: identifying garbled characters in the attachment text content of the attachment metadata to obtain a garbled character flag; the garbled character identification includes character set analysis, encoding format detection, and natural language feature judgment; if the garbled character flag is true, the attachment structural entropy is assigned a preset high-risk threshold; if the garbled character flag is false, the number of nested levels of the compressed package, the macro presence identifier, and the obfuscation score are weighted and summed to obtain a weighted sum value, and the minimum value between the weighted sum value and 1 is taken as the attachment structural entropy.
[0042] Preferably, the attachment metadata can be obtained through MIME recursive parsing.
[0043] In one specific embodiment, the process of obtaining the attachment structure entropy involves: calling the aforementioned data preprocessing module to perform garbled character recognition on the text features of the attachment through character set analysis, encoding format detection, and natural language feature judgment. If the attachment is found to contain an encoding-obfuscated payload (i.e., the garbled character flag isGibberish == True), the attachment structure entropy $E_A$ is directly assigned the preset high-risk threshold (high risk). Otherwise, the calculation formula is:
$$E_A = \min\left(w_1 \cdot Layers + w_2 \cdot HasMacro + w_3 \cdot Obfuscation,\ 1\right)$$
where $w_1$, $w_2$, and $w_3$ are preset weights, Layers is the nesting level of the compressed package, HasMacro is a boolean value (taken as 0 or 1), and Obfuscation is the obfuscation score based on the filename or attribute.
[0044] It should be noted that multi-level nesting is a common tactic used by APT attackers to hide malicious code in order to evade gateway scanning.
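The branch logic for the attachment structure entropy can be sketched as below. The weights `(0.2, 0.4, 0.4)` and the high-risk value of 1.0 are placeholders, since the source states only that they are preset.

```python
def attachment_structure_entropy(is_gibberish: bool, layers: int, has_macro: bool,
                                 obfuscation: float,
                                 weights: tuple = (0.2, 0.4, 0.4),
                                 high_risk: float = 1.0) -> float:
    if is_gibberish:
        # Encoded or obfuscated payload detected: assign the preset high-risk value.
        return high_risk
    w_layers, w_macro, w_obf = weights
    score = w_layers * layers + w_macro * float(has_macro) + w_obf * obfuscation
    return min(score, 1.0)  # clip the weighted sum at 1
```

With these placeholder weights, a macro-free document in a single archive layer scores about 0.2, while five nested layers alone already saturate the score at 1.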
[0045] In this embodiment, the Uniform Resource Locator includes a hostname field and a path field; wherein, based on the structured metadata, each multimodal entropy is calculated, including: concatenating the hostname field and the path field into a URL string, and calculating the Shannon entropy of the URL string; dividing the Shannon entropy by a preset URL entropy normalization factor to obtain a third intermediate entropy value, and taking the minimum value between the third intermediate entropy value and 1 as the URL feature entropy.
[0046] Preferably, the hostname and path portions of all URLs in the text can be extracted using regular expressions.
[0047] It should be noted that DGA domains are usually composed of random characters (such as xkz82q9.com) and have extremely high entropy values; while normal domains (such as service-login.com) have lower entropy values.
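A minimal sketch of the URL feature entropy, using `urllib.parse.urlparse` to drop the scheme and keep only the hostname and path. The normalization factor of 5.0 is an assumption, and the entropy is computed per character in bits.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def url_feature_entropy(url: str, norm_factor: float = 5.0) -> float:
    parsed = urlparse(url)
    # Hostname plus path only; the scheme prefix is excluded from the calculation.
    target = (parsed.hostname or "") + parsed.path
    return min(shannon_entropy(target) / norm_factor, 1.0)
```

Character-level entropy grows with the number of distinct characters, so short strings are bounded by their own length; how well a threshold separates real domains therefore depends on the chosen normalization factor and should be checked against actual traffic.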
[0048] S102, perform garbled character recognition on the email to be detected to determine whether it contains garbled characters; wherein the garbled character recognition includes character set analysis, encoding format detection and natural language feature judgment;
S103, if it is determined that there are no garbled characters in the email to be detected, a prompt word is constructed from the multidimensional entropy vector, and the prompt word is input into the pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt word and generates and outputs the APT attack intent detection result of the email to be detected.
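The three checks named in step S102 are not specified further in the source, so the sketch below substitutes simple heuristics that are entirely assumptions: a ratio of replacement or non-printable characters for character set analysis, a Base64 run detector for encoding format detection, and a share of word-like tokens (ASCII letters only) for natural language feature judgment. All thresholds and function names are illustrative.

```python
import base64
import binascii
import re

BASE64_RE = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")
WORD_RE = re.compile(r"[A-Za-z]{1,15}[.,;:!?]?")


def charset_suspicious(text: str, max_ratio: float = 0.1) -> bool:
    # Character set analysis: share of replacement or non-printable characters.
    odd = sum(1 for ch in text
              if ch == "\ufffd" or (not ch.isprintable() and ch not in "\r\n\t"))
    return odd / len(text) > max_ratio


def find_encoded_span(text: str):
    # Encoding format detection: first long run that decodes as valid Base64.
    for match in BASE64_RE.finditer(text):
        candidate = match.group()
        if len(candidate) % 4:
            continue
        try:
            base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue
        return match.start(), match.end()
    return None


def looks_like_language(text: str, min_ratio: float = 0.5) -> bool:
    # Natural language feature judgment: share of word-like tokens.
    tokens = text.split()
    wordlike = sum(1 for tok in tokens if WORD_RE.fullmatch(tok))
    return bool(tokens) and wordlike / len(tokens) >= min_ratio


def is_gibberish(text: str) -> bool:
    if not text.strip():
        return False  # assumption: empty text is not treated as garbled
    return (charset_suspicious(text)
            or find_encoded_span(text) is not None
            or not looks_like_language(text))
```

`find_encoded_span` also returns the start and end positions of the match, which is the information the garbled-text branch needs when it cuts out the surrounding context.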
[0049] In one specific embodiment, constructing prompt words through the multidimensional entropy vector involves inserting the multidimensional entropy vector into a preset Prompt template to convert the multidimensional entropy vector into natural language and obtain the prompt words.
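A possible shape for this template step is sketched below. The template wording, the dictionary keys, and the choice to serialize the vector as JSON inside the prompt are assumptions for illustration; the source states only that the entropy vector is inserted into a preset Prompt template.

```python
import json

# Hypothetical template text; the actual preset template is not given in the source.
PROMPT_TEMPLATE = (
    "You are an email security analyst. Given the entropy features below "
    "(each normalized to the range 0 to 1), decide whether the email shows "
    "APT attack intent and explain the reasoning.\n"
    "Features:\n{features}"
)


def build_prompt(entropy_vector: dict) -> str:
    # Serialize the entropy vector deterministically and insert it into the template.
    features = json.dumps(entropy_vector, indent=2, sort_keys=True)
    return PROMPT_TEMPLATE.format(features=features)
```

Sorting the keys keeps the prompt stable across runs, which makes results easier to compare when the same email is analyzed more than once.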
[0050] In this embodiment, the LLM model performs semantic reasoning on the prompt words to generate and output the APT attack intent detection result of the email to be detected. Specifically, the prompt words are segmented into word units to obtain word unit sequences, and each word unit in the word unit sequence is mapped to a word unit feature vector to obtain a word unit vector matrix. The word unit sequence includes each word unit corresponding to each multimodal entropy in the multidimensional entropy value vector. Through a pre-trained attention mechanism, the correlation degree between each word unit feature vector in the word unit vector matrix is calculated to obtain a correlation degree matrix. Based on the correlation degree matrix, the word unit feature vectors in the word unit vector matrix are weighted and fused to obtain a fused feature vector. The fused feature vector is compared with a pre-trained network security knowledge graph to generate and output the APT attack intent detection result of the email to be detected.
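The correlation and weighted-fusion steps described above resemble scaled dot-product self-attention. The sketch below is a simplified illustration under stated assumptions: the learned query/key/value projections of a real LLM are omitted, the final mean-pooling step is an assumption made here to obtain a single fused vector, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_matrix(vectors: List[Vector]) -> List[Vector]:
    """Row-wise softmax of scaled dot products between all token vectors."""
    scale = math.sqrt(len(vectors[0]))
    rows = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in vectors]
        rows.append(softmax(scores))
    return rows


def fuse(vectors: List[Vector]) -> Vector:
    """Weight token vectors by the correlation matrix, then mean-pool
    (the pooling step is an assumption of this sketch)."""
    weights = correlation_matrix(vectors)
    dim = len(vectors[0])
    contextual = [
        [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        for row in weights
    ]
    count = len(contextual)
    return [sum(vec[d] for vec in contextual) / count for d in range(dim)]
```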
[0051] In one specific embodiment, the LLM model performs semantic reasoning on the prompt words to generate and output the APT attack intent detection result of the email to be detected, including: (1) Input data tokenization and vector mapping. The engine inputs the constructed JSON format Prompt text into the word segmenter of the large language model, converts it into a discrete token sequence, and maps each token into a high-dimensional continuous feature vector through the model's embedding layer, thereby converting discrete text symbols into computer-computable matrix data.
[0052] (2) Deep mapping and attention-based feature weighting across modalities. In the multi-layer deep neural network of the large language model, the system uses a pre-trained attention mechanism to dynamically calculate the correlation matrix between the different input token vectors. For example, the underlying tensor operations calculate the dot product of the feature vector representing "low-frequency behavior" and the feature vector representing "garbled payload" in the multi-dimensional feature space. Through forward propagation and weight allocation in the network model, the system quantifies and captures, at the level of the underlying matrices, the "strong mutual exclusion" and "cooperative anomaly" relationships among these input indicators of different dimensions in the network behavior logic.
(3) Knowledge representation alignment and intent prediction. The model compares, in a high-dimensional space, the fused attention context vector obtained above with the network security knowledge graph contained in its pre-trained weights (such as the kill chain features of the reconnaissance and delivery phases of an APT attack), and calculates the conditional probability distribution of whether the current input feature combination belongs to "malicious intent" or "benign intent".
[0053] (4) Deterministic Decoding and Natural Language Output. At the output layer, the system forcibly sets the temperature parameter of the large language model to zero (temperature=0.0) and adopts a greedy decoding strategy, always selecting the word with the highest probability value for output. Based on this, the system generates a natural language analysis report containing the logical deduction process, as well as the final qualitative conclusion, eliminating the randomness in the generation process and ensuring the constancy of the security system's output.
[0054] It should be noted that setting the temperature parameter of the LLM to 0.0 forces the model to always select the token with the highest probability during decoding. This removes sampling randomness, reduces the "hallucination" risk of generative AI, and ensures that the judgment result for the same input remains constant.
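The effect of the temperature parameter on decoding can be illustrated with a toy example. This is a sketch only: the two-token vocabulary and the logit values are hypothetical, and real inference engines implement temperature 0 as a direct arg-max rather than as a division by zero.

```python
import math
from typing import Dict


def greedy_pick(logits: Dict[str, float]) -> str:
    """Greedy decoding: always return the highest-scoring token."""
    return max(logits, key=lambda token: logits[token])


def softmax_with_temperature(logits: Dict[str, float],
                             temperature: float) -> Dict[str, float]:
    """Token probabilities for a strictly positive temperature."""
    if temperature <= 0:
        raise ValueError("use greedy_pick for temperature 0")
    scaled = {tok: val / temperature for tok, val in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical logits for the final verdict token.
example_logits = {"malicious": 2.1, "benign": 1.3}
```

As the temperature approaches zero, the probability mass concentrates on the arg-max token, so sampling converges to `greedy_pick`. This makes the output reproducible for a given input; it does not by itself guarantee that the chosen token is correct.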
[0055] In this embodiment, the method further includes: if it is determined that there is garbled text in the email to be detected, then identifying the start and end positions of the garbled text in the corresponding text, and extracting characters of a preset length forward and backward according to the start and end positions to obtain a context text block; constructing prompt words through the multidimensional entropy vector and the context text block, and inputting the prompt words into a pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt words, generates and outputs the APT attack intent detection result of the email to be detected.
[0056] In one specific embodiment, for a detected garbled payload, the aforementioned data preprocessing module is invoked to automatically extract text of a predefined length before and after the garbled text fragment (e.g., 500 characters on each side) as a context snippet. This context snippet is then dynamically injected into the aforementioned structured Prompt. After receiving the complete Prompt containing the multi-dimensional fingerprint features and the context text, the large language model performs joint association analysis of the semantic features to infer the true purpose of the garbled text.
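The context extraction step can be sketched as a simple slicing operation. The function name, the returned dictionary keys, and the convention that `end` is exclusive are assumptions of this sketch; the 500-character default follows the example given in the text.

```python
from typing import Dict


def extract_context_block(text: str, start: int, end: int,
                          window: int = 500) -> Dict[str, str]:
    """Return up to `window` characters before and after a garbled span.

    `start` is the index of the first garbled character and `end` is the
    index just past the last one (exclusive), an assumed convention.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError("garbled span lies outside the text")
    before = text[max(0, start - window):start]
    after = text[end:end + window]
    return {"before": before, "after": after}
```

Clamping with `max(0, ...)` keeps the slice valid when the garbled fragment sits near the beginning of the text; Python slicing already truncates safely at the end.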
[0057] For example, the prompt word is as follows:
{
  "role": "expert_system",
  "instruction": "Analyze the following email fingerprint to determine if it is an APT attack.",
  "features": {
    "header_entropy": 0.21,
    "sender_behavior": "Low Frequency, High Regularity (Entropy 0.1)",
    "payload": "Gibberish Text Detected (Base64-like)",
    "url_entropy": 0.88
  }
}
It should be noted that the structured features of the above Prompt are interpreted line by line as follows:
"header_entropy": 0.21: This corresponds to the previously extracted "header construction entropy (Eh)". The value of 0.21 is in the low entropy range, indicating to the large model that the email header structure is standard, that is, the email presents itself as a legitimate, regular email.
[0058] "sender_behavior":"Low Frequency, High Regularity (Entropy 0.1)": This corresponds to "sender behavior entropy (Es)". Inputting a textual description and a numerical value (0.1) into the large model indicates that the sender exhibits a mechanical characteristic of "extremely low sending frequency but extremely regular time intervals".
[0059] "payload": "Gibberish Text Detected (Base64-like)": This corresponds to the "attachment / payload features", i.e., the context text block. It indicates to the large model that high-entropy garbled text resembling Base64 encoding has been detected in the email body or attachments, suggesting possible malicious code obfuscation.
[0060] "url_entropy": 0.88: This corresponds to "URL feature entropy (Eu)". A value of 0.88 is in the high entropy range, indicating to the large model that the links embedded in the email are composed of a large number of random characters (such as DGA domains), which is high-risk.
[0061] This structured Prompt can effectively guide LLM to focus on key features and improve inference accuracy.
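Building such a structured Prompt from the entropy values can be sketched with the standard `json` module. The function name, its parameters, and the optional `"context"` key are assumptions introduced for illustration; only the `role`, `instruction`, and `features` fields mirror the example above.

```python
import json
from typing import Optional


def build_prompt(header_entropy: float, sender_label: str,
                 sender_entropy: float, payload: str, url_entropy: float,
                 context_block: Optional[str] = None) -> str:
    """Insert the entropy values into a fixed JSON Prompt template."""
    features = {
        "header_entropy": round(header_entropy, 2),
        "sender_behavior": f"{sender_label} (Entropy {sender_entropy})",
        "payload": payload,
        "url_entropy": round(url_entropy, 2),
    }
    if context_block is not None:
        # Hypothetical key for the context text block of a garbled payload.
        features["context"] = context_block
    prompt = {
        "role": "expert_system",
        "instruction": ("Analyze the following email fingerprint to "
                        "determine if it is an APT attack."),
        "features": features,
    }
    return json.dumps(prompt, ensure_ascii=False, indent=2)
```

Serializing through `json.dumps` rather than string concatenation ensures that quotes or control characters inside the context text cannot break the Prompt structure.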
[0062] In one specific embodiment, the structured recognition and underlying data processing of "leading rhetoric" by the large language model are as follows: 1) Tensor mapping of semantic feature words: The large language model transforms the input context text block into a sequence of tokens and maps them to a high-dimensional semantic vector space. The underlying model assigns high-dimensional feature weights to specific semantic groups (such as words representing "urgency", words representing "profit inducement", and words representing "permission request") through pre-trained parameters.
[0063] 2) Distance calculation for intent classification: The model calculates the cosine similarity between the semantic vector of the current context text and the pre-defined standard semantic clusters for "social engineering attacks" in multi-dimensional space. When the similarity exceeds a preset threshold, the system determines at the computer level that the text fragment has triggered a high-risk probability distribution of "inducing intent".
[0064] 3) Cross-modal joint confidence output: Although the garbled portion cannot be directly determined due to the lack of decodeable plaintext, the attention mechanism of the large language model calculates the proximity weights between the "high-probability inductive intent vector" and the "high-entropy garbled position marker." If the two are structurally highly correlated, the model can derive a qualitative conclusion of "using inductive text to mask malicious obfuscation payload" through logical constraint rules. Thus, without needing to reverse-decode the encrypted garbled text, the model can directly determine that the email is malicious.
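The similarity test in step 2) can be illustrated with a plain cosine similarity check. In a real LLM this comparison happens implicitly inside the model's hidden layers; the explicit vectors, the cluster centroids, and the default threshold of 0.8 below are hypothetical values chosen for this sketch.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def triggers_inducing_intent(context_vector: Sequence[float],
                             cluster_centroids: List[Sequence[float]],
                             threshold: float = 0.8) -> bool:
    """True when the context exceeds the threshold for any semantic cluster
    (e.g. urgency, profit inducement, permission request)."""
    return any(cosine_similarity(context_vector, centroid) > threshold
               for centroid in cluster_centroids)
```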
[0065] In this embodiment, the method further includes: if the APT attack intent detection result shows that there is an APT attack intent in the email to be detected, then determining whether each multimodal entropy in the multidimensional entropy vector satisfies a preset entropy combination condition; if it is determined that each multimodal entropy in the multidimensional entropy vector satisfies the preset entropy combination condition, then triggering an email interception mechanism; otherwise, generating a corresponding natural language parsing based on the multidimensional entropy vector, and outputting an alarm message containing the natural language parsing.
[0066] In one specific embodiment, if the AI analysis result determines the email to be "Malicious" and the preset entropy value anomaly condition is met, an interception mechanism is triggered to ensure the accuracy and reliability of the decision. To accurately capture the core characteristics of APT attacks, the system focuses on identifying the abnormal combination pattern of "low-entropy behavior + high-entropy payload / link"; this pattern is a typical manifestation of APT attacks that "use regular behavior as a carrier to carry highly obfuscated malicious content." The specific judgment logic is as follows: This means "an extremely punctual bot sent a highly chaotic package," a typical fingerprint of an APT attack.
[0067] To better illustrate the working principle and implementation process of the embodiments of the present invention, please refer to Figure 2, which is a schematic diagram of an APT attack detection process for email provided by an embodiment of the present invention.
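The two-stage decision (LLM verdict first, entropy combination check second) can be sketched as below. The source does not disclose the concrete combination condition, so the thresholds `LOW_BEHAVIOR` and `HIGH_CONTENT`, the `EntropyVector` type, the action names, and the handling of non-malicious verdicts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical thresholds; the source only refers to a "preset" condition.
LOW_BEHAVIOR = 0.3
HIGH_CONTENT = 0.7


@dataclass(frozen=True)
class EntropyVector:
    header: float
    behavior: float
    attachment: float
    url: float


def meets_combination(v: EntropyVector) -> bool:
    """Low-entropy behavior combined with a high-entropy payload or link."""
    return v.behavior < LOW_BEHAVIOR and (
        v.attachment > HIGH_CONTENT or v.url > HIGH_CONTENT)


def decide(llm_verdict: str, v: EntropyVector) -> Tuple[str, str]:
    """Return (action, message) for one email."""
    if llm_verdict != "Malicious":
        return "pass", ""  # assumed handling; not specified in the source
    if meets_combination(v):
        return "intercept", ""
    message = (f"APT intent suspected: header={v.header}, "
               f"behavior={v.behavior}, attachment={v.attachment}, "
               f"url={v.url}")
    return "alert", message
```

Requiring both signals before interception means that a "Malicious" verdict alone only raises an alert, which limits the impact of a false positive from the LLM.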
[0068] This invention quantifies email features of different dimensions into comparable numerical values (entropy values), providing a unified metric for subsequent cross-modal logical conflict analysis. It identifies whether the email contains malicious payloads with obfuscated encodings, providing a basis for selecting the subsequent processing path. By leveraging the semantic understanding capability of the LLM to analyze the logical relationships between the entropy values (such as whether low behavioral inertia entropy + high URL feature entropy constitutes an anomalous combination), it outputs interpretable judgment results. Compared with existing technologies, this invention solves the technical problem of low APT attack detection accuracy caused by isolated analysis of multimodal features, which fails to effectively identify cross-modal logical conflicts.
[0069] Embodiment 2: As shown in Figure 3, an APT attack detection system for email provided by an embodiment of the present invention includes a multidimensional entropy calculation module 201, an email garbled character recognition module 202, and an APT attack detection module 203.
The multidimensional entropy calculation module 201 is used to acquire emails to be detected with attachments, and to perform multimodal entropy calculations on the emails to be detected to obtain multidimensional entropy vectors; wherein the multimodal entropy includes header construction entropy, behavioral inertia entropy, attachment structure entropy, and URL feature entropy.
In this embodiment, the multidimensional entropy calculation module 201 performs multimodal entropy calculations on the email to be detected to obtain a multidimensional entropy vector. Specifically, the multidimensional entropy calculation module 201 performs protocol parsing on the email to be detected to extract structured metadata. The structured metadata includes header fields, attachment metadata, a Uniform Resource Locator (URL), a sender identifier, and a current sending timestamp. Based on the structured metadata, each multimodal entropy is calculated, and the multimodal entropies are integrated to obtain the multidimensional entropy vector.
[0070] In this embodiment, the header fields include a sending client field, an email subject field, and an email ID field. The multi-dimensional entropy calculation module 201 calculates each multimodal entropy based on the structured metadata, including: concatenating the sending client field, email subject field, and email ID field into a combined string according to a preset order, and calculating the Shannon entropy of the combined string; dividing the Shannon entropy by a preset header entropy normalization factor to obtain a first intermediate entropy value, and taking the minimum value between the first intermediate entropy value and 1 as the header construction entropy.
[0071] In this embodiment, the multidimensional entropy calculation module 201 calculates each multimodal entropy based on the structured metadata, including: querying the historical sending timestamp sequence of the corresponding sender within a preset historical time window based on the sender identifier, and counting the total number of sending times within the historical time window based on the historical sending timestamp sequence; calculating the time interval sequence of adjacent sending behaviors based on the current sending timestamp and the historical sending timestamp sequence, and calculating the standard deviation or variance of the time interval sequence based on the total number of sending times and the time interval sequence to obtain the sending time discrete value; calculating the average sending interval based on the length of the preset historical time window and the total number of sending times; calculating the product of the sending time discrete value, the average sending interval, and the preset smoothing adjustment parameter to obtain the second intermediate entropy value, and normalizing the second intermediate entropy value through the hyperbolic tangent activation function to obtain the behavioral inertia entropy.
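The behavioral inertia entropy steps listed above can be sketched as follows. Several details are assumptions of this sketch: timestamps are in seconds, the population standard deviation is used as the dispersion measure, the default smoothing value is hypothetical, and an empty history raises an error because the source does not say how that case is handled.

```python
import math
import statistics
from typing import Sequence


def behavioral_inertia_entropy(history: Sequence[float], current: float,
                               window_seconds: float,
                               smoothing: float = 1e-6) -> float:
    """tanh(dispersion * average interval * smoothing parameter).

    `history` holds the sender's past sending timestamps inside the
    historical window; `current` is assumed to be later than all of them.
    """
    total_sends = len(history)
    if total_sends == 0:
        raise ValueError("no sending history for this sender")
    timestamps = sorted(history) + [current]
    # Intervals between adjacent sending behaviors.
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dispersion = statistics.pstdev(intervals)
    average_interval = window_seconds / total_sends
    second_intermediate = dispersion * average_interval * smoothing
    return math.tanh(second_intermediate)
```

Because the hyperbolic tangent maps non-negative inputs into the range 0 to 1, perfectly regular senders score 0 and increasingly erratic senders approach 1.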
[0072] In this embodiment, the attachment metadata includes the attachment text content, the number of nested levels of the compressed package, a macro existence identifier, and an obfuscation score. The multidimensional entropy calculation module 201 calculates each multimodal entropy based on the structured metadata, including: performing garbled character recognition on the attachment text content in the attachment metadata to obtain a garbled character flag, wherein the garbled character recognition includes character set analysis, encoding format detection, and natural language feature judgment; if the garbled character flag is true, assigning a preset high-risk threshold to the attachment structure entropy; if the garbled character flag is false, computing a weighted sum of the number of nested levels of the compressed package, the macro existence identifier, and the obfuscation score, and taking the minimum of the weighted sum and 1 as the attachment structure entropy.
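The attachment structure entropy rule can be sketched as a short branch. The constant `HIGH_RISK_VALUE` and the three weights are hypothetical placeholders, since the source describes them only as preset values, and the macro identifier is assumed to be a boolean mapped to 0 or 1.

```python
# Hypothetical presets; the source does not disclose concrete values.
HIGH_RISK_VALUE = 0.95
W_NESTING, W_MACRO, W_OBFUSCATION = 0.1, 0.3, 0.6


def attachment_structure_entropy(garbled: bool, nesting_levels: int,
                                 has_macro: bool,
                                 obfuscation_score: float) -> float:
    """Preset high-risk value for garbled content, otherwise a clamped
    weighted sum of the structural indicators."""
    if garbled:
        return HIGH_RISK_VALUE
    weighted_sum = (W_NESTING * nesting_levels
                    + W_MACRO * (1.0 if has_macro else 0.0)
                    + W_OBFUSCATION * obfuscation_score)
    return min(weighted_sum, 1.0)
```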
[0073] In this embodiment, the Uniform Resource Locator includes a hostname field and a path field; wherein, the multidimensional entropy calculation module 201 calculates each multimodal entropy according to the structured metadata, including: the multidimensional entropy calculation module 201 concatenates the hostname field and the path field into a URL string, and calculates the Shannon entropy of the URL string; divides the Shannon entropy by a preset URL entropy normalization factor to obtain a third intermediate entropy value, and takes the minimum value between the third intermediate entropy value and 1 as the URL feature entropy.
[0074] The email garbled character recognition module 202 is used to perform garbled character recognition on the email to be detected in order to determine whether the email to be detected contains garbled characters; wherein the garbled character recognition includes character set analysis, encoding format detection, and natural language feature judgment.
The APT attack detection module 203 is used to construct a prompt word from the multidimensional entropy vector if it is determined that the email to be detected contains no garbled characters, and to input the prompt word into a pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt word and generates and outputs the APT attack intent detection result of the email to be detected.
[0075] In this embodiment, the LLM model performs semantic reasoning on the prompt words to generate and output the APT attack intent detection result of the email to be detected. Specifically, the APT attack detection module 203 segments the prompt words into word units to obtain a word unit sequence, and maps each word unit in the word unit sequence to a word unit feature vector to obtain a word unit vector matrix. The word unit sequence includes the word units corresponding to each multimodal entropy in the multidimensional entropy vector. Through a pre-trained attention mechanism, the correlation degree between the word unit feature vectors in the word unit vector matrix is calculated to obtain a correlation degree matrix. Based on the correlation degree matrix, the word unit feature vectors in the word unit vector matrix are weighted and fused to obtain a fused feature vector. The fused feature vector is compared with a pre-trained network security knowledge graph to generate and output the APT attack intent detection result of the email to be detected.
[0076] In this embodiment, the method further includes: if the APT attack intent detection result shows that there is an APT attack intent in the email to be detected, then determining whether each multimodal entropy in the multidimensional entropy vector satisfies a preset entropy combination condition; if it is determined that each multimodal entropy in the multidimensional entropy vector satisfies the preset entropy combination condition, then triggering an email interception mechanism; otherwise, generating a corresponding natural language parsing based on the multidimensional entropy vector, and outputting an alarm message containing the natural language parsing.
[0077] For a more detailed explanation of the working principle and procedures of this embodiment, please refer to the relevant description in Embodiment 1.
[0078] This invention utilizes a multi-dimensional entropy calculation module 201 to quantify email features of different dimensions into comparable numerical values (entropy values), providing a unified measurement basis for subsequent cross-modal logical conflict analysis; an email garbled character recognition module 202 identifies whether there are malicious payloads with encoded obfuscation in the email, providing a basis for determining subsequent processing path branches; and an APT attack detection module 203 uses the semantic understanding capability of LLM to analyze the logical relationships between various entropy values (such as whether low behavioral inertia entropy + high URL feature entropy constitute an abnormal combination), outputting interpretable judgment results.
[0079] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0080] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit its scope of protection. In particular, any modifications, equivalent substitutions, improvements, etc., made by those skilled in the art within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims
1. A method for detecting APT attacks targeting email, characterized in that it comprises: acquiring an email to be detected that carries an attachment, and performing multimodal entropy calculation on the email to be detected to obtain a multidimensional entropy vector; wherein the multimodal entropy includes header construction entropy, behavioral inertia entropy, attachment structure entropy, and URL feature entropy; performing garbled character recognition on the email to be detected to determine whether the email to be detected contains garbled characters; wherein the garbled character recognition includes character set analysis, encoding format detection, and natural language feature judgment; and, if it is determined that the email to be detected contains no garbled characters, constructing a prompt word from the multidimensional entropy vector, and inputting the prompt word into a pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt word and generates and outputs the APT attack intent detection result of the email to be detected.
2. The APT attack detection method for email as described in claim 1, characterized in that performing multimodal entropy calculation on the email to be detected to obtain a multidimensional entropy vector specifically comprises: performing protocol parsing on the email to be detected to extract structured metadata; wherein the structured metadata includes header fields, attachment metadata, a Uniform Resource Locator, a sender identifier, and a current sending timestamp; and calculating each multimodal entropy separately based on the structured metadata, and integrating the multimodal entropies to obtain the multidimensional entropy vector.
3. The APT attack detection method for email as described in claim 2, characterized in that the header fields include a sending client field, an email subject field, and an email ID field; wherein calculating each multimodal entropy based on the structured metadata comprises: concatenating the sending client field, the email subject field, and the email ID field into a combined string according to a preset order, and calculating the Shannon entropy of the combined string; and dividing the Shannon entropy by a preset header entropy normalization factor to obtain a first intermediate entropy value, and taking the minimum of the first intermediate entropy value and 1 as the header construction entropy.
4. The APT attack detection method for email as described in claim 2, characterized in that calculating each multimodal entropy based on the structured metadata comprises: querying, based on the sender identifier, the historical sending timestamp sequence of the corresponding sender within a preset historical time window, and counting the total number of sendings within the historical time window based on the historical sending timestamp sequence; calculating the time interval sequence of adjacent sending behaviors based on the current sending timestamp and the historical sending timestamp sequence, and calculating the standard deviation or variance of the time interval sequence based on the total number of sendings and the time interval sequence to obtain a sending time dispersion value; calculating the average sending interval based on the length of the preset historical time window and the total number of sendings; and calculating the product of the sending time dispersion value, the average sending interval, and a preset smoothing adjustment parameter to obtain a second intermediate entropy value, and normalizing the second intermediate entropy value using the hyperbolic tangent activation function to obtain the behavioral inertia entropy.
5. The APT attack detection method for email as described in claim 2, characterized in that the attachment metadata includes the attachment text content, the number of nested levels of the compressed package, a macro existence identifier, and an obfuscation score; wherein calculating each multimodal entropy based on the structured metadata comprises: performing garbled character recognition on the attachment text content in the attachment metadata to obtain a garbled character flag; wherein the garbled character recognition includes character set analysis, encoding format detection, and natural language feature judgment; if the garbled character flag is true, assigning a preset high-risk threshold to the attachment structure entropy; and if the garbled character flag is false, computing a weighted sum of the number of nested levels of the compressed package, the macro existence identifier, and the obfuscation score, and taking the minimum of the weighted sum and 1 as the attachment structure entropy.
6. The APT attack detection method for email as described in claim 2, characterized in that the Uniform Resource Locator includes a hostname field and a path field; wherein calculating each multimodal entropy based on the structured metadata comprises: concatenating the hostname field and the path field into a URL string, and calculating the Shannon entropy of the URL string; and dividing the Shannon entropy by a preset URL entropy normalization factor to obtain a third intermediate entropy value, and taking the minimum of the third intermediate entropy value and 1 as the URL feature entropy.
7. The APT attack detection method for email as described in claim 2, characterized in that the LLM model performing semantic reasoning on the prompt word and generating and outputting the APT attack intent detection result of the email to be detected specifically comprises: segmenting the prompt word into word units to obtain a word unit sequence, and mapping each word unit in the word unit sequence to a word unit feature vector to obtain a word unit vector matrix; wherein the word unit sequence includes the word units corresponding to each multimodal entropy in the multidimensional entropy vector; calculating, through a pre-trained attention mechanism, the correlation degree between the word unit feature vectors in the word unit vector matrix to obtain a correlation degree matrix; performing weighted fusion of the word unit feature vectors in the word unit vector matrix based on the correlation degree matrix to obtain a fused feature vector; and comparing the fused feature vector with a pre-trained network security knowledge graph to generate and output the APT attack intent detection result of the email to be detected.
8. The APT attack detection method for email as described in claim 2, characterized in that it further comprises: if it is determined that the email to be detected contains garbled characters, identifying the start and end positions of the garbled characters in the corresponding text, and extracting characters of a preset length forward and backward from the start and end positions to obtain a context text block; and constructing the prompt word from the multidimensional entropy vector and the context text block, and inputting the prompt word into a pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt word and generates and outputs the APT attack intent detection result of the email to be detected.
9. The APT attack detection method for email as described in claim 8, characterized in that it further comprises: if the APT attack intent detection result shows that the email to be detected contains an APT attack intent, determining whether the multimodal entropies in the multidimensional entropy vector satisfy a preset entropy combination condition; if it is determined that the multimodal entropies in the multidimensional entropy vector satisfy the preset entropy combination condition, triggering an email interception mechanism; otherwise, generating a corresponding natural language parsing based on the multidimensional entropy vector, and outputting an alarm message containing the natural language parsing.
10. An APT attack detection system for email, characterized in that it comprises a multidimensional entropy calculation module, an email garbled character recognition module, and an APT attack detection module, wherein: the multidimensional entropy calculation module is used to acquire emails to be detected with attachments, and to perform multimodal entropy calculations on the emails to be detected to obtain multidimensional entropy vectors; wherein the multimodal entropy includes header construction entropy, behavioral inertia entropy, attachment structure entropy, and URL feature entropy; the email garbled character recognition module is used to perform garbled character recognition on the email to be detected in order to determine whether the email to be detected contains garbled characters; wherein the garbled character recognition includes character set analysis, encoding format detection, and natural language feature judgment; and the APT attack detection module is used to construct a prompt word from the multidimensional entropy vector if it is determined that the email to be detected contains no garbled characters, and to input the prompt word into a pre-trained LLM model so that the LLM model performs semantic reasoning on the prompt word and generates and outputs the APT attack intent detection result of the email to be detected.