A communication object anomaly detection method, device and equipment

By dividing the communication object anomaly detection into sub-time periods, obtaining target features, and establishing an anomaly detection model, the problems of low detection accuracy and high false alarm rate in existing technologies are solved, and more efficient abnormal behavior recognition is achieved.

CN115879045B | Active | Publication Date: 2026-06-16 | NSFOCUS INFORMATION TECHNOLOGY CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: NSFOCUS INFORMATION TECHNOLOGY CO LTD
Filing Date: 2022-11-25
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing methods for detecting anomalies in communication objects assume that the data follows a Gaussian distribution, resulting in low detection accuracy and high false alarm rate, making it impossible to effectively identify abnormal employee behavior.

Method used

By dividing the set duration into multiple sub-time periods, the target time range characteristics, location information, and communication characteristics of the target sender and receiver are obtained. An anomaly detection model for communication objects is established using a preset vectorization algorithm and density estimation algorithm to perform anomaly detection.

Benefits of technology

It improves the accuracy of anomaly detection of communication objects, reduces the false alarm rate, and can more accurately identify abnormal employee behavior.

Generated by Eureka AI based on patent content.

[Figure: CN115879045B_ABST]

Abstract

The present disclosure relates to a communication object anomaly detection method, device and equipment, the method comprising: acquiring a plurality of to-be-detected email logs within a set time length, the set time length comprising a plurality of sub-time periods; analyzing the plurality of to-be-detected email logs respectively to acquire respective corresponding target sending objects and target receiving objects; for each group of target communication objects, determining a respective corresponding target time range feature based on the sub-time period to which the email communication time in each to-be-detected email log belongs; each group of target communication objects comprising one target sending object and one target receiving object; and performing anomaly detection on the target communication objects of the plurality of to-be-detected email logs according to the groups of target communication objects, and the target time range features, target location information and target communication features corresponding thereto. The present disclosure can improve the accuracy of communication object anomaly detection and reduce the false positive rate.

Description

Technical Field

[0001] This disclosure relates to the field of network security technology, and in particular to a method, apparatus and device for detecting anomalies in communication objects.

Background Technology

[0002] With the development of internet technology, email has become an indispensable office tool for businesses. However, while email brings convenience, it also brings security issues. Besides malicious attacks from external criminals, abnormal behavior by internal employees can also cause losses to the company. For example, some employees may leak company secrets or send confidential internal information to external parties via email for various reasons.

[0003] Currently, employee email activity is typically monitored using traditional UEBA (User and Entity Behavior Analytics) methods with fixed rules. Taking a month as an example, a baseline is generated based on all historical email logs from the previous month. The logic for baseline formation is to statistically generate baselines for each group of senders and recipients based on the sender and recipient information in each historical email log. For a user's data to be monitored within a certain time period, the corresponding baseline is determined based on the sender and recipient information in that data. The data volume within that time period is compared with the corresponding indicators on that baseline to determine if the user exhibits abnormal behavior. For example, the mean and standard deviation of the user and recipient data are calculated using historical email logs. Using the 3-sigma principle (a rule of thumb), with a 5-minute monitoring range, if the difference between the user's data volume within this time range and the baseline mean exceeds three times the standard deviation, it is considered abnormal behavior.
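
The 3-sigma check described in this paragraph can be sketched as follows. This is only a minimal illustration of the baseline rule; the function name `is_three_sigma_outlier` and the sample volumes are hypothetical and do not come from the patent.

```python
from statistics import mean, pstdev


def is_three_sigma_outlier(history, observed):
    """Return True if `observed` is more than 3 standard deviations from the baseline mean."""
    mu = mean(history)       # baseline mean from historical logs
    sigma = pstdev(history)  # baseline (population) standard deviation
    return abs(observed - mu) > 3 * sigma


# Hypothetical per-window data volumes for one sender/recipient pair.
baseline = [4, 5, 6, 5, 4, 6, 5, 5]
print(is_three_sigma_outlier(baseline, 5))
print(is_three_sigma_outlier(baseline, 40))
```

The rule is only as good as the Gaussian assumption behind it, which is the weakness the following paragraph points out.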

[0004] The above method assumes that the data follows a Gaussian distribution and calculates the parameters of the Gaussian distribution, which is a parameter estimation method. However, in actual business scenarios, the data distribution may not follow a Gaussian distribution. Therefore, using the idea of Gaussian distribution to detect anomalies in communication objects has a low accuracy rate and a high false alarm rate.

Summary of the Invention

[0005] This disclosure provides a method, apparatus, and device for detecting anomalies in communication objects, which improves the accuracy of anomaly detection and reduces the false alarm rate.

[0006] According to a first aspect of the present disclosure, a method for detecting anomalies in communication objects is provided, the method comprising:

[0007] Obtain multiple email logs to be detected within a set time period, wherein the set time period includes multiple sub-time periods;

[0008] The multiple email logs to be detected are analyzed separately to obtain the target sender and target receiver for each.

[0009] For each group of target communication objects, the corresponding target time range features are determined based on the sub-time period to which the email communication time in each of the corresponding email logs to be detected belongs; each group of target communication objects includes a target sender object and a target receiver object.

[0010] Based on each group of target communication objects, and their respective target time range characteristics, target location information, and target communication characteristics, anomaly detection is performed on the target communication objects of the multiple email logs to be detected; wherein, each target location information is obtained by analyzing the address information in the corresponding email log to be detected, and each target communication characteristic represents the sending status in the corresponding email log to be detected.

[0011] In one possible implementation, the step of performing anomaly detection on the target communication objects of the multiple email logs to be detected based on the groups of target communication objects and the corresponding target time range characteristics, target location information, and target communication characteristics of each group includes:

[0012] Based on each group of target communication objects, and their respective target time range characteristics, target location information, and target communication characteristics, the corresponding target characteristics are obtained.

[0013] Each target feature obtained is input into the corresponding communication object anomaly detection model to obtain the actual result output by each communication object anomaly detection model.

[0014] Based on the relationship between the distance from each actual result to the corresponding centerline and the set threshold, the detection results of the communication objects of the multiple email logs to be detected are determined and displayed. Each centerline is determined based on the mean of the function values in the corresponding communication object anomaly detection model.
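
The decision rule of [0014] can be sketched as below. The reference value (called the median or centerline in this text) is taken as the mean of the model's function values. Treating those function values as the model's outputs on its own training features is an assumption, and the names `centerline` and `detect` are hypothetical.

```python
from statistics import mean


def centerline(function_values):
    """Reference value of one model: the mean of its function values."""
    return mean(function_values)


def detect(actual_result, function_values, threshold):
    """Compare the distance between the actual result and the centerline with the threshold."""
    distance = abs(actual_result - centerline(function_values))
    return "abnormal" if distance > threshold else "normal"


# Hypothetical function values of one model and two hypothetical actual results.
values = [0.4, 0.6]
print(detect(0.5, values, threshold=0.2))
print(detect(0.05, values, threshold=0.2))
```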

[0015] In one possible implementation, the communication object anomaly detection model is obtained in the following way:

[0016] Retrieve multiple historical email logs within the set time period;

[0017] Analyze each of the multiple historical email logs to obtain the corresponding sender and receiver;

[0018] For each group of communication objects, the corresponding time range characteristics are determined based on the sub-time period to which the email communication time in each historical email log belongs; each group of communication objects includes a sender object and a receiver object.

[0019] Based on each group of communication objects and their respective time range features, location information, and communication features, an anomaly detection model for each communication object is obtained; wherein, each piece of location information is obtained by analyzing the address information in the corresponding historical email log, and each piece of communication feature represents the sending status in the corresponding historical email log.

[0020] In one possible implementation, determining the corresponding time range characteristics for each group of communication objects based on the sub-time period to which the email communication time in each corresponding historical email log belongs includes:

[0021] For each group of communication objects, the time vector of each historical email log corresponding to each group of communication objects is determined based on the sub-time period to which the email communication time in each historical email log belongs.

[0022] The sum of the time vectors of each historical email log corresponding to each group of communication objects is used as the initial time range feature for each group.

[0023] Based on the number of historical email logs corresponding to each group of communication objects, the corresponding initial time range features are normalized to obtain their respective time range features.
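
The three steps above (time vector per log, sum, normalization by the number of logs) can be sketched as follows. Representing each time vector as a one-hot vector over the sub-time periods is an assumption, since the source does not spell out the encoding; `time_range_feature` is a hypothetical name.

```python
def time_range_feature(period_indices, num_periods):
    """Build the time range feature of one sender/receiver pair.

    period_indices: sub-time period index (0-based) of each historical log of the pair.
    """
    if not period_indices:
        raise ValueError("at least one log is required")
    total = [0] * num_periods
    for idx in period_indices:
        one_hot = [0] * num_periods  # time vector of a single log (assumed one-hot)
        one_hot[idx] = 1
        total = [t + v for t, v in zip(total, one_hot)]  # sum of the time vectors
    count = len(period_indices)
    return [t / count for t in total]  # normalize by the number of logs


# Four logs: two in the first sub-period, one in the second, one in the fourth.
print(time_range_feature([0, 0, 1, 3], num_periods=4))
```

With this encoding the feature is the share of the pair's emails that falls into each sub-time period, so pairs with very different volumes remain comparable.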

[0024] In one possible implementation, obtaining the corresponding communication object anomaly detection model based on each group of communication objects and their respective time range features, location information, and communication features includes:

[0025] The communication objects and their corresponding location information are vectorized using a preset vectorization algorithm to obtain their respective first features;

[0026] The first feature, the time range feature, and the communication feature corresponding to each other are concatenated to obtain the second feature corresponding to each other.

[0027] Based on their respective second features, pre-defined density estimation algorithms are used to model them, resulting in corresponding communication object anomaly detection models.
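
A rough sketch of the vectorize, concatenate and model steps is given below. Several parts are assumptions: `toy_embed` is a hypothetical stand-in for the preset vectorization algorithm (it is not BERT), a Gaussian kernel density estimate is used as one possible density estimation algorithm (the source does not name one), and the model is fitted on several second-feature samples per pair.

```python
import hashlib
import math


def toy_embed(text, dim=4):
    """Hypothetical stand-in for the vectorization step: a deterministic pseudo-embedding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 256 for i in range(dim)]


def second_feature(sender, receiver, location, time_feature, comm_feature):
    """Concatenate the first feature, the time range feature and the communication feature."""
    first = toy_embed(f"{sender}|{receiver}|{location}")
    return first + list(time_feature) + [comm_feature]


def fit_kde(samples, bandwidth=0.5):
    """Return a density function estimated with an isotropic Gaussian kernel (assumed choice)."""
    dim = len(samples[0])
    norm = len(samples) * (bandwidth * math.sqrt(2 * math.pi)) ** dim

    def density(x):
        total = 0.0
        for s in samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / norm

    return density


# Hypothetical usage: one feature for a pair, and a model fitted on toy 2-D samples.
feature = second_feature("a@x.com", "b@y.com", "City B", [0.5, 0.25, 0.0, 0.25], 1)
model = fit_kde([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
print(len(feature), model([0.0, 0.0]) > model([3.0, 3.0]))
```

A non-parametric estimate of this kind makes no Gaussian assumption about the data as a whole, which matches the motivation given in the background section.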

[0028] In one possible implementation, the step of vectorizing each group of communication objects and their corresponding location information using a preset vectorization algorithm to obtain their respective first features includes:

[0029] Each group of communication objects and their corresponding location information are input into the BERT language representation model to obtain the corresponding first feature output by BERT.

[0030] In one possible implementation, determining the corresponding target time range features for each group of target communication objects based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs includes:

[0031] For each group of target communication objects, the time vector of each email log to be detected is determined based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs;

[0032] The sum of the time vectors of each email log to be detected corresponding to each group of target communication objects is used as the initial target time range feature for each group.

[0033] Based on the number of email logs to be detected corresponding to each group of target communication objects, the corresponding initial target time range features are normalized to obtain their respective target time range features.

[0034] In one possible implementation, obtaining the corresponding target features based on each group of target communication objects and their respective target time range features, target location information, and target communication features includes:

[0035] The target communication objects and their corresponding target location information are vectorized using a preset vectorization algorithm to obtain their respective third features;

[0036] The corresponding third feature, the target time range feature, and the target communication feature are concatenated to obtain their respective target features.

[0037] In one possible implementation, the step of vectorizing each group of target communication objects and their corresponding target location information using a preset vectorization algorithm to obtain their respective third features includes:

[0038] Each group of target communication objects and their corresponding target location information are input into BERT to obtain the corresponding third feature output by BERT.

[0039] In one possible implementation, the method further includes:

[0040] For any set of target communication objects, if no corresponding communication object anomaly detection model exists for the target communication object, the corresponding actual result is determined in the following manner:

[0041] The target features corresponding to the target communication object are input into at least one communication object anomaly detection model corresponding to the target sender object in the target communication object, and the actual results output by each of the at least one communication object anomaly model are obtained.
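
The fallback of [0040] and [0041] can be sketched as follows, assuming models are stored in a dictionary keyed by (sender, receiver) pairs and are callable on a target feature. Both the storage layout and the name `score_with_fallback` are hypothetical.

```python
def score_with_fallback(models, sender, receiver, target_feature):
    """Return the actual results for one target pair.

    models: dict mapping (sender, receiver) -> callable model.
    If the pair has its own model, only that model is used; otherwise every
    model that belongs to the same sender is applied.
    """
    if (sender, receiver) in models:
        return [models[(sender, receiver)](target_feature)]
    return [model(target_feature) for (s, _), model in models.items() if s == sender]


# Hypothetical models that return fixed scores.
models = {
    ("a", "b"): lambda f: 1.0,
    ("a", "c"): lambda f: 2.0,
    ("d", "b"): lambda f: 3.0,
}
print(score_with_fallback(models, "a", "b", [0.0]))
print(score_with_fallback(models, "a", "z", [0.0]))
```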

[0042] In one possible implementation, for the target communication object, determining and displaying the detection results of the communication objects for the multiple email logs to be detected based on the relationship between the distances of the actual results to the corresponding centerlines and a set threshold includes:

[0043] If the distance from at least one of the actual results to the corresponding centerline does not exceed the threshold, then the target receiver object is determined to be normal.

[0044] If the distance from at least one of the actual results to the corresponding centerline exceeds the threshold, then the target receiver object is determined to be abnormal.

[0045] In one possible implementation, determining and displaying the detection results of the communication objects of the multiple email logs to be detected based on the relationship between the distances of the actual results to the corresponding centerlines and a set threshold includes:

[0046] For each actual result, perform the following operations:

[0047] For a given actual result, if the distance from the actual result to the corresponding centerline does not exceed the threshold, then the corresponding target communication object is determined to be normal.

[0048] If the distance from the actual result to the corresponding centerline exceeds the threshold, then the corresponding target communication object is determined to have abnormal behavior.

[0049] In one possible implementation, determining that the corresponding target communication object exhibits abnormal behavior includes:

[0050] Based on the multiple email logs to be detected, obtain the source address information corresponding to the target sender and the destination address information corresponding to the target receiver;

[0051] If the target sender and its corresponding source address information are not in the pre-built whitelist, but the target receiver and its corresponding destination address information are in the whitelist, then the target sender is determined to be abnormal.

[0052] If the target sender and its corresponding source address are in the whitelist, and the target receiver and its corresponding destination address are not in the whitelist, then the target receiver is determined to be abnormal.

[0053] If the target sender and its corresponding source address are not in the whitelist, and the target receiver and its corresponding destination address are not in the whitelist, then the target communication object is determined to be abnormal.
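
The three whitelist cases can be sketched as below, assuming the whitelist is a set of (account, address) pairs; that layout and the name `classify_with_whitelist` are hypothetical. The source does not state what happens when both sides are whitelisted, so the sketch returns `None` in that case.

```python
def classify_with_whitelist(whitelist, sender, src_addr, receiver, dst_addr):
    """Locate the abnormal side of a pair that was already flagged as abnormal."""
    sender_ok = (sender, src_addr) in whitelist
    receiver_ok = (receiver, dst_addr) in whitelist
    if not sender_ok and receiver_ok:
        return "sender abnormal"
    if sender_ok and not receiver_ok:
        return "receiver abnormal"
    if not sender_ok and not receiver_ok:
        return "communication object abnormal"
    return None  # both sides whitelisted: not covered by the source


# Hypothetical whitelist entries.
whitelist = {("a@x.com", "10.0.0.1"), ("b@x.com", "10.0.0.2")}
print(classify_with_whitelist(whitelist, "a@x.com", "10.9.9.9", "b@x.com", "10.0.0.2"))
```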

[0054] In one possible implementation, the whitelist is constructed in the following manner:

[0055] Each historical email log within the set time period is analyzed to obtain its corresponding source address information, destination address information, sender and recipient information;

[0056] If the sender and receiver are determined to be normal communication objects, then the sender is associated with the source address information and stored in the whitelist, and the receiver is associated with the destination address information and stored in the whitelist.
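
Whitelist construction can be sketched in the same spirit. Logs are assumed to be dictionaries using the field names `sip`, `dip`, `mail_sender` and `mail_receiver` that appear later in the description. How a pair is judged to be normal is not specified in the source, so it is passed in as a hypothetical callback `is_normal_pair`.

```python
def build_whitelist(historical_logs, is_normal_pair):
    """Collect (account, address) pairs from logs whose sender and receiver are judged normal."""
    whitelist = set()
    for log in historical_logs:
        if is_normal_pair(log["mail_sender"], log["mail_receiver"]):
            whitelist.add((log["mail_sender"], log["sip"]))    # sender with source address
            whitelist.add((log["mail_receiver"], log["dip"]))  # receiver with destination address
    return whitelist


# Hypothetical logs and a hypothetical rule: internal recipients count as normal.
logs = [
    {"sip": "10.0.0.1", "dip": "10.0.0.2", "mail_sender": "a@x.com", "mail_receiver": "b@x.com"},
    {"sip": "10.0.0.3", "dip": "8.8.4.4", "mail_sender": "c@x.com", "mail_receiver": "d@y.com"},
]
print(sorted(build_whitelist(logs, lambda s, r: r.endswith("@x.com"))))
```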

[0057] In one possible implementation, determining that the corresponding target communication object exhibits abnormal behavior includes:

[0058] Based on the multiple email logs to be detected, a first number of source address information corresponding to the target sender and a second number of destination address information corresponding to the target receiver are determined.

[0059] If the first quantity exceeds the set first threshold, it is determined that the target sender object corresponds to multiple abnormal source address information.

[0060] If the second quantity exceeds the set second threshold, it is determined that the target receiver object corresponds to multiple abnormal destination address information.

[0061] If the first quantity exceeds the first threshold and the second quantity exceeds the second threshold, then it is determined that the multiple source address information corresponding to the target sender object and the multiple destination address information corresponding to the target receiver object are abnormal.
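
The address-count checks of [0058] to [0061] can be sketched as follows. Counting distinct addresses is an assumption about what the "first number" and "second number" mean; the dictionary log layout and the name `address_count_findings` are hypothetical.

```python
def address_count_findings(logs, sender, receiver, first_threshold, second_threshold):
    """Flag a sender or receiver that appears with too many distinct addresses."""
    # Distinct source addresses seen for the sender (the "first number").
    sources = {log["sip"] for log in logs if log["mail_sender"] == sender}
    # Distinct destination addresses seen for the receiver (the "second number").
    destinations = {log["dip"] for log in logs if log["mail_receiver"] == receiver}
    findings = []
    if len(sources) > first_threshold:
        findings.append("sender has abnormal source addresses")
    if len(destinations) > second_threshold:
        findings.append("receiver has abnormal destination addresses")
    return findings


# Hypothetical logs: one sender seen from three different source addresses.
logs = [
    {"sip": f"10.0.0.{i}", "dip": "10.1.0.1", "mail_sender": "a@x.com", "mail_receiver": "b@x.com"}
    for i in range(1, 4)
]
print(address_count_findings(logs, "a@x.com", "b@x.com", first_threshold=2, second_threshold=2))
```

When both thresholds are exceeded the returned list holds both findings, which corresponds to the combined case in [0061].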

[0062] In one possible implementation, determining that the corresponding target communication object exhibits abnormal behavior includes:

[0063] Based on the multiple email logs to be detected, a third number of target sender objects corresponding to the same source address information and a fourth number of target receiver objects corresponding to the same destination address information are determined.

[0064] If the third quantity exceeds the set third threshold, it is determined that multiple target sender objects corresponding to the same source address information are abnormal.

[0065] If the fourth quantity exceeds the set fourth threshold, it is determined that multiple target receiver objects corresponding to the same destination address information are abnormal.

[0066] If the third quantity exceeds the third threshold and the fourth quantity exceeds the fourth threshold, then it is determined that multiple target sender objects corresponding to the same source address information and multiple target receiver objects corresponding to the same destination address information are abnormal.

[0067] In one possible implementation, determining that the corresponding target communication object exhibits abnormal behavior includes:

[0068] If the number of email logs to be detected corresponding to the target communication object exceeds the set fifth threshold, then the target communication object is determined to exhibit an excessive-communication anomaly.
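
The excessive-communication check can be sketched with a simple counter over sender/receiver pairs. The dictionary log layout and the name `excessive_pairs` are hypothetical.

```python
from collections import Counter


def excessive_pairs(logs, fifth_threshold):
    """Return the (sender, receiver) pairs whose number of logs exceeds the threshold."""
    counts = Counter((log["mail_sender"], log["mail_receiver"]) for log in logs)
    return {pair for pair, n in counts.items() if n > fifth_threshold}


# Hypothetical logs: pair (a, b) appears three times, pair (a, c) once.
logs = (
    [{"mail_sender": "a@x.com", "mail_receiver": "b@y.com"}] * 3
    + [{"mail_sender": "a@x.com", "mail_receiver": "c@y.com"}]
)
print(excessive_pairs(logs, fifth_threshold=2))
```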

[0069] According to a second aspect of the present disclosure, a communication object anomaly detection device is provided, the device comprising:

[0070] The acquisition module is used to acquire multiple email logs to be detected within a set time period, wherein the time period includes multiple sub-time periods;

[0071] The analysis module is used to analyze the multiple email logs to be detected, and obtain the target sender and target receiver for each.

[0072] The determination module is used to determine the target time range characteristics of each group of target communication objects based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs; each group of target communication objects includes a target sender object and a target receiver object.

[0073] The detection module is used to perform anomaly detection on the target communication objects of the multiple email logs to be detected based on the target communication objects of each group, and their respective target time range characteristics, target location information and target communication characteristics; wherein, each target location information is obtained by analyzing the address information in the corresponding email log to be detected, and each target communication characteristic represents the sending status in the corresponding email log to be detected.

[0074] According to a third aspect of the present disclosure, an electronic device is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor implements the steps of the above-described communication object anomaly detection method by executing the executable instructions.

[0075] According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided that stores computer instructions thereon, which, when executed by a processor, implement the steps of the above-described communication object anomaly detection method.

[0076] The technical solutions provided by the embodiments of this disclosure have at least the following beneficial effects:

[0077] To improve the accuracy of communication object anomaly detection, this disclosure provides a target time range feature for communication object anomaly detection. This disclosure divides a set time period into multiple sub-time periods. For each group of target communication objects, the corresponding target time range feature is determined based on the sub-time period to which the email communication time in each email log to be detected belongs within the corresponding set time period. Furthermore, this disclosure performs anomaly detection on target communication objects in multiple email logs to be detected based on multiple features, namely, target communication object, target time range feature, target location information, and target communication features, thereby improving the accuracy of communication object anomaly detection and reducing the false positive rate.

Attached Figure Description

[0078] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0079] Figure 1 is a schematic diagram illustrating an application scenario according to an exemplary embodiment;

[0080] Figure 2 is a flowchart illustrating a communication object anomaly detection method according to an exemplary embodiment;

[0081] Figure 3 is a flowchart illustrating a method for detecting anomalies in a communication object according to an exemplary embodiment;

[0082] Figure 4 is a flowchart illustrating a method for obtaining characteristics of a target time range according to an exemplary embodiment;

[0083] Figure 5 is a flowchart illustrating a method for obtaining target features according to an exemplary embodiment;

[0084] Figure 6 is a flowchart illustrating a method for establishing an anomaly detection model for a communication object according to an exemplary embodiment;

[0085] Figure 7 is a flowchart illustrating a method for establishing an anomaly detection model for a communication object according to an exemplary embodiment;

[0086] Figure 8 is a schematic diagram illustrating an anomaly detection model for communication objects according to an exemplary embodiment;

[0087] Figure 9 is a schematic diagram of a communication object anomaly detection device according to an exemplary embodiment;

[0088] Figure 10 is a schematic diagram of an electronic device illustrating a method for detecting anomalies in a communication object according to an exemplary embodiment;

[0089] Figure 11 is a schematic diagram of a program product illustrating a communication object anomaly detection method according to an exemplary embodiment.

Detailed Implementation

[0090] To make the objectives, technical solutions, and advantages of this disclosure clearer, the disclosure will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, and not all embodiments. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the protection scope of this disclosure.

[0091] The following are explanations of some of the words that appear in the text:

[0092] 1. In the embodiments of this disclosure, the term "and/or" describes the relationship between associated objects, indicating that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. The character "/" generally indicates that the preceding and following associated objects have an "or" relationship.

[0093] 2. The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein.

[0094] The application scenarios described in this disclosure are for the purpose of more clearly illustrating the technical solutions of this disclosure and do not constitute a limitation on the technical solutions provided in this disclosure. Those skilled in the art will understand that with the emergence of new application scenarios, the technical solutions provided in this disclosure are also applicable to similar technical problems. In the description of this disclosure, unless otherwise stated, "multiple" means two or more.

[0095] Currently, employee email activity is typically monitored using traditional UEBA methods with fixed rules. Taking a month as an example, a baseline is generated based on all historical email logs from the previous month. The logic for baseline formation is to statistically generate a baseline based on the historical email logs of any sender and any recipient. For a user's data to be monitored within a certain time period, the corresponding baseline is determined based on the sender and recipient data. The data volume within that time period is compared with the corresponding indicators on the baseline to determine if the user exhibits abnormal behavior. For example, the mean and standard deviation of the user and recipient data are calculated using historical email logs. Using the 3-sigma principle, with a 5-minute monitoring range, if the difference between the user's data volume within this time range and the baseline mean exceeds three times the standard deviation, it is considered abnormal behavior. The 3-sigma principle is used for rapid extrapolation of normally distributed data with known means and standard deviations.

[0096] The above method assumes that the data follows a Gaussian distribution and calculates the parameters of the Gaussian distribution, which is a parameter estimation method. However, in actual business scenarios, the data distribution may not follow a Gaussian distribution. Therefore, using the idea of Gaussian distribution to detect abnormal email accounts has a low accuracy rate and a high false positive rate.

[0097] Therefore, in order to solve the above problems, this disclosure provides a method, apparatus and device for detecting communication object anomalies, which improves the accuracy of communication object anomaly detection and reduces the false alarm rate.

[0098] First, refer to Figure 1, which is a schematic diagram of an application scenario of an embodiment of the present disclosure, including a terminal device 11, a mail server 12, and a server 13. The terminal device 11 can be a portable computer, personal computer, mobile phone, etc., used to send emails to other terminal devices. The mail server 12 is connected to one or more terminal devices 11 and is used to collect email logs to be detected. The server 13 is used to obtain the email logs to be detected from the mail server 12 and perform communication object anomaly detection on the email logs to be detected.

[0099] In this embodiment, server 13 obtains multiple email logs to be detected within a set time period from mail server 12, wherein the time period includes multiple sub-time periods; analyzes each of the multiple email logs to be detected to obtain the corresponding target sender and target receiver; for each group of target communication objects, determines the corresponding target time range feature based on the sub-time period to which the email communication time in each of the corresponding email logs to be detected belongs; each group of target communication objects includes a target sender and a target receiver; based on each group of target communication objects and their corresponding target time range feature, target location information, and target communication feature, anomaly detection is performed on the target communication objects of the multiple email logs to be detected; wherein each target location information is obtained by analyzing the address information in the corresponding email log to be detected, and each target communication feature represents the sending status in the corresponding email log to be detected.

[0100] In this disclosure, a method for detecting anomalies in communication objects is provided. Based on the same concept, this disclosure also provides a device for detecting anomalies in communication objects, an electronic device, and a computer-readable storage medium.

[0101] In some embodiments, a communication object anomaly detection method provided in this disclosure is described below through specific examples. As shown in Figure 2, the method includes:

[0102] Step 201: Obtain multiple email logs to be detected within a set time period, wherein the set time period includes multiple sub-time periods;

[0103] The set duration can be one month or another value. If the duration is set to one month, it is divided into four sub-time periods: week one, week two, week three, and week four (which also covers the days beyond week four). The multiple email logs to be detected can be obtained from the mail server.

[0104] It should be noted that the above-mentioned methods for setting duration and sub-time periods are merely illustrative examples. Any setting method is applicable to the embodiments disclosed herein, and no specific limitations are imposed.
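
For the one-month example, mapping an email communication time to its sub-time period can be sketched as below. Counting weeks as 7-day blocks from the first day of the month (rather than calendar weeks) is an assumption, and `sub_period_index` is a hypothetical name.

```python
from datetime import datetime


def sub_period_index(timestamp):
    """Map a datetime to 0..3 (weeks one to four); days 29 to 31 are folded into week four."""
    return min((timestamp.day - 1) // 7, 3)


print(sub_period_index(datetime(2022, 11, 1)))
print(sub_period_index(datetime(2022, 11, 30)))
```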

[0105] Step 202: Analyze the multiple email logs to be detected to obtain the target sender and target receiver for each.

[0106] The email logs to be detected may include information such as source address (sip), destination address (dip), source port (sport), destination port (dport), target sender (mail_sender), target receiver (mail_receiver), and timestamp. By analyzing the email logs to be detected, the corresponding target sender and target receiver can be obtained.

[0107] In this embodiment of the disclosure, the sender object can represent the account that sends the email, such as the sender's email address. Similarly, the recipient object can represent the account that receives the email, such as the recipient's email address.
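
Extracting the target sender and receiver and grouping the logs by pair can be sketched as follows. The field names `mail_sender` and `mail_receiver` come from the description above; representing each log as a dictionary and the name `group_by_pair` are assumptions.

```python
from collections import defaultdict


def group_by_pair(logs):
    """Group email logs by their (target sender, target receiver) pair."""
    groups = defaultdict(list)
    for log in logs:
        groups[(log["mail_sender"], log["mail_receiver"])].append(log)
    return dict(groups)


# Hypothetical logs with only the fields needed here.
logs = [
    {"mail_sender": "a@x.com", "mail_receiver": "b@y.com", "timestamp": "2022-11-01T09:00:00"},
    {"mail_sender": "a@x.com", "mail_receiver": "b@y.com", "timestamp": "2022-11-09T10:30:00"},
    {"mail_sender": "a@x.com", "mail_receiver": "c@y.com", "timestamp": "2022-11-10T11:00:00"},
]
for pair, items in group_by_pair(logs).items():
    print(pair, len(items))
```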

[0108] Step 203: For each group of target communication objects, determine the corresponding target time range features based on the sub-time period to which the email communication time in each of the corresponding email logs to be detected belongs. Each group of target communication objects includes a target sender object and a target receiver object.

[0109] The email communication time in the above-mentioned email log to be detected can be email sending time, email receiving time, email processing time, etc.

[0110] Step 204: Based on the target communication objects of each group, and their respective target time range characteristics, target location information and target communication characteristics, perform anomaly detection on the target communication objects of the multiple email logs to be detected.

[0111] Each of the target location information is obtained by analyzing the address information in the corresponding email log to be detected, and each of the target communication features represents the sending status in the corresponding email log to be detected.

[0112] Each target communication feature indicates whether the communication succeeded, which can be judged from the protocol status code in the email log to be detected. Similar to HTTP (Hypertext Transfer Protocol), a code of 200 indicates success and a code of 400 indicates a client request failure. Because there are many status codes, further processing is required to convert them into a binary feature, such as 1 for successful communication and 0 for failed communication.
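The conversion to a binary feature might look like the sketch below. The disclosure only states that 200 means success and 400 means client request failure; treating every code in the range 200 to 299 as success is an assumption borrowed from HTTP conventions, and the function name is hypothetical.

```python
def communication_feature(status_code: int) -> int:
    """Return 1 for successful communication, 0 for failure.

    Assumption: any 2xx code counts as success, by analogy with HTTP.
    """
    return 1 if 200 <= status_code < 300 else 0
```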

[0113] Each of the aforementioned target location information is obtained by analyzing the source address, destination address, source port, and destination port information in the email logs to be inspected. For example, by analyzing the address information in the email logs to be inspected, the target location information is determined to be "City B, Province A, China".

[0114] This disclosure improves the accuracy of communication object anomaly detection and reduces the false alarm rate by using each group of target communication objects and their corresponding target time range characteristics, target location information and target communication characteristics to perform anomaly detection on the target communication objects of the multiple email logs to be detected.

[0115] The specific steps of the communication object anomaly detection method provided in this disclosure are described below. As shown in Figure 3, the method includes:

[0116] Step 301: Obtain multiple email logs to be detected within a set time period, wherein the set time period includes multiple sub-time periods;

[0117] The above-mentioned duration can be set according to the actual application. It is necessary to divide the duration evenly into multiple sub-time periods. For example, if the duration is set to 20 days, then days 1-5 are the first sub-time period, days 6-10 are the second sub-time period, days 11-15 are the third sub-time period, and days 16-20 are the fourth sub-time period.
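The even division in the 20-day example can be sketched as follows, assuming days are numbered from 1 and the set duration divides evenly by the number of sub-time periods. The function name `even_sub_period` and its parameters are hypothetical.

```python
def even_sub_period(day_index: int, total_days: int = 20, n_periods: int = 4) -> int:
    """Return the 0-based sub-time period for a 1-based day index."""
    if total_days % n_periods != 0:
        raise ValueError("total_days must divide evenly into n_periods")
    if not 1 <= day_index <= total_days:
        raise ValueError("day_index is outside the set duration")
    return (day_index - 1) // (total_days // n_periods)
```

With the defaults, days 1-5 map to index 0, days 6-10 to index 1, days 11-15 to index 2, and days 16-20 to index 3.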

[0118] Step 302: Analyze the multiple email logs to be detected to obtain the target sender and target receiver for each.

[0119] The target sender is the sender's email address, such as test_sender@163.com, and the target recipient is the recipient's email address, such as test_receiver@163.com.

[0120] Step 303: For each group of target communication objects, determine the corresponding target time range features based on the sub-time period to which the email communication time in each of the corresponding email logs to be detected belongs. Each group of target communication objects includes a target sender object and a target receiver object.

[0121] As shown in Figure 4, the specific steps for determining the corresponding target time range features include:

[0122] Step 401: For each group of target communication objects, determine the time vector of each email log to be detected corresponding to each group of target communication objects based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs.

[0123] The time vector of each email log to be detected is determined using the bag-of-words model and one-hot encoding (an encoding in which exactly one bit is set). For example, if the set duration is one month, it includes four sub-time periods: the first week, the second week, the third week, and the fourth week (including time beyond the fourth week). Suppose there are four email logs to be detected from the communication between the target communication objects. If the communication time of the first email log to be detected is within the first week, its time vector is [1,0,0,0]; if the communication time of the second is within the second week, its time vector is [0,1,0,0]; if the communication time of the third is within the third week, its time vector is [0,0,1,0]; and if the communication time of the fourth is within the fourth week, its time vector is [0,0,0,1]. As a further example, among the email logs to be detected between the target sender object test_sender@163.com and the target receiver object test_receiver@163.com, one email log has a timestamp of 1656656420000, and the parsed email communication time is 2022-08-01, which falls in the first week of August. The time vector corresponding to this email log is therefore [1,0,0,0].
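Step 401 can be sketched as a one-hot encoder over sub-time period indices. This is a minimal illustration; the function name `time_vector` is hypothetical, and the sub-time period index is assumed to be 0-based.

```python
def time_vector(sub_period: int, n_periods: int = 4) -> list:
    """One-hot encode a 0-based sub-time period index."""
    if not 0 <= sub_period < n_periods:
        raise ValueError("sub_period is out of range")
    vec = [0] * n_periods
    vec[sub_period] = 1
    return vec
```

A log whose communication time falls in the first week gives `time_vector(0)`, which is `[1, 0, 0, 0]`.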

[0124] Step 402: The sum of the time vectors of each email log to be detected corresponding to each group of target communication objects is used as the initial target time range feature for each group.

[0125] For example, if the set duration is one month, this set duration includes four sub-time periods: the first week, the second week, the third week, and the fourth week (including time beyond the fourth week). There are a total of 5 email logs to be detected for a group of target communication objects. The time vector of the first email log to be detected is [1,0,0,0], the time vector of the second email log to be detected is [0,1,0,0], the time vector of the third email log to be detected is [1,0,0,0], the time vector of the fourth email log to be detected is [0,0,0,1], and the time vector of the fifth email log to be detected is [1,0,0,0]. Then the initial target time range feature corresponding to this group of target communication objects is [1,0,0,0]+[0,1,0,0]+[1,0,0,0]+[0,0,0,1]+[1,0,0,0]=[3,1,0,1].

[0126] Step 403: Based on the number of email logs to be detected corresponding to each group of target communication objects, normalize the corresponding initial target time range features to obtain their respective target time range features.

[0127] For example, if there are 5 email logs to be detected corresponding to a group of target communication objects, and the corresponding initial target time range feature is [1,0,0,0]+[0,1,0,0]+[1,0,0,0]+[0,0,0,1]+[1,0,0,0]=[3,1,0,1], then the target time range feature corresponding to this group of target communication objects is [3,1,0,1] / 5 = [0.6,0.2,0,0.2].
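Steps 402 and 403 together amount to an element-wise sum followed by division by the number of logs. The sketch below reproduces the worked example; the function name `time_range_feature` is hypothetical.

```python
def time_range_feature(time_vectors):
    """Sum the one-hot time vectors, then normalize by the log count."""
    if not time_vectors:
        raise ValueError("at least one time vector is required")
    count = len(time_vectors)
    summed = [sum(column) for column in zip(*time_vectors)]  # step 402
    return [value / count for value in summed]               # step 403

vectors = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
feature = time_range_feature(vectors)  # [0.6, 0.2, 0.0, 0.2]
```

Because each log contributes exactly one count, the normalized feature always sums to 1 and can be read as the share of the pair's communication in each sub-time period.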

[0128] Step 304: Based on each group of target communication objects and their respective target time range features, target location information, and target communication features, obtain their respective target features;

[0129] As shown in Figure 5, the specific steps for obtaining the target features include:

[0130] Step 501: Vectorize each group of target communication objects and their respective target location information using a preset vectorization algorithm to obtain their respective third features;

[0131] The above vectorization algorithm can be a trained BERT (Bidirectional Encoder Representations from Transformers, a language representation model) or another vectorization algorithm. The training process of the BERT model is prior art and will not be elaborated here.

[0132] When the vectorization algorithm is BERT, the third feature is obtained by the following method:

[0133] Respectively input each group of target communication objects and their respective corresponding target location information into BERT, and obtain the corresponding third features output by BERT.

[0134] Specifically, BERT is a well-performing pre-trained model in NLP (Natural Language Processing) that is effective at representing words as vectors. For example, if the information input into BERT is ['test_sender@163.com', 'test_receiver@163.com', 'Beijing, China'], the obtained third feature is [[-0.3,0.1,0.33,...,0.1],[0.2,0.1,-0.3,...,0.3],[0.7,-0.1,0.2,...,-0.4]]. The vector dimension can be specified by the user; with 768 dimensions, for instance, the third feature is a matrix with 3 rows and 768 columns. The 3 rows represent 3 features, namely the target sender object, the target receiver object, and the target location information.

[0135] Step 502: Concatenate the corresponding third feature, the target time range feature, and the target communication feature to obtain the corresponding target features.

[0136] For example, suppose the target time range feature is [0.6, 0.2, 0, 0.2], all 5 email logs to be detected record successful communication (i.e., each has a flag of 1.0), so that the target communication feature is [1.0, 1.0, 1.0, 1.0, 1.0], and the third feature is [[-0.3, 0.1, 0.33, ..., 0.1], [0.2, 0.1, -0.3, ..., 0.3], [0.7, -0.1, 0.2, ..., -0.4]]. Concatenating the third feature, the target time range feature, and the target communication feature yields the target feature, expressed as [[-0.3,0.1,0.33,...,0.1,0.0,0.0,0.0,0.0,0.0],[0.2,0.1,-0.3,...,0.3,0.0,0.0,0.0,0.0,0.0],[0.7,-0.1,0.2,...,-0.4,0.0,0.0,0.0,0.0,0.0],[0.0,...,0.6,0.2,0,0.2,0.0],[0.0,0.0,...,1.0]], which is a matrix with 5 rows and 773 columns. The 5 rows represent 5 features: the target sender, the target receiver, the target location information, the target time range feature, and the target communication feature. 773 is the dimension of each feature, where 773 = 768 + 5. Here 768 is the dimension of the third feature, which is specified by the user, and 5 = 4 + 1, where 4 is the dimension of the target time range feature and 1 is the dimension of the target communication feature. Each feature is therefore represented by a 773-dimensional vector.
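The layout of the 5-row, 773-column target feature can be sketched as below. This is an illustration under stated assumptions: the embedding rows are placeholders rather than real BERT output, the communication feature is passed in as a single value because the text assigns it a dimension of 1 without saying how the per-log flags are combined, and the function name `build_target_feature` is hypothetical.

```python
EMBED_DIM = 768  # dimension of the third feature, chosen by the user

def build_target_feature(third_feature, time_feature, comm_feature):
    """Zero-pad and stack the three parts into one matrix.

    Rows 1-3: embeddings (sender, receiver, location) padded with zeros.
    Row 4:    zeros, then the time range feature, then one zero.
    Row 5:    zeros, then the communication feature in the last column.
    """
    extra = len(time_feature) + 1          # 4 + 1 = 5 extra columns
    width = EMBED_DIM + extra              # 768 + 5 = 773
    rows = [list(row) + [0.0] * extra for row in third_feature]
    rows.append([0.0] * EMBED_DIM + list(time_feature) + [0.0])
    rows.append([0.0] * (width - 1) + [float(comm_feature)])
    return rows

# Placeholder embeddings standing in for BERT output.
placeholder_third = [[0.1] * EMBED_DIM, [0.2] * EMBED_DIM, [0.3] * EMBED_DIM]
target = build_target_feature(placeholder_third, [0.6, 0.2, 0.0, 0.2], 1.0)
```

Padding each part into its own column range keeps the embedding, time, and communication dimensions from overlapping, at the cost of a mostly sparse matrix.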

[0137] Step 305: Input each obtained target feature into the corresponding communication object anomaly detection model to obtain the actual result output by each communication object anomaly detection model;

[0138] For example, if the target communication objects are sender A and receiver B, then the target features of this group of target communication objects are input into the communication object anomaly detection model corresponding to this group of target communication objects.

[0139] Currently, anomaly detection for employee email activity is typically implemented as fixed rules written with traditional UEBA (User and Entity Behavior Analytics) methods. However, this approach is rigid and inflexible. When there are many sub-scenarios, a separate set of rules must be written for each one, which is both labor-intensive and troublesome to maintain.

[0140] To address the aforementioned problems, this disclosure provides a method for establishing a communication object anomaly detection model. As shown in Figure 6, it includes:

[0141] Step 601: Obtain multiple historical email logs within the set time period;

[0142] For example, multiple historical email logs within one month are obtained, where the month is a past month such as June 2022 or July 2022. By contrast, for the multiple email logs to be detected, the month is the most recent month, such as August 2022.

[0143] Step 602: Analyze the multiple historical email logs respectively to obtain the corresponding sender and receiver;

[0144] Step 603: For each group of communication objects, determine their respective time range characteristics based on the sub-time period to which the email communication time in each historical email log belongs. Each group of communication objects includes a sender object and a receiver object.

[0145] The corresponding time range characteristics are determined using the following methods:

[0146] For each group of communication objects, the time vector of each historical email log corresponding to each group of communication objects is determined based on the sub-time period to which the email communication time in each historical email log belongs.

[0147] The sum of the time vectors of each historical email log corresponding to each group of communication objects is used as the initial time range feature for each group.

[0148] Based on the number of historical email logs corresponding to each group of communication objects, the corresponding initial time range features are normalized to obtain their respective time range features.

[0149] Step 604: Based on each group of communication objects and their respective time range features, location information, and communication features, obtain the corresponding communication object anomaly detection model. Each piece of location information is obtained by analyzing the address information in the corresponding historical email log, and each communication feature represents the sending status in the corresponding historical email log.

[0150] To improve the accuracy of anomaly detection using communication object anomaly detection models, anomaly detection models for each communication object can be obtained based on multiple historical email logs within a set time period.

[0151] As shown in Figure 7, obtaining the corresponding communication object anomaly detection model based on each group of communication objects and their respective time range features, location information, and communication features specifically includes:

[0152] Step 701: Vectorize each group of communication objects and their corresponding location information using a preset vectorization algorithm to obtain their respective first features;

[0153] The aforementioned preset vectorization algorithm can be BERT or another vectorization algorithm. When BERT is used, the corresponding first feature can be obtained as follows:

[0154] Each group of communication objects and its corresponding location information are input into BERT to obtain the corresponding first feature output by BERT.

[0155] Step 702: Concatenate the corresponding first feature, the time range feature, and the communication feature to obtain the corresponding second feature;

[0156] The specific steps are similar to step 502, and will not be repeated here.

[0157] Step 703: Based on the corresponding second feature, each is modeled using a preset density estimation algorithm to obtain the corresponding communication object anomaly detection model.

[0158] The aforementioned preset density estimation algorithm can be KDE (Kernel Density Estimation) or other density estimation algorithms. KDE is a non-parametric machine learning algorithm that models a user's historical behavioral habits based on historical data from email accounts. The model is a probability density function, which is essentially a description of a distribution; for example, the probability density function of a Gaussian distribution is a model of the Gaussian distribution.

[0159] Specifically, based on the second feature corresponding to a group of communication objects, the KDE algorithm is used to model the probability density of the data and draw the corresponding probability density curve. This probability density curve can characterize the behavioral baseline of the number of emails sent and received and the number of communications of the communication objects. This curve is the communication object anomaly detection model corresponding to this group of communication objects. For the second feature corresponding to each group of communication objects, a corresponding communication object anomaly detection model can be established. That is, each group of communication objects corresponds to a probability density curve, which can be represented by a probability density function. Therefore, by inputting a target feature into the corresponding probability density function, the corresponding actual result can be obtained.
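A density model of this kind can be sketched with a plain Gaussian kernel density estimator. The sketch is not the disclosed implementation: it assumes each second feature has been flattened into a single vector, uses an isotropic Gaussian kernel with a hypothetical fixed bandwidth, and returns the logarithm of the density rather than the density itself, because the raw value underflows for vectors with thousands of dimensions.

```python
import math

def fit_kde(samples, bandwidth=0.5):
    """Return a function that evaluates the log-density at a point.

    samples: equal-length feature vectors built from historical logs.
    bandwidth: kernel width (hypothetical default, to be tuned).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    dim = len(samples[0])
    log_norm = math.log(len(samples)) + dim * math.log(
        bandwidth * math.sqrt(2 * math.pi))

    def log_density(x):
        # Log of each Gaussian kernel term, up to the shared constant.
        exponents = [
            -sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * bandwidth ** 2)
            for s in samples
        ]
        peak = max(exponents)  # log-sum-exp trick for stability
        return peak + math.log(
            sum(math.exp(e - peak) for e in exponents)) - log_norm

    return log_density
```

One such function would be fitted per group of communication objects; evaluating it on a target feature gives the value that the text calls the actual result.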

[0160] Step 306: Based on the relationship between the distance from each actual result to the corresponding midline and the set threshold, determine and display the detection results for the communication objects of the multiple email logs to be detected, wherein each midline is determined based on the mean of the function values of the corresponding communication object anomaly detection model.

[0161] For example, Figure 8 shows the communication object anomaly detection model corresponding to one group of communication objects. The model is a probability density curve. From this curve, a midline can be determined according to the mean of the peak values or according to the mean of the function values. The midline can be a straight line or a curve.

[0162] The above detection results include anomaly categories and abnormal communication objects. Anomaly categories include anomalies in the target sender object, target receiver object, multiple source address information corresponding to the target sender object, multiple destination address information corresponding to the target receiver object, multiple target sender objects corresponding to the same source address, multiple target receiver objects corresponding to the same destination address, and excessive communication by the target communication object. Abnormal objects include the target sender object and / or the target receiver object. After obtaining the detection results, they are output to the analysis engine, which displays them to customers or operations personnel through its front-end interface for further decision-making.

[0163] This disclosure enables the classification of communication object anomalies, allowing operation and maintenance personnel to perform corresponding operation and maintenance work according to the anomaly category, and also allows users to better understand the anomaly scenario, saving them the workload of troubleshooting specific anomaly scenarios.

[0164] Based on the relationship between the distances from the actual results to the corresponding midline and the set thresholds, the detection results for the communication objects of the multiple email logs to be detected are determined and displayed, specifically including:

[0165] For each actual result, perform the following operations:

[0166] If the distance from an actual result to the corresponding midline does not exceed the threshold, then the corresponding target communication object is determined to be normal.

[0167] If the distance from an actual result to the corresponding midline exceeds the threshold, then the corresponding target communication object is determined to have abnormal behavior.

[0168] The above thresholds can be set according to the actual situation, and there are no restrictions here.
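The threshold check can be sketched as follows. It assumes the midline is a constant equal to the mean of the model's function values (the text also allows a curve) and that distance means the absolute difference; both function names are hypothetical.

```python
def midline(function_values):
    """Mean of the model's function values, used as a constant midline."""
    if not function_values:
        raise ValueError("at least one function value is required")
    return sum(function_values) / len(function_values)

def is_abnormal(actual_result, mid, threshold):
    """True when the actual result lies farther from the midline
    than the threshold allows."""
    return abs(actual_result - mid) > threshold
```

For example, with function values 0.2, 0.4, and 0.6 the midline is about 0.4, so an actual result of 0.45 is normal under a threshold of 0.1 while 0.9 is abnormal.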

[0169] When abnormal behavior is identified in the target communication object, it is necessary to further determine the anomaly category. This can be done by first obtaining the source address information of the target sender and the destination address information of the target receiver based on the multiple email logs to be detected. Then, the anomaly category of the target communication object can be determined according to a whitelist strategy, specifically including the following three cases:

[0170] In the first case, if the target sender and its corresponding source address information are not in the pre-built whitelist, but the target receiver and its corresponding destination address information are in the whitelist, then the target sender is determined to be abnormal.

[0171] For example, if a target sender sends multiple emails to a target recipient, producing multiple email logs, the target sender may correspond to one or more pieces of source address information. When there are multiple source addresses, the target sender is determined to be abnormal if any one of them is not in the whitelist.

[0172] The whitelist can be constructed in the following ways:

[0173] Each historical email log within the set time period is analyzed to obtain its corresponding source address information, destination address information, sender and recipient information;

[0174] If the sender and receiver are determined to be normal communication objects, then the sender is associated with the source address information and stored in the whitelist, and the receiver is associated with the destination address information and stored in the whitelist.

[0175] Whether the sender and receiver in a historical email log are normal communication objects can be determined by checking whether they have been marked as normal communication objects.

[0176] In the second scenario, if the target sender and its corresponding source address information are in the whitelist, and the target receiver and its corresponding destination address information are not in the whitelist, then the target receiver is determined to be abnormal.

[0177] For example, if the target sender sends multiple emails to the target recipient, producing multiple email logs, the target recipient may correspond to one or more pieces of destination address information. When there are multiple destination addresses, the target recipient is determined to be abnormal if any one of them is not in the whitelist.

[0178] In the third scenario, if the target sender and its corresponding source address information are not in the whitelist, and the target receiver and its corresponding destination address information are not in the whitelist, then the target communication object is determined to be abnormal.
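The whitelist construction and the three cases above can be sketched as follows. The sketch assumes the whitelist is a set of (account, address) pairs, that historical logs carry a hypothetical `is_normal` flag marking normal communication objects, and that logs are dictionaries with the field names sip, dip, mail_sender, and mail_receiver; the function names, example accounts, and category strings are illustrative only.

```python
def build_whitelist(historical_logs):
    """Collect (account, address) pairs from logs marked as normal."""
    whitelist = set()
    for log in historical_logs:
        if log.get("is_normal"):  # hypothetical label field
            whitelist.add((log["mail_sender"], log["sip"]))
            whitelist.add((log["mail_receiver"], log["dip"]))
    return whitelist

def whitelist_category(sender, source_addrs, receiver, dest_addrs, whitelist):
    """Apply the three whitelist cases; None means both sides are listed."""
    # A side fails if ANY of its addresses is missing from the whitelist.
    sender_ok = all((sender, a) in whitelist for a in source_addrs)
    receiver_ok = all((receiver, a) in whitelist for a in dest_addrs)
    if not sender_ok and receiver_ok:
        return "target sender abnormal"                # first case
    if sender_ok and not receiver_ok:
        return "target receiver abnormal"              # second case
    if not sender_ok and not receiver_ok:
        return "target communication object abnormal"  # third case
    return None

history = [{"mail_sender": "a@example.com", "sip": "10.0.0.1",
            "mail_receiver": "b@example.com", "dip": "10.0.0.2",
            "is_normal": True}]
whitelist = build_whitelist(history)
```

A return value of `None` corresponds to the situation where both sides are whitelisted and further checks are needed to find the anomaly category.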

[0179] Based on the multiple email logs to be detected, if the target sender and its corresponding source address information are in the whitelist, and the target recipient and its corresponding destination address information are in the whitelist, then a first number of source address information corresponding to the target sender and a second number of destination address information corresponding to the target recipient are determined, and the abnormal category of the target communication object is further determined, specifically including:

[0180] If the first quantity exceeds the set first threshold, it is determined that the target sender object corresponds to multiple abnormal source address information.

[0181] The aforementioned first threshold can be set according to actual circumstances. The abnormality of multiple source address information corresponding to the target sender indicates an anomaly in the same target sender logging into multiple IP (Internet Protocol) hosts.

[0182] If the second quantity exceeds the set second threshold, it is determined that the target recipient object corresponds to multiple abnormal destination address information.

[0183] The second threshold mentioned above can be set according to actual conditions. The first threshold can be equal to or different from the second threshold. The abnormality of multiple destination address information corresponding to the target recipient indicates that the same target recipient is abnormally logging into multiple IP hosts.

[0184] If the first quantity exceeds the first threshold and the second quantity exceeds the second threshold, then it is determined that the multiple source address information corresponding to the target sender object and the multiple destination address information corresponding to the target receiver object are abnormal.
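The first and second quantity checks can be sketched as counts of distinct addresses in the logs of one group of target communication objects. The dictionary field names follow the log fields listed earlier; the function name and category strings are hypothetical.

```python
def address_count_categories(pair_logs, first_threshold, second_threshold):
    """Return anomaly categories for one sender-receiver pair.

    pair_logs: logs to be detected that belong to a single pair.
    """
    first_quantity = len({log["sip"] for log in pair_logs})   # source addresses
    second_quantity = len({log["dip"] for log in pair_logs})  # destination addresses
    categories = []
    if first_quantity > first_threshold:
        categories.append("multiple source addresses for target sender")
    if second_quantity > second_threshold:
        categories.append("multiple destination addresses for target receiver")
    return categories
```

When both thresholds are exceeded the returned list contains both categories, matching the combined case above.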

[0185] If, based on the multiple email logs to be detected, multiple target senders and their corresponding source address information are in the whitelist, and multiple target recipients and their corresponding destination address information are in the whitelist, then a third number of target senders corresponding to the same source address information and a fourth number of target recipients corresponding to the same destination address information are determined, and the abnormal category of the corresponding target communication object is further determined:

[0186] If the third quantity exceeds the set third threshold, it is determined that multiple target sender objects corresponding to the same source address information are abnormal.

[0187] The aforementioned third threshold can be set according to actual circumstances. The anomaly of multiple target senders corresponding to the same source address indicates an anomaly of multiple target senders logging into the same IP host. For example, if a user has three email accounts, and email accounts A, B, and C log into the same IP host A, with a third threshold of 2, then this constitutes an anomaly of multiple target senders corresponding to the same source address.

[0188] If the fourth quantity exceeds the set fourth threshold, it is determined that multiple target recipients corresponding to the same destination address information are abnormal.

[0189] The fourth threshold mentioned above can be set according to actual conditions. The third threshold and the fourth threshold can be equal or unequal. The above-mentioned anomaly of multiple target recipients corresponding to the same destination address indicates that multiple target recipients are abnormally logging into the same IP host.

[0190] If the third quantity exceeds the third threshold and the fourth quantity exceeds the fourth threshold, then it is determined that multiple target sender objects corresponding to the same source address information and multiple target receiver objects corresponding to the same destination address information are abnormal.
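The third and fourth quantity checks count how many distinct accounts share one address. The sketch below assumes the same dictionary log layout as before; the function name and the example accounts are hypothetical.

```python
from collections import defaultdict

def shared_address_anomalies(logs, third_threshold, fourth_threshold):
    """Find addresses used by more accounts than the thresholds allow.

    Returns (source addresses shared by too many senders,
             destination addresses shared by too many receivers).
    """
    senders_by_source = defaultdict(set)
    receivers_by_dest = defaultdict(set)
    for log in logs:
        senders_by_source[log["sip"]].add(log["mail_sender"])
        receivers_by_dest[log["dip"]].add(log["mail_receiver"])
    shared_sources = {addr for addr, accounts in senders_by_source.items()
                      if len(accounts) > third_threshold}
    shared_dests = {addr for addr, accounts in receivers_by_dest.items()
                    if len(accounts) > fourth_threshold}
    return shared_sources, shared_dests
```

In the example above, accounts A, B, and C sending from the same host give a third quantity of 3, which exceeds a third threshold of 2.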

[0191] If the above methods still cannot determine the anomaly category of the target communication object, then the following judgment is made:

[0192] If the number of email logs to be detected corresponding to the target communication object exceeds the set fifth threshold, then the target communication object is determined to have an excessive-communication anomaly.

[0193] The fifth threshold can be set according to the actual situation. For example, if the target communication object communicates 200 times in a month, that is, the number of email logs to be detected corresponding to the target communication object is 200, and the fifth threshold is 150, then the target communication object is determined to have an excessive-communication anomaly.
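The fallback check reduces to a count comparison, sketched below with the numbers from the example. The function name is hypothetical.

```python
def is_excessive_communication(pair_logs, fifth_threshold):
    """True when a pair has more logs to be detected than the threshold."""
    return len(pair_logs) > fifth_threshold

# 200 logs in the month against a fifth threshold of 150.
example_logs = [{"index": i} for i in range(200)]
excessive = is_excessive_communication(example_logs, 150)
```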

[0194] Currently, anomaly detection for employee email activity typically relies on fixed rules written with traditional UEBA methods. Taking one month as an example, a baseline is generated from all historical email logs of the previous month: statistics are computed for each group of senders and recipients based on the sender and recipient information in each historical email log. When the recipient is a new user, this is likely an anomaly, but the statistical method cannot analyze it, because the rules are fixed and no corresponding baseline exists. Such a method can only detect whether a communication object sends an excessive number of emails; it cannot check for new users. Checking for new users requires writing a new set of rules, i.e., establishing a corresponding baseline, so the method lacks flexibility.

[0195] To address the aforementioned issues, this disclosure employs the following method for detecting anomalies in communication objects:

[0196] For any set of target communication objects, if no corresponding communication object anomaly detection model exists for the target communication object, the corresponding actual result is determined in the following manner:

[0197] The target features corresponding to the target communication object are input into at least one communication object anomaly detection model corresponding to the target sender object in the target communication object, and the actual results output by each of the at least one communication object anomaly model are obtained.

[0198] After obtaining the actual results, for the target communication object, the detection results of the communication objects of the multiple email logs to be detected are determined and displayed based on the relationship between the distance of each actual result to the corresponding midline and a set threshold, including:

[0199] If none of the distances from the at least one actual result to the corresponding midlines exceeds the threshold, then the target recipient is determined to be normal.

[0200] If every distance from the at least one actual result to the corresponding midlines exceeds the threshold, then the target recipient is determined to be abnormal.

[0201] For example, there are communication object anomaly detection models 1 for sending object A and receiving object B, communication object anomaly detection models 2 for sending object A and receiving object C, communication object anomaly detection models 3 for sending object A and receiving object D, and communication object anomaly detection models 4 for sending object E and receiving object B.

[0202] If a set of target communication objects is sender A and receiver F, then the target features of this set of target communication objects are input into the communication object anomaly detection model 1, communication object anomaly detection model 2 and communication object anomaly detection model 3 corresponding to sender A, and the actual result 1 output by communication object anomaly model 1, the actual result 2 output by communication object anomaly model 2 and the actual result 3 output by communication object anomaly model 3 are obtained.

[0203] Based on the obtained actual results 1, 2, and 3, the distances 1, 2, and 3 from actual results 1, 2, and 3 to their corresponding midlines are each compared with the threshold. If none of distances 1, 2, and 3 exceeds the threshold, recipient F is determined to be normal; if all three distances exceed the threshold, recipient F is determined to be abnormal; if some of the distances exceed the threshold while others do not, manual judgment is required to determine whether recipient F is abnormal.

[0204] If a set of target communication objects consists of sender object E and receiver object G, then the target features of this set of target communication objects are input into the communication object anomaly detection model 4 corresponding to sender object E, and the actual result 4 output by the communication object anomaly model 4 is obtained.

[0205] Based on the obtained actual result 4, the distance 4 from actual result 4 to the corresponding midline is compared with the threshold. If distance 4 does not exceed the set threshold, recipient G is determined to be normal; if distance 4 exceeds the set threshold, recipient G is determined to be abnormal.
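The handling of a pair without its own model can be sketched as follows, using the four models from the example. It assumes models are stored in a dictionary keyed by (sender, receiver) and that the distances to the midlines have already been computed; the function names and the result strings are hypothetical.

```python
def models_for_sender(models, sender):
    """Select every model whose pair has the given sender."""
    return [model for (s, _receiver), model in models.items() if s == sender]

def judge_new_receiver(distances, threshold):
    """Combine per-model distances into a verdict for the new receiver."""
    if not distances:
        raise ValueError("the sender has no existing models")
    exceeded = [d > threshold for d in distances]
    if not any(exceeded):
        return "normal"
    if all(exceeded):
        return "abnormal"
    return "manual review"  # mixed results need human judgment

models = {("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 3, ("E", "B"): 4}
```

For sender A and new receiver F, `models_for_sender(models, "A")` selects models 1, 2, and 3; for sender E and new receiver G it selects only model 4, so the verdict is never "manual review".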

[0206] In some embodiments, based on the same inventive concept, the present disclosure also provides a communication object anomaly detection device. Since this device corresponds to the method of the present disclosure and solves the problem on a similar principle, the implementation of the device can refer to the implementation of the method, and repeated parts will not be described again.

[0207] As shown in Figure 9, the above-mentioned device includes the following modules:

[0208] The acquisition module 901 is used to acquire multiple email logs to be detected within a set time period, wherein the set time period includes multiple sub-time periods;

[0209] The analysis module 902 is used to analyze each of the multiple email logs to be detected and obtain the corresponding target sender object and target receiver object;

[0210] The determination module 903 is used to determine the target time range characteristics of each group of target communication objects based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs; each group of target communication objects includes a target sender object and a target receiver object.

[0211] The detection module 904 is used to perform anomaly detection on the target communication objects of the multiple email logs to be detected based on the target communication objects of each group, and their respective target time range characteristics, target location information and target communication characteristics; wherein, each target location information is obtained by analyzing the address information in the corresponding email log to be detected, and each target communication characteristic represents the sending status in the corresponding email log to be detected.

[0212] As an optional implementation, the detection module 904 is used for:

[0213] Based on each group of target communication objects, and their respective target time range characteristics, target location information, and target communication characteristics, the corresponding target characteristics are obtained.

[0214] Each target feature obtained is input into the corresponding communication object anomaly detection model to obtain the actual output of each communication object anomaly model.

[0215] Based on the relationship between the distance from each actual result to the corresponding centerline and the set threshold, the detection results of the communication objects of the multiple email logs to be detected are determined and displayed. Each centerline is determined based on the mean of the function values in the corresponding communication object anomaly detection model.

[0216] As an optional implementation, the detection module 904 is used for:

[0217] Retrieve multiple historical email logs within the set time period;

[0218] Analyze each of the multiple historical email logs to obtain the corresponding sender and receiver;

[0219] For each group of communication objects, the corresponding time range characteristics are determined based on the sub-time period to which the email communication time in each historical email log belongs; each group of communication objects includes a sender object and a receiver object.

[0220] Based on each group of communication objects and their respective time range features, location information, and communication features, an anomaly detection model for each communication object is obtained; wherein, each piece of location information is obtained by analyzing the address information in the corresponding historical email log, and each piece of communication feature represents the sending status in the corresponding historical email log.

[0221] As an optional implementation, the detection module 904 is used for:

[0222] For each group of communication objects, the time vector of each historical email log corresponding to each group of communication objects is determined based on the sub-time period to which the email communication time in each historical email log belongs.

[0223] The sum of the time vectors of each historical email log corresponding to each group of communication objects is used as the initial time range feature for each group.

[0224] Based on the number of historical email logs corresponding to each group of communication objects, the corresponding initial time range features are normalized to obtain their respective time range features.
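A minimal Python sketch of the three steps above (time vector, sum, normalization by log count) follows. It assumes, for illustration only, that the set time period is one day split into equal hourly buckets and that each time vector is a one-hot vector; the function names are hypothetical.

```python
from datetime import datetime
from typing import List, Sequence


def time_vector(ts: datetime, n_periods: int = 24) -> List[int]:
    """One-hot vector marking the sub-time period that contains ts (assumed encoding)."""
    if n_periods <= 0 or 24 % n_periods != 0:
        raise ValueError("n_periods must evenly divide 24 in this sketch")
    hours_per_period = 24 // n_periods
    vec = [0] * n_periods
    vec[ts.hour // hours_per_period] = 1
    return vec


def time_range_feature(timestamps: Sequence[datetime], n_periods: int = 24) -> List[float]:
    """Sum the time vectors of one sender/receiver group, then divide by the log count."""
    if not timestamps:
        raise ValueError("at least one email log is required")
    total = [0] * n_periods  # initial time range feature
    for ts in timestamps:
        for i, v in enumerate(time_vector(ts, n_periods)):
            total[i] += v
    return [count / len(timestamps) for count in total]


# Four historical logs for one group of communication objects.
logs = [
    datetime(2022, 11, 1, 9, 10),
    datetime(2022, 11, 2, 9, 50),
    datetime(2022, 11, 3, 14, 0),
    datetime(2022, 11, 4, 23, 59),
]
feature = time_range_feature(logs, n_periods=4)
print(feature)  # share of logs falling in each 6-hour period
```

Dividing by the log count makes groups with very different email volumes comparable, since each feature then describes the distribution of sending times rather than the raw counts.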

[0225] As an optional implementation, the detection module 904 is used for:

[0226] The communication objects and their corresponding location information are vectorized using a preset vectorization algorithm to obtain their respective first features.

[0227] For each group of communication objects, the corresponding first feature, time range feature, and communication feature are concatenated to obtain the corresponding second feature.

[0228] Based on their respective second features, pre-defined density estimation algorithms are used to model them, resulting in corresponding communication object anomaly detection models.
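The concatenation and density-estimation steps can be illustrated with a small pure-Python sketch. The source does not name the density estimator here, so a Gaussian kernel density estimate is used as a stand-in; the class `PairDensityModel`, the bandwidth value, and the assumption that each group contributes several second-feature vectors are all hypothetical. The centerline is taken as the mean of the density values over the training samples, following the description of the centerline elsewhere in this disclosure.

```python
import math
from typing import List, Sequence


def second_feature(first: Sequence[float], time_range: Sequence[float],
                   communication: Sequence[float]) -> List[float]:
    """Concatenate the first feature, time range feature and communication feature."""
    return list(first) + list(time_range) + list(communication)


class PairDensityModel:
    """Gaussian KDE over second features of one sender/receiver group (illustrative)."""

    def __init__(self, samples: Sequence[Sequence[float]], bandwidth: float = 0.5):
        if not samples:
            raise ValueError("at least one training sample is required")
        self.samples = [list(s) for s in samples]
        self.bandwidth = bandwidth
        self.dim = len(self.samples[0])
        # Centerline: mean of the density values at the training samples.
        values = [self.density(s) for s in self.samples]
        self.centerline = sum(values) / len(values)

    def density(self, x: Sequence[float]) -> float:
        """Estimated density at x using an isotropic Gaussian kernel."""
        h = self.bandwidth
        norm = (math.sqrt(2.0 * math.pi) * h) ** self.dim
        total = 0.0
        for s in self.samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-sq_dist / (2.0 * h * h))
        return total / (len(self.samples) * norm)

    def distance_to_centerline(self, x: Sequence[float]) -> float:
        """Absolute gap between the density at x and the centerline."""
        return abs(self.density(x) - self.centerline)


train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
model = PairDensityModel(train, bandwidth=0.5)
near = model.distance_to_centerline([0.05, 0.05])
far = model.distance_to_centerline([5.0, 5.0])
print(near < far)
```

A kernel estimate makes no Gaussian assumption about the data as a whole, which matches the stated motivation of the disclosure; in practice the first feature output by BERT would be high-dimensional, and a library implementation with a tuned bandwidth would likely be preferable to this toy version.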

[0229] As an optional implementation, the detection module 904 is used for:

[0230] Each group of communication objects and its corresponding location information are input into BERT to obtain the corresponding first feature output by BERT.

[0231] As an optional implementation, the determining module 903 is used for:

[0232] For each group of target communication objects, the time vector of each email log to be detected is determined based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs;

[0233] The sum of the time vectors of each email log to be detected corresponding to each group of target communication objects is used as the initial target time range feature for each group.

[0234] Based on the number of email logs to be detected corresponding to each group of target communication objects, the corresponding initial target time range features are normalized to obtain their respective target time range features.

[0235] As an optional implementation, the detection module 904 is used for:

[0236] The target communication objects and their corresponding target location information are vectorized using a preset vectorization algorithm to obtain their respective third features;

[0237] The corresponding third feature, the target time range feature, and the target communication feature are concatenated to obtain their respective target features.

[0238] As an optional implementation, the detection module 904 is used for:

[0239] Each group of target communication objects and their corresponding target location information are input into the BERT to obtain the corresponding third feature output by the BERT.

[0240] As an optional implementation, the detection module 904 is further configured to:

[0241] For any set of target communication objects, if no corresponding communication object anomaly detection model exists for the target communication object, the corresponding actual result is determined in the following manner:

[0242] The target features corresponding to the target communication object are input into at least one communication object anomaly detection model corresponding to the target sender object in the target communication object, and the actual results output by each of the at least one communication object anomaly model are obtained.

[0243] As an optional implementation, for the target communication object, the detection module 904 is used to:

[0244] If the distance from at least one of the actual results to the corresponding centerline does not exceed the threshold, then the target recipient is determined to be normal.

[0245] If the distance from at least one of the actual results to the corresponding midline exceeds the threshold, then the target recipient is determined to be abnormal.

[0246] As an optional implementation, the detection module 904 is used for:

[0247] For each actual result, perform the following operations:

[0248] For a given actual result, if the distance from the actual result to the corresponding centerline does not exceed the threshold, then the corresponding target communication object is determined to be normal.

[0249] If the distance from an actual result to the corresponding midline exceeds the threshold, then the corresponding target communication object is determined to have abnormal behavior.

[0250] As an optional implementation, the detection module 904 is used for:

[0251] Based on the multiple email logs to be detected, obtain the source address information corresponding to the target sender and the destination address information corresponding to the target receiver;

[0252] If the target sender and its corresponding source address information are not in the pre-built whitelist, but the target receiver and its corresponding destination address information are in the whitelist, then the target sender is determined to be abnormal.

[0253] If the target sender and its corresponding source address are in the whitelist, and the target receiver and its corresponding destination address are not in the whitelist, then the target receiver is determined to be abnormal.

[0254] If the target sender and its corresponding source address are not in the whitelist, and the target receiver and its corresponding destination address are not in the whitelist, then the target communication object is determined to be abnormal.

[0255] As an optional implementation, the detection module 904 is used to construct the whitelist in the following manner:

[0256] Each historical email log within the set time period is analyzed to obtain its corresponding source address information, destination address information, sender and recipient information;

[0257] If the sender and receiver are determined to be normal communication objects, then the sender is associated with the source address information and stored in the whitelist, and the receiver is associated with the destination address information and stored in the whitelist.
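The whitelist construction and the three whitelist checks described above might look like the following sketch. The `EmailLog` record, the `is_normal` callback, and the returned labels are assumptions introduced for illustration; the source does not specify how the whitelist is stored.

```python
from typing import Callable, Iterable, NamedTuple, Set, Tuple


class EmailLog(NamedTuple):
    sender: str
    source_addr: str
    receiver: str
    dest_addr: str


Whitelist = Set[Tuple[str, str]]  # (communication object, address) associations


def build_whitelist(history: Iterable[EmailLog],
                    is_normal: Callable[[str, str], bool]) -> Whitelist:
    """Store sender/source and receiver/destination associations of normal pairs."""
    whitelist: Whitelist = set()
    for log in history:
        if is_normal(log.sender, log.receiver):
            whitelist.add((log.sender, log.source_addr))
            whitelist.add((log.receiver, log.dest_addr))
    return whitelist


def check_against_whitelist(log: EmailLog, whitelist: Whitelist) -> str:
    """Apply the three whitelist rules to one email log to be detected."""
    sender_ok = (log.sender, log.source_addr) in whitelist
    receiver_ok = (log.receiver, log.dest_addr) in whitelist
    if sender_ok and receiver_ok:
        return "no_whitelist_anomaly"  # assumption: this case is not covered by the rules
    if not sender_ok and receiver_ok:
        return "sender_abnormal"
    if sender_ok and not receiver_ok:
        return "receiver_abnormal"
    return "pair_abnormal"


history = [EmailLog("alice@example.com", "192.0.2.10", "bob@example.com", "192.0.2.20")]
whitelist = build_whitelist(history, is_normal=lambda s, r: True)
suspect = EmailLog("alice@example.com", "198.51.100.7", "bob@example.com", "192.0.2.20")
print(check_against_whitelist(suspect, whitelist))
```

Keying the whitelist on the object and its address together means a known sender appearing from an unfamiliar address is still flagged.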

[0258] As an optional implementation, the detection module 904 is used for:

[0259] Based on the multiple email logs to be detected, a first number of source address information corresponding to the target sender and a second number of destination address information corresponding to the target receiver are determined.

[0260] If the first quantity exceeds the set first threshold, it is determined that the target sender object corresponds to multiple abnormal source address information.

[0261] If the second quantity exceeds the set second threshold, it is determined that the target recipient object corresponds to multiple abnormal destination address information.

[0262] If the first quantity exceeds the first threshold and the second quantity exceeds the second threshold, then it is determined that the multiple source address information corresponding to the target sender object and the multiple destination address information corresponding to the target receiver object are abnormal.

[0263] As an optional implementation, the detection module 904 is used for:

[0264] Based on the multiple email logs to be detected, a third number of target sender objects corresponding to the same source address information and a fourth number of target receiver objects corresponding to the same destination address information are determined.

[0265] If the third quantity exceeds the set third threshold, it is determined that multiple target sender objects corresponding to the same source address information are abnormal.

[0266] If the fourth quantity exceeds the set fourth threshold, it is determined that multiple target recipients corresponding to the same destination address information are abnormal.

[0267] If the third quantity exceeds the third threshold and the fourth quantity exceeds the fourth threshold, then it is determined that multiple target sender objects corresponding to the same source address information and multiple target receiver objects corresponding to the same destination address information are abnormal.

[0268] As an optional implementation, the detection module 904 is used for:

[0269] If the number of email logs to be detected corresponding to the target communication object exceeds the set fifth threshold, then the target communication object is determined to exhibit an excessive-communication anomaly.
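The five count-based checks above (first through fifth thresholds) can be sketched together as follows. The `EmailLog` record, the threshold dictionary keys, and the finding labels are hypothetical names chosen for this illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class EmailLog(NamedTuple):
    sender: str
    source_addr: str
    receiver: str
    dest_addr: str


def count_based_findings(logs: Iterable[EmailLog],
                         thresholds: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (finding, subject) tuples for every count that exceeds its threshold."""
    src_by_sender = defaultdict(set)     # first quantity: source addresses per sender
    dst_by_receiver = defaultdict(set)   # second quantity: destination addresses per receiver
    senders_by_src = defaultdict(set)    # third quantity: senders per source address
    receivers_by_dst = defaultdict(set)  # fourth quantity: receivers per destination address
    logs_per_pair = Counter()            # compared with the fifth threshold
    for log in logs:
        src_by_sender[log.sender].add(log.source_addr)
        dst_by_receiver[log.receiver].add(log.dest_addr)
        senders_by_src[log.source_addr].add(log.sender)
        receivers_by_dst[log.dest_addr].add(log.receiver)
        logs_per_pair[(log.sender, log.receiver)] += 1

    findings: List[Tuple[str, str]] = []
    findings += [("sender_many_source_addrs", s)
                 for s, a in src_by_sender.items() if len(a) > thresholds["first"]]
    findings += [("receiver_many_dest_addrs", r)
                 for r, a in dst_by_receiver.items() if len(a) > thresholds["second"]]
    findings += [("source_addr_many_senders", a)
                 for a, s in senders_by_src.items() if len(s) > thresholds["third"]]
    findings += [("dest_addr_many_receivers", a)
                 for a, r in receivers_by_dst.items() if len(r) > thresholds["fourth"]]
    findings += [("excessive_communication", f"{s}->{r}")
                 for (s, r), n in logs_per_pair.items() if n > thresholds["fifth"]]
    return sorted(findings)


logs = [
    EmailLog("alice", "10.0.0.1", "bob", "10.0.1.1"),
    EmailLog("alice", "10.0.0.2", "bob", "10.0.1.1"),
    EmailLog("alice", "10.0.0.3", "bob", "10.0.1.1"),
    EmailLog("carol", "10.0.0.3", "dave", "10.0.1.1"),
]
thresholds = {"first": 2, "second": 2, "third": 1, "fourth": 1, "fifth": 2}
for finding in count_based_findings(logs, thresholds):
    print(finding)
```

Because each check is reported independently, the combined cases in the text (for example, both the first and second quantities exceeding their thresholds) simply appear as two findings for the same set of logs.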

[0270] In some embodiments, based on the same inventive concept, this disclosure also provides a communication object anomaly detection device, which can implement the communication object anomaly detection function described above. Referring to Figure 10, the device includes a processor 101 and a memory 102, wherein the memory 102 is used to store program instructions;

[0271] The processor 101 calls the program instructions stored in the memory 102 and executes them to implement the above-mentioned communication object anomaly detection method. Since the principle by which the communication object anomaly detection device solves the problem is similar to that of the communication object anomaly detection method, the implementation of the device can refer to the implementation of the method, and repeated details will not be elaborated further.

[0272] In some possible implementations, various aspects of this disclosure can also be implemented in the form of a program product. As shown in Figure 11, the computer program product 110 includes computer program code that, when run on a computer, causes the computer to execute any of the communication object anomaly detection methods discussed above. Since the principle by which the above computer program product solves the problem is similar to that of the communication object anomaly detection method, the implementation of the above computer program product can refer to the implementation of the method; repeated details will not be elaborated further.

[0273] Those skilled in the art will understand that embodiments of this disclosure can be provided as methods, systems, or computer program products. Therefore, this disclosure can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this disclosure can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0274] This disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0275] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0276] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0277] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.

[0278] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. A method for detecting anomalies in communication objects, characterized in that, The method includes: Obtain multiple email logs to be inspected within a set time period, wherein the set time period includes multiple sub-time periods; The multiple email logs to be detected are analyzed separately to obtain the target sender and target receiver for each. For each group of target communication objects, the time vector of each email log to be detected is determined based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs. The sum of the time vectors of each email log to be detected for each group of target communication objects is used as the initial target time range feature for each group. The initial target time range feature is normalized according to the number of email logs to be detected for each group of target communication objects to obtain the corresponding target time range feature. Each group of target communication objects includes a target sender object and a target receiver object. Based on each group of target communication objects, and their respective target time range features, target location information, and target communication features, corresponding target features are obtained. Each obtained target feature is then input into a corresponding communication object anomaly detection model to obtain the actual output results of each communication object anomaly model. Based on the relationship between the distance from each actual result to the corresponding median and a set threshold, the detection results for the communication objects in the multiple email logs to be detected are determined and displayed. 
Each target location information is obtained by analyzing the address information in the corresponding email log to be detected; each target communication feature represents the sending status in the corresponding email log to be detected; and each median is determined based on the mean of each function value in the corresponding communication object anomaly model. The communication object anomaly detection model is obtained through the following method: Retrieve multiple historical email logs within the set time period; Analyze each of the multiple historical email logs to obtain the corresponding sender and receiver; For each group of communication objects, the corresponding time range characteristics are determined based on the sub-time period to which the email communication time in each historical email log belongs; each group of communication objects includes a sender object and a receiver object. Each group of communication objects and their corresponding location information are vectorized using a preset vectorization algorithm to obtain their respective first features; the first features, time range features, and communication features are concatenated to obtain their respective second features; based on their respective second features, non-parametric density estimation algorithms are used to model them to obtain their respective communication object anomaly detection models; wherein, each piece of location information is obtained by analyzing the address information in the corresponding historical email log, and each communication feature represents the sending status in the corresponding historical email log.

2. The method according to claim 1, characterized in that, For each group of communication objects, based on the sub-time period to which the email communication time in each corresponding historical email log belongs, the corresponding time range characteristics are determined, including: For each group of communication objects, the time vector of each historical email log corresponding to each group of communication objects is determined based on the sub-time period to which the email communication time in each historical email log belongs. The sum of the time vectors of each historical email log corresponding to each group of communication objects is used as the initial time range feature for each group. Based on the number of historical email logs corresponding to each group of communication objects, the corresponding initial time range features are normalized to obtain their respective time range features.

3. The method according to claim 1, characterized in that, The step of vectorizing each group of communication objects and their corresponding location information using a preset vectorization algorithm to obtain their respective first features includes: Each group of communication objects and its corresponding location information are input into the language representation model BERT to obtain the corresponding first feature output by BERT.

4. The method according to claim 1, characterized in that, The step of obtaining the corresponding target features based on each group of target communication objects and their respective target time range features, target location information, and target communication features includes: The target communication objects and their corresponding target location information are vectorized using a preset vectorization algorithm to obtain their respective third features; The corresponding third feature, the target time range feature, and the target communication feature are concatenated to obtain their respective target features.

5. The method according to claim 4, characterized in that, The step of vectorizing each group of target communication objects and their corresponding target location information using a preset vectorization algorithm to obtain their respective third features includes: Each group of target communication objects and their corresponding target location information are input into BERT to obtain the corresponding third feature output by BERT.

6. The method according to claim 1, characterized in that, The method further includes: For any set of target communication objects, if no corresponding communication object anomaly detection model exists for the target communication objects, the corresponding actual result is determined in the following way: The target features corresponding to the target communication object are input into at least one communication object anomaly detection model corresponding to the target sender object in the target communication object, and the actual results output by each of the at least one communication object anomaly model are obtained.

7. The method according to claim 6, characterized in that, For the target communication object, the process of determining and displaying the detection results of the communication object for the multiple email logs to be detected based on the relationship between the distance of each actual result to the corresponding median and a set threshold includes: If the distance from at least one of the actual results to the corresponding centerline does not exceed the threshold, then the target recipient is determined to be normal. If the distance from at least one of the actual results to the corresponding midline exceeds the threshold, then the target recipient is determined to be abnormal.

8. The method according to claim 1, characterized in that, The process of determining and displaying the detection results of the communication objects of the multiple email logs to be detected based on the relationship between the distances from each actual result to the corresponding midline and a set threshold includes: For each actual result, perform the following operations: For a given actual result, if the distance from the actual result to the corresponding centerline does not exceed the threshold, then the corresponding target communication object is determined to be normal. If the distance from an actual result to the corresponding midline exceeds the threshold, then the corresponding target communication object is determined to have abnormal behavior.

9. The method according to claim 8, characterized in that, The determination that the corresponding target communication object exhibits abnormal behavior includes: Based on the multiple email logs to be detected, obtain the source address information corresponding to the target sender and the destination address information corresponding to the target receiver; If the target sender and its corresponding source address information are not in the pre-built whitelist, but the target receiver and its corresponding destination address information are in the whitelist, then the target sender is determined to be abnormal. If the target sender and its corresponding source address are in the whitelist, and the target receiver and its corresponding destination address are not in the whitelist, then the target receiver is determined to be abnormal. If the target sender and its corresponding source address are not in the whitelist, and the target receiver and its corresponding destination address are not in the whitelist, then the target communication object is determined to be abnormal.

10. The method according to claim 9, characterized in that, The whitelist is constructed using the following method: Each historical email log within the set time period is analyzed to obtain its corresponding source address information, destination address information, sender and recipient information; If the sender and receiver are determined to be normal communication objects, then the sender is associated with the source address information and stored in the whitelist, and the receiver is associated with the destination address information and stored in the whitelist.

11. The method according to claim 8, characterized in that, The determination that the corresponding target communication object exhibits abnormal behavior includes: Based on the multiple email logs to be detected, a first number of source address information corresponding to the target sender and a second number of destination address information corresponding to the target receiver are determined. If the first quantity exceeds the set first threshold, it is determined that the target sender object corresponds to multiple abnormal source address information. If the second quantity exceeds the set second threshold, it is determined that the target recipient object corresponds to multiple abnormal destination address information. If the first quantity exceeds the first threshold and the second quantity exceeds the second threshold, then it is determined that the multiple source address information corresponding to the target sender object and the multiple destination address information corresponding to the target receiver object are abnormal.

12. The method according to claim 8, characterized in that, The determination that the corresponding target communication object exhibits abnormal behavior includes: Based on the multiple email logs to be detected, a third number of target sender objects corresponding to the same source address information and a fourth number of target receiver objects corresponding to the same destination address information are determined. If the third quantity exceeds the set third threshold, it is determined that multiple target sender objects corresponding to the same source address information are abnormal. If the fourth quantity exceeds the set fourth threshold, it is determined that multiple target recipients corresponding to the same destination address information are abnormal. If the third quantity exceeds the third threshold and the fourth quantity exceeds the fourth threshold, then it is determined that multiple target sender objects corresponding to the same source address information and multiple target receiver objects corresponding to the same destination address information are abnormal.

13. The method according to claim 8, characterized in that, The determination that the corresponding target communication object exhibits abnormal behavior includes: If the number of email logs to be detected corresponding to the target communication object exceeds the set fifth threshold, then the target communication object is determined to exhibit an excessive-communication anomaly.

14. A communication object anomaly detection device, characterized in that, The device includes: The acquisition module is used to acquire multiple email logs to be detected within a set time period, wherein the set time period includes multiple sub-time periods; The analysis module is used to analyze the multiple email logs to be detected, and obtain the target sender and target receiver for each. The determination module is used to determine the time vector of each email log to be detected corresponding to each group of target communication objects based on the sub-time period to which the email communication time in each corresponding email log to be detected belongs; the sum of the time vectors of each email log to be detected corresponding to each group of target communication objects is used as the initial target time range feature for each group; the initial target time range feature is normalized according to the number of email logs to be detected corresponding to each group of target communication objects to obtain the target time range feature for each group; each group of target communication objects includes a target sender object and a target receiver object. The detection module is used to obtain corresponding target features based on each group of target communication objects and their corresponding target time range features, target location information, and target communication features; input each obtained target feature into the corresponding communication object anomaly detection model to obtain the actual results output by each communication object anomaly model; and determine and display the detection results of the communication objects of the multiple email logs to be detected based on the relationship between the distance of each actual result to the corresponding median and a set threshold. 
Each target location information is obtained by analyzing the address information in the corresponding email log to be detected, each target communication feature represents the sending status in the corresponding email log to be detected, and each median is determined based on the mean of each function value in the corresponding communication object anomaly model. The detection module is used for: Retrieve multiple historical email logs within the set time period; Analyze each of the multiple historical email logs to obtain the corresponding sender and receiver; For each group of communication objects, the corresponding time range characteristics are determined based on the sub-time period to which the email communication time in each historical email log belongs; each group of communication objects includes a sender object and a receiver object. Each group of communication objects and their corresponding location information are vectorized using a preset vectorization algorithm to obtain their respective first features; the first features, time range features, and communication features are concatenated to obtain their respective second features; based on their respective second features, non-parametric density estimation algorithms are used to model them to obtain their respective communication object anomaly detection models; wherein, each piece of location information is obtained by analyzing the address information in the corresponding historical email log, and each communication feature represents the sending status in the corresponding historical email log.

15. An electronic device, characterized in that, it includes: a processor; and a memory for storing processor-executable instructions; wherein the processor implements the steps of the method according to any one of claims 1 to 13 by executing the executable instructions.

16. A computer-readable and writable storage medium storing computer instructions thereon, characterized in that, when executed by a processor, the instructions implement the steps of the method according to any one of claims 1 to 13.