Cyber threat information processing device, cyber threat information processing method, and storage medium for storing computer-executable program that processes cyber threat information

The cyber threat information processing device and method address inefficiencies in threat detection and analysis by using a multi-agent natural language model to automate threat identification and data collection, enhancing cybersecurity response and analysis efficiency across IT, OT, and IoT devices.

WO2026141754A1PCT designated stage Publication Date: 2026-07-02SANDS LAB INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SANDS LAB INC
Filing Date
2024-12-30
Publication Date
2026-07-02

Smart Images

  • Figure KR2024021392_02072026_PF_FP_ABST
    Figure KR2024021392_02072026_PF_FP_ABST
Patent Text Reader

Abstract

One embodiment according to the present disclosure provides a cyber threat information processing method comprising the steps of: acquiring packet data from a client; performing packet analysis on the packet data so as to generate network threat analysis information; generating maliciousness indicator threat analysis information for the packet data on the basis of the network maliciousness analysis information; and providing threat analysis information on the basis of the network threat analysis information and the maliciousness indicator threat analysis information.
Need to check novelty before this filing date? Find Prior Art

Description

Cyber ​​threat information processing device, cyber threat information processing method, and storage medium storing a computer-executable program for processing cyber threat information

[0001] The disclosed embodiments relate to a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information.

[0002] The damage caused by increasingly sophisticated cyber security threats, centered on new or variant malware, is growing. To mitigate such damage and enable early response, we are simultaneously advancing our response technologies through multi-dimensional pattern construction and various complex analyses.

[0003] Until now, companies have focused on perimeter-based security to detect and block traffic between the internal and external environments using technologies such as Virtual Private Networks (VPNs), firewalls, and Intrusion Detection Systems (IDS) / Intrusion Prevention Systems (IPS). However, they are facing difficulties with security measures due to the complexity of technology, the diversity of attacks, and the increasing number of attack points.

[0004] To respond to cyber threats through network-based traffic, network layer or transport layer-based traffic analysis had the problem of being unable to comprehensively and visually analyze threat information.

[0005] Therefore, there was a problem in that the detection of cyber threats through network-based traffic targeted only information technology (IT) assets and could not detect or identify threats to operation technology (OT) assets or Internet of Things (IoT) devices.

[0006] There was a problem in that the analysis of cyber threats through network-based traffic was fragmentary and mostly only possible after a breach, making it difficult to analyze large amounts of network traffic in real time and respond to cyber threats.

[0007] Furthermore, malicious activities based on cyber threat information were analyzed using various inconsistent techniques or information that could not be accurately described by anyone other than an expert, making it difficult to easily understand their mechanisms and the basis for analysis.

[0008] Meanwhile, while such analysis requires high-quality AI training datasets to respond to cyber threats, these datasets have been difficult to find. Even companies seeking to develop technologies to counter cyber threats using AI face the problem of struggling to locate appropriate data or malware samples.

[0009] Against this backdrop, while the demand for high-quality AI training datasets has recently surged, there have been technical and practical difficulties in acquiring the desired malware analysis data.

[0010] When the results detected by the cyber threat information processing system included false positives, the process of resolving them had to be done manually by humans. For example, when a complaint email regarding the processing results of the cyber threat information processing system is received, the administrator of the cyber threat information processing system or the mail system checks and classifies the email, extracts necessary information, and performs reclassification.

[0011] However, this manual processing method was repetitive and inefficient, consuming a significant amount of time and effort, and also had the problem of increasing the fatigue of the processor, which could lead to the possibility of errors.

[0012] In particular, there was a problem where data omissions or incorrect processing that could occur during the handling of complaint emails could have a negative impact on customer satisfaction.

[0013] Cyber ​​threat information comes in various forms and types, and there are many ways to represent it. For example, there are various cyber threats categorized by attack groups, attack techniques, and threat classifications. Therefore, without insight into this, even if such information is extracted, it is difficult to interpret accurately, and there was a problem in that analysis required a significant amount of time.

[0014] Furthermore, the importance of threat analysis and information gathering utilizing Open Source Intelligence (OSINT) in cybersecurity has recently been increasing. However, when extracting compromise indicators and related information from data collected from open sources, conventional extraction methods have a problem in that they perform only pattern matching using regular expressions, failing to reflect contextual meaning.

[0015] Furthermore, as cyber security threats increase, the importance of detailed analysis of cyber threats is growing. However, in the past, companies without specialized analysts found it difficult to perform detailed threat analysis even with traffic information available, and there was a problem in that they had to conduct the analysis manually if they did not use a cyber threat intelligence system.

[0016] The purpose of the embodiments disclosed below is to solve the above problems by providing a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information, which can comprehensively and visually analyze cyber threat information through network-based traffic.

[0017] Another objective of the embodiment is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information, which can detect or identify cyber threats to various assets such as IT assets, as well as operational technology (OT) assets or IoT devices.

[0018] Another objective of the embodiment is to provide a cyber threat information processing device capable of analyzing network traffic in real time and responding to cyber threats, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information.

[0019] Another objective of the embodiment is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information, so that even a non-expert user can easily understand the mechanism and basis of analysis of detected or analyzed cyber threat information.

[0020] In the following, a cyber threat information processing device, a cyber threat information processing method, and a cyber threat information processing program are provided, which can easily obtain malware analysis data to respond to cyber threats and acquire specifically required or technically necessary data sets.

[0021] Another objective of the examples disclosed below is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing the program, which can efficiently and quickly process false positive results of a cyber threat information processing system while reducing the occurrence of errors.

[0022] Another objective of the examples disclosed below is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing the program, which allow a user to have intuitive insights into cyber threat information processed data.

[0023] Another objective of the examples disclosed below is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing the program, which automatically collect intrusion indicators and related information from various open sources and utilize a multi-agent natural language model structure to extract accurate data reflecting context.

[0024] Another objective of the examples disclosed below is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing the program, which can perform a detailed analysis of packet data with potential cyber threat and provide the analysis results to a user.

[0025] A disclosed embodiment provides a method for processing cyber threat information, comprising: a step of acquiring packet data from a client; a step of generating network threat analysis information by performing packet analysis on the packet data; a step of generating malicious indicator threat analysis information for the packet data based on the network malicious analysis information; and a step of providing threat analysis information based on the network threat analysis information and the malicious indicator threat analysis information.

[0026] The step of analyzing the network threat analysis information comprises: a step of performing packet analysis on the packet data to generate at least one of metadata and binary data for the packet data; and a step of generating network threat analysis information for the packet data based on at least one of the metadata and binary data according to a pre-stored detection rule.

[0027] The above network threat analysis information is characterized by including at least one of malware-related information, sensitive information-related information, and network-related information regarding the above packet data.

[0028] The step of analyzing the above malicious indicator threat analysis information is characterized by including the step of generating malicious indicator threat analysis information for the packet data based on infringement indicator information included in the above network threat analysis information.

[0029] The step provided above includes the step of inputting the network threat analysis information and malicious indicator threat analysis information into a natural language model to generate threat analysis information in the form of natural language; and

[0030] It is characterized by including the step of providing the threat analysis information to the above client.

[0031] One disclosed embodiment provides a cyber threat information processing device comprising: a storage device for storing data; an in-memory for storing a library engine set related to software; and a processor for executing said software; wherein the processor acquires packet data from a client, performs packet analysis on said packet data to generate network threat analysis information, generates malicious indicator threat analysis information on said packet data based on said network malicious analysis information, and provides threat analysis information based on said network threat analysis information and malicious indicator threat analysis information.

[0032] The processor is characterized by performing packet analysis on the packet data to generate at least one of metadata and binary data for the packet data, and generating network threat analysis information for the packet data based on at least one of the metadata and binary data according to a pre-stored detection rule.

[0033] The above network threat analysis information is characterized by including at least one of malware-related information, sensitive information-related information, and network-related information regarding the above packet data.

[0034] The processor is characterized by generating malicious indicator threat analysis information for the packet data based on infringement indicator information included in the network threat analysis information.

[0035] The above processor is characterized by inputting the network threat analysis information and malicious indicator threat analysis information into a natural language model to generate threat analysis information in the form of natural language, and providing the threat analysis information to the client.

[0036] One disclosed embodiment provides a storage medium for storing computer-executable software that performs the steps of acquiring packet data from a client, performing packet analysis on said packet data to generate network threat analysis information, generating malicious indicator threat analysis information on said packet data based on said network malicious analysis information, and providing threat analysis information based on said network threat analysis information and malicious indicator threat analysis information.

[0037] According to the disclosed embodiment, comprehensive and visible analysis of cyber threat information through network-based traffic is possible.

[0038] According to the disclosed embodiments, cyber threats to various assets, such as IT assets as well as operational technology (OT) assets or IoT devices, can be detected or identified.

[0039] According to the disclosed embodiment, network traffic can be analyzed in real time and cyber threats can be responded to.

[0040] According to the disclosed embodiments, even if the user is not an expert, they can easily understand the mechanism and basis of analysis of detected or analyzed cyber threat information.

[0041] According to the disclosed embodiment, through the execution of ASM, vulnerabilities of assets can be identified to apply security controls and reinforce cybersecurity strategies and policies for assets.

[0042] According to the disclosed example, malware analysis data can be easily obtained to respond to cyber threats, and specifically required or technically necessary data sets can be acquired.

[0043] According to the disclosed example, the occurrence of errors can be reduced while efficiently and quickly processing false positive results of a cyber threat information processing system.

[0044] According to the disclosed example, users can gain intuitive insights into cyber threat information processed data and easily obtain natural language-based insights through interpretation information inherent in a vast amount of cyber threat information.

[0045] According to the disclosed example, data can be efficiently collected by automatically gathering intrusion indicators and related information from various open sources, and extraction accuracy can be improved by utilizing a multi-agent natural language model structure to extract accurate data that reflects context.

[0046] According to the disclosed example, a detailed analysis of packet data with potential cyber threat potential is performed, and the analysis results are provided to the user to identify various types of threats and effectively explain the identified threats and countermeasures.

[0047] FIG. 1 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0048] FIG. 2 is a drawing disclosing another embodiment of a cyber threat information processing method according to an embodiment.

[0049] FIG. 3 is a drawing disclosing embodiments of a cyber threat information processing device according to an embodiment.

[0050] FIG. 4 is a drawing illustrating a first CTI device as an embodiment of a cyber threat information processing device according to an embodiment.

[0051] FIG. 5 is a drawing disclosing an example in which a first CTI device and a second CTI device are interconnected as an embodiment of cyber threat information processing devices according to an embodiment.

[0052] FIG. 6 is a drawing disclosing another example in which a first CTI device and a second CTI device are interconnected as an embodiment of cyber threat information processing devices according to an embodiment.

[0053] FIG. 7 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0054] FIG. 8 discloses an example of active ASM execution based on network traffic collection according to an embodiment.

[0055] FIG. 9 discloses an example of providing vulnerability details and measures identified by ASM technology according to an embodiment.

[0056] FIG. 10 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0057] FIG. 11 is a drawing disclosing another embodiment of a cyber threat information processing device according to an embodiment.

[0058] FIG. 12 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0059] FIG. 13 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0060] FIG. 14 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0061] FIG. 15 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0062] FIG. 16 discloses another example of a cyber threat information processing device that generates artificial intelligence training data capable of responding to cyber threats.

[0063] FIG. 17 is a drawing disclosing an example of providing a malicious code dataset according to an embodiment.

[0064] FIG. 18 discloses an example of processing cyber threat information that can provide a dataset.

[0065] FIG. 19 is a conceptual diagram for conceptually explaining an embodiment disclosed.

[0066] FIG. 20 is a diagram illustrating the procedure for responding to over-detection in cyber threat information processing using a natural language model according to an embodiment disclosed.

[0067] FIG. 21 is a diagram illustrating the mail processing procedure of a natural language model agent (LLM agent) according to an embodiment.

[0068] FIG. 22 is a diagram illustrating the query analysis procedure of a natural language model agent (LLM agent) included in an embodiment.

[0069] FIG. 23 is a diagram illustrating the query response procedure of a natural language model agent (LLM agent) included in an embodiment.

[0070] FIG. 24 is a diagram illustrating the result of automatically processing a false positive response inquiry of a cyber threat information processing system according to the disclosed example.

[0071] FIG. 25 is a diagram showing an example of a system overdetection response according to an embodiment of a method for processing cyber threat information.

[0072] FIG. 26 illustrates a cyber threat information processing device using a natural language model according to the disclosed example.

[0073] FIG. 27 illustrates a procedure in which a first insight generation unit among the disclosed cyber threat information processing devices generates insight information.

[0074] FIG. 28 discloses an example in which the first insight generation unit exemplified above detects anomalies regarding an attack group's campaign.

[0075] FIG. 29 discloses an example of prompt generation that can generate information using the natural language model exemplified above.

[0076] FIG. 30 illustrates a procedure for generating insight information of the second insight generation unit among the disclosed cyber threat information processing device.

[0077] FIG. 31 discloses another example of prompt generation that can generate information using the natural language model exemplified above.

[0078] Figure 32 illustrates a headline news generated using the natural language model exemplified above.

[0079] FIG. 33 discloses an example of processing cyber threat information in which news can be automatically provided using statistical data insights.

[0080] FIG. 34 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0081] FIG. 35 discloses an example of a data collection preparation process according to an embodiment.

[0082] FIG. 36 discloses an example of a data collection and extraction process according to an embodiment.

[0083] FIG. 37 discloses an example of a data preprocessing and integration process according to an embodiment.

[0084] FIG. 38 discloses an example of a process for generating threat analysis information based on a multi-agent according to an embodiment.

[0085] FIG. 39 discloses an example of the output value and threat analysis information of each agent according to the embodiment.

[0086] FIG. 40 discloses an example of the output value and threat analysis information of each agent according to an embodiment.

[0087] FIG. 41 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0088] FIG. 42 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0089] FIG. 43 discloses an example of a file upload screen according to an embodiment.

[0090] FIG. 44 discloses an example of a file analysis result screen according to an embodiment.

[0091] FIG. 45 discloses an example of a session list screen according to an embodiment.

[0092] FIG. 46 discloses an example of a session list detail screen according to an embodiment.

[0093] FIG. 47 discloses an example of a file list screen according to an embodiment.

[0094] FIG. 48 discloses an example of a file analysis information screen according to an embodiment.

[0095] FIG. 49 discloses an example of a file analysis detail screen according to an embodiment.

[0096] FIG. 50 discloses an embodiment of a cyber threat information processing method according to an embodiment.

[0097] Hereinafter, embodiments will be described in detail with reference to the attached drawings.

[0098] In the embodiments, the engine, various analysis tools, modules, etc., may be implemented as a physical device, a device combined with the physical device, or software.

[0099] When an embodiment is implemented as software, it may be stored on a non-volatile storage medium executable by a computer and installed on a computer, etc., and executed by a processor.

[0100] Examples of cyber threat information processing devices and cyber threat information processing methods are disclosed in detail as follows.

[0101] In wired and wireless network communication between two or more devices, various types of cyber threat information at different network levels can cause complex abnormal behaviors in said devices simultaneously or at different times. These complex cyber threats and abnormal behaviors are referred to as cyber threat campaigns below.

[0102] In the disclosed embodiment, two or more different types of cyber threat information processing devices may be included. Therefore, for convenience in the embodiment, N cyber threat information processing devices are referred to as the Nth cyber threat intelligence (CTI) device.

[0103] The disclosed embodiment detects and analyzes cyber threat information included in network communication, and based on this, can analyze cyber threat information in more detail through inter-device cooperation, or explain the analyzed results to the user in a very easy way, or enable response or prediction.

[0104] The CTI device of the following embodiment may be implemented as a physical device connected to a wired or wireless communication network, or may be implemented according to the same characteristics and principles in a network-connected device such as an artificial satellite or a spacecraft. Additionally, it may be a directly connected device equipped with a small storage device and connected to a network, such as a network black box or a camera device.

[0105]

[0106] FIG. 1 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0107] One embodiment of the disclosed cyber threat information processing method can collect, analyze, and detect data based on communication network traffic, manage cyber threat information from the results, and respond to cyber threats.

[0108] In this embodiment, the first CTI device is assumed to be a device included in a client system that detects and analyzes cyber threat information on network communication, and the second CTI device is exemplified as a device that provides platform-based services based on a computing server and a database in which cyber threat information is analyzed.

[0109] A first CTI device in the client system analyzes data or application data according to a protocol included in network traffic (S110).

[0110] The first CTI device can collect network traffic, classify layered data according to OSI layers, and analyze whether there is cyber threat information based on protocols or applications.

[0111] The first CTI device transmits a query request for cyber threat information related to the analyzed data to the second CTI device (S120).

[0112] The first CTI device can obtain additional detailed cyber threat information by making a query request to the second CTI device regarding the cyber threat information analyzed primarily as above.

[0113] The second CTI device can further analyze cyber threat information based on a query request analyzed by the first CTI device, or generate explanatory information based on artificial intelligence natural language processing regarding the analyzed cyber threat information.

[0114] The client system receives additional analysis results and explanatory information regarding cyber threat information in response to the query request from the second CTI device and provides them to the user (S130).

[0115] The client system can obtain analysis results and explanatory information regarding cyber threat information analyzed by the first CTI device or additionally analyzed by the second CTI device.

[0116] The client system may provide the user with additional analysis results and explanatory information regarding the received cyber threat information. When providing this information to the user, the user may be provided with cyber threat information related to abnormal behavior, malicious behavior, attack behavior, etc., through the monitoring unit of the client system.

[0117] The client system can obtain detailed analysis results and natural language-based explanatory information regarding what the cyber threat information is from the second CTI device.

[0118] Accordingly, users can take response or preventive measures regarding the analyzed cyber threat information.

[0119] Examples of a first CTI device and a second CTI device for collecting network traffic and analyzing cyber threat information are described below.

[0120]

[0121] FIG. 2 discloses another embodiment of a cyber threat information processing method according to an embodiment.

[0122] The second CTI device receives a request for analysis or a request for query regarding cyber threat information related to data included in network traffic from the first CTI device (S210).

[0123] Here, the second CTI device is a natural language model that explains a query for CTI information in natural language and provides the basis for the explanation, or may include a natural language model. Detailed examples of natural language models are disclosed below.

[0124] If the second CTI device is a data platform including a natural language model exemplified below, the second CTI device may receive a request for analysis or a query request for cyber threat information from a client system or from a first CTI device included in the client system via an API.

[0125] The first CTI device can transmit additional analysis information or query requests regarding cyber threat information analyzed directly from network traffic.

[0126] The second CTI device can generate detailed analysis results and explanatory information regarding cyber threat information in accordance with the analysis request or query request (S220).

[0127] The second CTI device can convert files of various formats requested for analysis into binary data or analyze attack activities, attackers, etc., regarding cyber threat campaigns through feature analysis.

[0128] The second CTI device can classify cyber threat information, such as attack behaviors and attackers, or generate detailed analysis results through an artificial intelligence (AI) engine based on the characteristics of the analyzed data.

[0129] The second CTI device can generate natural language description information for a query request of the analyzed cyber threat information based on an internal natural language model.

[0130] The second CTI device can provide the detailed analysis results and explanatory information generated above to a client system or user (S230).

[0131] An example of a second CTI device for cyber threat information through artificial intelligence processing is described below.

[0132]

[0133] FIG. 3 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0134] One embodiment disclosed may include a client system (10) and a second CTI device (2000).

[0135] The client system (10) may include a client device (100) and a first CTI device (1000).

[0136] The client system (10) can receive network traffic from the Internet through the first CTI device (1000).

[0137] The second CTI device (2000) may include a computing server (2800) and a database (2700), a framework (2200) that provides an application programming interface, and an artificial intelligence processing unit (23000).

[0138] Unless the client system (10) and the second CTI device (2000) are connected via an intranet or the like, the second CTI device (2000) may include a separate first CTI device (1010) and receive network traffic from the Internet through it.

[0139] The first CTI device (1000) in the client system (10) can analyze layered protocol data and application data transmitted through the network traffic to obtain metadata related to the protocol and executable or non-executable files within the payload.

[0140] The first CTI device (1000) can analyze the data or files received as described above to detect or extract cyber threat information and provide it to a monitoring system (not shown) connected to the client device (100).

[0141] Alternatively, the first CTI device (1000) may request the second CTI device (2000) to perform additional analysis of the cyber threat information analyzed as above, or request a query related to the cyber threat information.

[0142] Separately, the client device (100) may request the second CTI device (2000) to perform additional analysis on cyber threat information of files or metadata transmitted from an external network through the first CTI device (1000), or request a query related to said cyber threat information.

[0143] The second CTI device (2000) can receive a file, metadata, or cyber threat information (CTI) or a query (hereinafter CTI query) about cyber threat information transmitted by the client device (100) or the first CTI device (1000) through an application programming interface (API).

[0144] Here, the query for cyber threat information may include, for example, whether it is malicious, the hash value of the file, assembly code or information on functions included in the assembly code, and other information related to the file.

[0145] Files received from multiple modules can be analyzed within the framework (2200) of the second CTI device (2000). Here, the multiple modules are simplified and represented as the M module (2210) or the N module (2220).

[0146] For example, the M module (2210) or the N module (2220) of the framework (2200) can perform malicious behavior analysis on network layer metadata, executable files, non-executable files, or web data collected from the internet.

[0147] Meanwhile, the query module (2230) of the framework (2200) transmits CTI queries related to files and metadata transmitted by the client system (10) to the natural language model (2320) of the artificial intelligence processing unit (2300), and CTI feature analysis requests to the AI ​​engine (2310) of the artificial intelligence processing unit (2300).

[0148] The M module (2210) or the N module (2220) can transmit to the query module (2230) analysis information regarding cyber threat information (CTI) or files related to the CTI query queried by the client (100). For example, the M module (2210) or the N module (2220) can transmit to the query module (2230) information regarding whether the analyzed file or data is malicious, attack behavior, attack technique, attack group, or attack campaigns in which multiple attack behaviors are linked.

[0149] The query module (2230) can generate a CTI query for a file, metadata, or related cyber threat information (CTI) submitted by the client (100) or the first CTI device (1000).

[0150] The query module (2230) can generate appropriate CTI queries related to network protocol or application data, files, metadata, and cyber threat information analyzed from the files or data (e.g., information on whether it is malicious, attack behavior, attack technique, attack group, or attack campaign involving multiple attack behaviors) analyzed by the M module (2210) or the N module (2220).

[0151] Then, the natural language model (2320) of the artificial intelligence processing unit (2300) can generate a natural language answer to a CTI query based on network protocol / application data, files, and various cyber threat information (CTI) analyzed by the M module (2210) or the N module (2220), etc.

[0152] The first CTI device (1000) transmits protocol data or application data in network traffic, files in the payload, metadata and related CTI queries to the second CTI device (2000).

[0153] The client device (100) transmits a file transmitted from the first CTI device (1000) and a CTI query related to the file, etc. to the second CTI device (2000).

[0154] The framework (2200) of the second CTI device (2000) can perform cyber threat information (CTI) analysis on protocol data or application data, files, metadata, etc., and extract CTI features.

[0155] The CTI features analyzed and extracted from the framework (2200) generate more accurate CTI analysis results or CTI prediction results in the AI ​​engine (2310) of the artificial intelligence processing unit (2300). The generated CTI analysis results or CTI prediction results are transmitted to the client system (10) through an application programming interface (API).

[0156] A CTI query related to a CTI feature output from the framework (2200) is generated as a result of natural language analysis by the natural language model (2320) of the artificial intelligence processing unit (2300). The natural language query response to the generated CTI query is transmitted to the client system (10) through an application programming interface (API).

[0157] Along with the CTI analysis results generated in this way, the natural language model (2320) provides the natural language CTI query answer generated by the natural language CTI query answer to the client system (10).

[0158] The CTI inquiry response includes a natural language description of whether the cyber threat information (CTI) inquired about is malicious, the attack behavior, the attack technique, the attack group, or the attack campaign in which multiple attack behaviors are linked, in relation to data analyzed and extracted from network traffic by the client system (10), particularly the first CTI device (1000).

[0159] In addition, regarding binary data such as assembly code of a file included in network traffic and functions included in that data, explanation information on whether it is malicious can be provided based on the results analyzed by the second CTI device (2000).

[0160] The framework (2200) of the second CTI device (2000) can provide various analysis information about a file or information that has been analyzed and stored in the database (2700), and can also generate or suggest various additional CTI queries related to the CTI query of the client system (10) to the user.

[0161] And, the natural language model (2320) of the second CTI device (2000) generates natural language descriptions for CTI queries submitted by the client system (10) and additional CTI queries provided by the framework (2200), and provides natural language answers related to the CTI queries to the user of the client system (10).

[0162] In the embodiment, since the second CTI device (2000) analyzes or provides previously analyzed information along with natural language, according to the embodiment, even if the user is a non-expert, easy and accurate information delivery and response to cyber threat information is possible.

[0163] When a server (2800) providing a second CTI device (2000) is connected to the Internet through a separate first CTI device (1010), the function of the first CTI device (1010) is the same as the function of the disclosed first CTI device (1000).

[0164] Embodiments of a second CTI device (2000) including a first CTI device (1000 or 1010) and a natural language model (2320) are disclosed in detail below.

[0165]

[0166] FIG. 4 illustrates a first CTI device as an embodiment of a cyber threat information processing device according to an embodiment.

[0167] An embodiment of the first CTI device (1000) disclosed includes a collection unit (1100) that collects transmission data from mirrored network traffic, an analysis unit (1200) that analyzes data according to a protocol or application within the collected data, and a detection unit (1300) that detects cyber threat information from the analyzed data.

[0168] The data analyzed by the first CTI device (1000) is provided to the user of the client system (10) through the monitoring unit (1800), and the user can monitor abnormal behavior or threat behavior on the network.

[0169] The collection unit (1100) includes a packet collector (1120), and the packet collector (1120) can collect various metadata for security enhancement from the collected packet data without loss.

[0170] By the packet collector (1120) of the collection unit (1100) collecting various metadata from the packets, the analysis unit (1200) can efficiently process multiple packets and pre-allocate the packets in memory so that unnecessary waiting time is not taken.

[0171] Generally, the packet processing speed of an operating system often fails to keep up with the processing speed of a Network Interface Card (NIC). In one embodiment, packet processing performed in the kernel of an operating system can be processed in user space using a high-speed processing library.

[0172] One embodiment of the collection unit (1100) allows a process of the operating system to use a high-speed processing library to poll packets received on a network interface card without using the kernel, thereby reading data at high speed. Accordingly, the embodiment of the collection unit (1100) can reduce the idle time that occurs during the process in which the kernel of the operating system reads packets received on the network interface card and transmits them to the process of the operating system.

[0173] In the disclosed embodiment, the collection unit (1100) may include a network interface card (not shown) and a packet collector (1120).

[0174] A network interface card (not shown) can receive network packets at high speed.

[0175] The packet collector (1120) of the collection unit (1100) reads the received packets by polling without passing through the kernel and transmits them to the processor.

[0176] The receiving core within the packet collector (1120) of the collection unit (1100) can store packets received at high speed via polling in a large-capacity memory. The receiving core of the collection unit (1100) uses a dedicated core of the processor to perform isolated tasks so as to protect against malicious software, etc., included in the high-speed received packets affecting related processes.

[0177] The receiving core of the collection unit (1100) can protect the computer operating system and enhance security functions even when receiving packets at high speed. The memory within the collection unit (1100) can use large-capacity memory that can reduce management overhead, such as memory faults, depending on the management function. The memory of the collection unit (1100) may or may not be set as large-capacity memory depending on the settings of the operating system.

[0178] The copy core of the collection unit (1100) can copy and output packets stored in memory.

[0179] The analysis unit (1200) can inspect, manage, and filter the packets collected by the collection unit (1100) using the Deep Packet Inspection (DPI) method. The DPI method can examine the contents of the data within the collected packets in detail up to the OSI Layer 7 level. Through this, the DPI method can not only identify the overall characteristics of the network data but also control potential and malicious traffic.

[0180] A DPI engine (1210) that performs the DPI method monitors the collected packets between the source and the destination, reassembles them, and inputs them into a separate buffer. The DPI engine (1210) can form a single session with the input packets and generate metadata based on it.

[0181] The packets analyzed by the DPI engine (1210) of the analysis unit (1200) are stored in the queue storage unit (1220) according to their content and then output.

[0182] The queue storage unit (1220) of the analysis unit (1200) can synchronize memory access by completing the calls in fixed time units when data is called simultaneously by multiple threads.

[0183] The analysis unit (1200) can solve problems related to system synchronization by using a multi-threaded environment when the data being analyzed is in various cases, such as metadata, files, or PCAP packet files, by utilizing multiple queue storage units (1220).

[0184] The DPI engine (1210) of the analysis unit (1200) pre-allocates the internal memory (1211) of the analysis unit (1200) to store data and stores the data output by the collection unit (1100).

[0185] The DPI engine (1210) of the analysis unit (1200) can analyze the syntax in detail in real time for the data stored in the internal memory (1211) and extract the file.

[0186] The DPI engine (1210) of the analysis unit (1200) can extract metadata of data across all layers, including layers L2 to L4 as well as the application layer of layer 7. The data extracted by the DPI engine (1210) of the analysis unit (1200) is as follows.

[0187] For example, the DPI engine (1210) of the analysis unit (1200) can obtain not only transport layer information such as the Internet protocol or TCP / UDP of the Source IP and Destination IP from the packet header, but also application layer information within the packet payload.

[0188] The DPI engine (1210) of the analysis unit (1200) can extract metadata of application protocols required for network threat detection, such as HTTP, SSL, SSH, FTP, SMB, DNS, and metadata related to content, such as web pages, filenames, User Agent Strings, JavaScript, and images.

[0189] And the DPI engine (1210) of the analysis unit (1200) can also extract metadata for OT protocols such as industrial application protocols or engineering protocols.

[0190] For example, metadata can also be generated for MODBUS, a network communication encapsulated in the TCP payload; DNP3, widely used in the energy sector; and BACnet and KNX protocols, primarily used in smart buildings.

[0191] The core engine (1212) within the analysis unit (1200) can separate data transmitted according to a layer or industry protocol within a packet according to type and transmit it to the queue storage unit (1220) according to the type of data. The queue storage unit (1220) within the analysis unit (1200) is arranged in parallel according to the data, so that horizontal scalability is possible according to the data type and scale.

[0192] In this way, the analysis unit (1200) can secure visibility into data of all layers according to the packet structure according to the characteristics of the network protocol.

[0193] The analysis unit (1200) can extract data and metadata on the protocols of the IT network and the OT network.

[0194] The analysis unit (1200) can classify data according to the protocol and classify metadata and files accordingly, and can convert data of the same source / destination into a file of a single session PCAP packet and then generate and store metadata according to the PCAP packet.

[0195] The detection unit (1300) can detect threat elements based on metadata added by the analysis unit (1200), files within the packet, and relocated PCAP packet files. The detection unit (1300) can detect event characteristics such as file types, attack behavior types, and OT types as abnormal behavior through profiling.

[0196] For example, the detection unit (1300) can detect malware using at least one of the Indicator of Compromise (IoC), rule-based malware detection tools such as YARA rules, and machine learning.

[0197] The detection unit (1300) can detect abnormal behavior through various rules and AI-based behavioral analysis that can identify the attacker's tactics, techniques, and procedures (TTP) according to the attack lifecycle.

[0198] The detection unit (1300) can detect anomalies in an operational technology (OT) environment within a corporate network.

[0199] The detection unit (1300) can detect risk factors by loading extracted event features and applying an AI algorithm to generate a behavior profile, and by calculating a risk score using weights based on the confidence score of the generated behavior profile model.

[0200] For example, the detection unit (1300) can identify threat elements by accumulating scores based on correlation and statistical analysis regarding each threat element of metadata, files, and PCAP packet files, and store them in respective databases (1310, 1320, 1330).

[0201] Through this stepwise and comprehensive method, the detection unit (1300) can reduce the false positive rate and enable the user to efficiently investigate and respond to risk factors.

[0202]

[0203] The detection unit (1300) can identify and detect threat elements from the data analyzed by the protocol data analysis unit (1200) and provide the results to the monitoring unit (1800).

[0204] When the detection unit (1300) identifies a threat element from abnormal events in the metadata extracted or generated as above and the data within the packet's payload, it can perform highly reliable threat element detection by evaluating contextual information regarding whether there is a threat and the threat level based on correlation analysis.

[0205] The detection unit (1300) can perform malware detection, behavior analysis-based detection, and OT anomaly detection from input network traffic.

[0206]

[0207] (a) Malware detection

[0208] The detection unit (1300) can detect malware as known malware by using Indicators of Compromise (IoC) from network traffic. The detection unit (1300) can also detect unknown malware using machine learning techniques.

[0209] The detection unit (1300) can identify attackers and attack activities regarding unknown files by disassembling and converting files within network traffic into binary data and learning malicious files and normal files through machine learning. The detection unit (1300) can detect malicious code using a learning model based on a Random Forest algorithm based on the characteristics of the file's binary data.

[0210] And advanced persistent threats (APTs) can be identified based on defined rules such as YARA rules.

[0211] The detection unit (1300) can classify a predefined signature as malicious code based on a defined rule-based string or a binary pattern (Hex string). The threat detection unit (1350) can identify malicious code by specifying a specific entry point value or by using pattern matching based on regular expressions such as a file offset or a virtual memory address.

[0212]

[0213] (b) Behavior analysis-based detection

[0214] The detection unit (1300) can detect attack tactics, techniques, and procedures (TTPs) based on behavioral analysis of data included in network traffic.

[0215] The detection unit (1300) can detect attack behavior based on behavioral analysis through threat detection according to multiple behavioral rules. The detection unit (1300) applies various AI-based anomaly detection techniques to many features extracted from network traffic. The threat detection unit (1350) can evaluate whether network traffic is anomaly by generating hundreds of profiled anomaly models through entity modeling by device / peer group / network level.

[0216] The detection unit (1300) can evaluate whether there is an anomaly by comparing the extracted features with the device's past patterns (device modeling), evaluating distinctiveness within a cluster (peer group modeling), investigating sparsity throughout the network, and calculating an anomaly score using an anomaly model.

[0217] The detection unit (1300) can detect threat elements in an abnormal model by calculating a threat score for one or more abnormal events through a threat detector.

[0218]

[0219] (c) OT Anomaly Detection

[0220] The detection unit (1300) exists to manage operations, particularly physical operations in various industrial sectors that have benefited from automation and mechanization, in an OT environment designed to maintain safety, uptime, and productivity. The detection unit (1300) can detect threat factors for anomaly detection in an OT environment designed to maintain safety, uptime, and productivity using whitelist-based anomaly detection technology and ML-based anomaly detection technology for process values ​​of time series.

[0221] The detection unit (1300) can detect threat elements based on a whitelist and a time series of sensors.

[0222] When the detection unit (1300) analyzes whitelist-based data, it can detect threats of communication in a malformed format or application misuse by extracting the command field of the protocol included in the data. Additionally, the detection unit (1300) can understand the specialized meaning for each OT protocol, map the detailed message field for each command and the request and response messages into a pair of sessions, analyze them, and select allowed packets based on statistics.

[0223] When the detection unit (1300) detects a time-series-based threat element of the sensor, it can determine whether there is an anomaly by configuring a specific process value extracted from the packet into a time series and comparing it with a model trained by machine learning.

[0224] The detection unit (1300) allows the manager to selectively perform a preliminary test on a model created by machine learning a specific process to verify accuracy performance.

[0225]

[0226]

[0227] *178(d) Correlation Analysis

[0228] When the detection unit (1300) detects a threat element by performing malware detection, behavioral analysis-based detection, and OT anomaly detection on the metadata of the packet and the data of the payload, it can identify whether the detected threat element is an actual threat technology through correlation analysis.

[0229] The detection unit (1300) may include multiple threat detectors for correlation analysis. The multiple threat detectors can perform multiple artificial intelligence (AI)-based anomaly detections and identify threat technologies by performing correlation analysis on various contexts using defined rules.

[0230] Meanwhile, an embodiment of the first CTI device may further include an intelligence processing unit (1400).

[0231] The intelligence processing unit (1400) can receive an executable or non-executable file included in the payload of a packet from the analysis unit (1200). The intelligence processing unit (1400) can also receive files, metadata, applications, etc. analyzed from network traffic from the detection unit (1300).

[0232] The intelligence processing unit (1400) can transmit the executable file or non-executable file, or the analyzed metadata thereof, to the cyber threat intelligence system when it intends to detect and identify detailed cyber threat information based on an executable file or a non-executable file.

[0233] One embodiment of the intelligence processing unit (1400) may include analysis modules included in the framework of the second CTI device embodiment disclosed above and an AI engine of the artificial intelligence processing unit. In this case, the intelligence processing unit (1400) may analyze attack behaviors, attackers, campaigns, etc. included in metadata, files, and PCAP packets, etc. detected by the detection unit (1300) and classify them using the AI ​​engine.

[0234] A cyber threat intelligence system can identify attack tactics, techniques, and procedures (TTPs) for a received file and provide profiling results such as the attacker of an Advanced Persistent Threat (APT) and identifiers of attack behaviors (including attack behavior identifiers based on the MITER ATT&CK Matrix).

[0235] Alternatively, an embodiment of the first CTI device may request that the collected and analyzed data be transmitted to the second CTI device for processing in connection with the intelligence profiling processing described above.

[0236] Another embodiment of the intelligence processing unit (1400) includes analysis modules included in the framework of the embodiment of the second CTI device disclosed above and an AI engine of the artificial intelligence processing unit, and may further include a natural language model.

[0237] In such cases, an embodiment of the first CTI device may generate explanatory information of profiled CTI information or CTI query answers in an internal artificial intelligence-based natural language model without the need to query the second CTI device for the data analyzed in relation to the intelligence profiling processing described above.

[0238] When an embodiment of the first CTI device includes a natural language model, an example in which the natural language model generates explanatory information or CTI query answers for CTI information detected by the first CTI device may follow an example of a natural language model disclosed below.

[0239] Although not shown in this drawing, the first CTI device may further include a threat information management unit (not shown) that provides visualization information about threat information for monitoring threat information detected by the detection unit (1300).

[0240]

[0241] Based on this analysis information and natural language model, the threat information management unit (not shown) of the first CTI device can derive risk factors for assets including network-connected IT assets, OT infrastructure, and IoT devices, and produce protection measures and visualization information.

[0242] The threat information management unit (not shown) of the first CTI device provides a means to build and monitor management information of various assets associated with network traffic.

[0243] For example, the threat information management unit (not shown) of the first CTI device can build a list of managed assets and detailed information related to threat information. The first CTI device can build the IP / MAC address, vendor and type information, model serial information, and firmware information of each asset, and monitor the software version.

[0244] The threat information management unit (not shown) of the first CTI device can build a network map of managed assets and provide visualized information through the monitoring unit (1800).

[0245] The threat information management unit (not shown) of the first CTI device can identify vulnerabilities of each managed asset and provide the corresponding vulnerability information through the monitoring unit (1800).

[0246] In an embodiment of this drawing, the first CTI device may include a database for storing data and a processor for processing network data.

[0247] A processor in the first CTI device can process instructions that analyze data or application data according to a protocol within network traffic, transmit a query request for cyber threat information related to the analyzed data to a natural language model, and receive and provide detailed analysis results and explanatory information regarding cyber threat information according to the query request from the natural language model.

[0248] The first CTI device may be implemented as software that stores and executes computer-executable commands as described above.

[0249]

[0250] FIG. 5 discloses an example in which a first CTI device and a second CTI device are interconnected as an example of cyber threat information processing devices according to an embodiment.

[0251] The application programming interface (API) (2100) of the second CTI device (2000) can receive a file, a request for cyber threat information (CTI) analysis related to the file, or a query related to CTI from the client system (10).

[0252] The framework (2200) of the Application Programming Interface (API) (2100) may include multiple analysis modules or prediction modules. For example, the framework (2200) disclosed above may perform static analysis, dynamic analysis, deep analysis, mild-dynamic analysis, etc., according to an input file using an AI engine. Here, any module that performs such analysis or prediction is indicated as the Nth module (1219).

[0253] When the framework (2200) receives a file from the client system (10), it can obtain binary data at the assembly level through disassembly. Based on this, the framework (2200) can perform analysis of functions related to whether they are malicious, analysis of attack behavior or attack techniques, and analysis of attack groups, and analysis of a sequence of binary data blocks (hereinafter referred to as instruction sequences) according to the call relationships of functions included in the binary data.

[0254] The framework (2200) can analyze whether the input file is a non-executable file such as a document file, whether the file is malicious, the attack act or attack technique, and the attack group.

[0255] The server (2800) collects web pages on the internet by performing crawling, whether on-premises server or cloud server, and the framework (2200) can analyze whether the collected web pages are malicious, attack behavior or attack technique, and attack group.

[0256] The database (2700) can classify and store results analyzed by the framework (2200) of the second CTI device (2000), such as assembly code functions that appear during the process of analyzing files, whether the functions are malicious, hash codes, instruction sequences, static analysis, dynamic analysis, mild-dynamic analysis, predictive analysis results, whether the partial tags of web pages are malicious, attack techniques corresponding to MITRE ATT&CK, information about attack behaviors and attack groups, attack campaigns related to files, attack countries, attack industries, etc.

[0257] Meanwhile, the query module (2230) of the framework (2200) transmits the CTI natural language query to the natural language model (2320) of the artificial intelligence processing unit (2300) when the client system (10) makes a request for analysis of cyber threat information (CTI) regarding a specific file, webpage, etc.

[0258] The natural language model (2320) may be a natural language model (NLP), a large language model (LLM), or a language model based on Transformer technology, or it may be a smaller large language model (sLLM) related to cyber threats or security.

[0259] A request for CTI analysis or prediction related to a file of the client system (10) may be made, or a general natural language CTI query unrelated to the file may be requested. Accordingly, the query module (2230) generates a CTI query or supplementary query based on the cyber threat information (CTI) analyzed by the framework (2200) and transmits it to the natural language model (2320).

[0260] If the client system (10) requests a CTI query unrelated to a file, the query module (2230) transmits the CTI query to the natural language model (2320).

[0261] The CTI query language processing unit (2321) can analyze the CTI query using the parsing technique included in the CTI query.

[0262] The CTI query processed by the CTI query language processing unit (2321) is transmitted to the CTI query interpretation unit (2323).

[0263] The CTI query interpretation unit (2323) can perform the function of distinguishing questions based on the sentence structure and meaning of the CTI query processed by the CTI query language processing unit (2321), and recognizing sub-question types and relationships between sub-questions.

[0264] The CTI query interpretation unit (2323) may include a CTI query decomposition unit (2324) and a CTI query analysis unit (2325).

[0265] The CTI query decomposition unit (2324) can perform the function of distinguishing questions based on the sentence structure and meaning included in the CTI query, classifying sub-question types, and recognizing relationships between classified sub-questions.

[0266] The CTI query analysis unit (2325) can classify the types of the separated sub-questions. And the CTI query analysis unit (2325) can recognize the core of the question based on the reliability of the words or phrases that can be replaced by candidate answers, according to the classified types of the sub-questions.

[0267] If the CTI query analysis unit (2325) has a reliability that cannot recognize the core of the question, the CTI query decomposition unit (2324) may be made to reclassify the sub-question types.

[0268] Through the repeated processing of the CTI query decomposition unit (2324) and the CTI query analysis unit (2325) as described above, the CTI query analysis unit (2325) can detect and verify the topic of the CTI-related question.

[0269] The CTI question and answer generation unit (2326) can generate all possible answer candidates from structured or unstructured resources based on CTI questions and question classification information. The CTI question and answer generation unit (2326) may include a CTI answer candidate group generation unit (2327), a CTI answer verification unit (2328), and a CTI answer provision unit (2329).

[0270] The CTI answer candidate generation unit (2327) can perform indexing and search functions from a database (2700) containing cyber threat information (CTI) and generate candidate answers based on the search results. The CTI answer candidate generation unit (2327) generates all possible answer candidates from a database containing cyber threat information (CTI) based on questions and question classification information. Here, the database containing cyber threat information (CTI) includes the database (2700) of the second CTI device (2000). The CTI answer candidate generation unit (2327) may also collect evidence regarding the answer candidates from the database (2700) containing cyber threat information (CTI). This will be described below.

[0271] The CTI Answer Verification Unit (2328) performs the functions of the answer inference and generation module and can determine and generate the best answer. The CTI Answer Verification Unit (2328) determines the ranking of the answer candidates by measuring the reliability of the answer candidates by characterizing the filtered answer candidates and the inferred answer candidates.

[0272] The CTI Answer Verification Unit (2328) can filter answer candidates using inductive, deductive, or abductive reasoning based on the similarity between the query and the answer candidates. The CTI Answer Verification Unit (2328) can then select the optimal CTI answer by re-ranking the answer candidates by comparing the confidence ratio of the answer candidates with a threshold value.

[0273] The CTI answer providing unit (2329) transmits the CTI answer verified by the CTI answer verification unit (2328) to the second CTI device (2000) to provide natural language explanation information for the CTI question answer.

[0274] When a client system (10) queries cyber threat information (CTI) together with or separately from a request for cyber threat information (CTI) related to a file, the second CTI device (2000) may provide information about information related to the CTI file (whether it is malicious, hash value, attack technique, attack group, attack campaign, etc.), a natural language description thereof, and evidence collected as the basis thereof.

[0275] For example, when a client system (10) queries the result of an analysis request for a specific file, information regarding which MITRE ATT&CK attack technique by which attack group the malicious activity caused by the file is connected to, and which attack campaign (a series of mechanisms of one or more attacks) can be provided as visualization information as exemplified above. In addition, the second CTI device (2000) can provide a natural language explanation generated by a natural language model along with the visualization information, and can provide valid digital analysis evidence for the analysis result and natural language explanation analysis evidence for the digital analysis evidence.

[0276] When a client system (10) queries cyber threat information (CTI) without regard to files, it may provide an answer to the CTI query, a natural language description of the CTI query generated by a natural language model, and evidence collected as the basis therefor.

[0277] The second CTI device (2000) can provide the client system (10) with the cyber threat information (CTI) analyzed or predicted by the framework (2200) and the natural language answer or explanatory information for the query of the cyber threat information (CTI) provided by the natural language model (2320).

[0278] The physical device (2000), which is a computing device and is the second CTI device (2000), may include a database (2700) and a server (2800) including a processor.

[0279] A processor driving the second CTI device (2000) can receive a request for cyber threat information (CTI) analysis regarding data related to a file from a client, analyze the requested cyber threat information (CTI), and transmit a first cyber threat information (CTI) query generated based on the analyzed cyber threat information (CTI) to a natural language model (2320).

[0280] And the processor driving the second CTI device (2000) can provide the analyzed cyber threat information (CTI) and the descriptive information of the analyzed cyber threat information (CTI) generated by the natural language model (2320).

[0281] When a processor driving a second CTI device (2000) receives a second cyber threat information (CTI) query from a client, it can transmit the second cyber threat information (CTI) query to a natural language model and provide explanatory information about the cyber threat information (CTI) query generated by the natural language model.

[0282] The operation performed by the above physical device may also be executed by a program that implements the embodiment in software.

[0283] As disclosed in the example, protocol data, application data, files, metadata, and related CTI queries analyzed by the first CTI device (1000) from network traffic, or files and related CTI queries received by the client device (100) are transmitted to the second CTI device (2000).

[0284] The second CTI device (2000) can analyze the CTI features of a file in several modules within the framework (2200) as exemplified. Separately, the client device (100) can also send a CTI query related to a file or file to the second CTI device (2000).

[0285] The CTI features analyzed in the framework (2200) of the second CTI device (2000) are transmitted to the AI ​​engine (2310) of the artificial intelligence processing unit (2300).

[0286] The AI ​​engine (2310) of the artificial intelligence processing unit (2300) can classify additional features regarding the CTI features analyzed in the framework (2200), such as functions of related assembly code, whether the functions are malicious, hash codes, instruction sequences, whether the partial tags of web pages are malicious, attack techniques corresponding to MITRE ATT&CK, information about attack behaviors and attack groups, attack campaigns related to files, attack countries, and attack industries.

[0287] Meanwhile, a CTI query is transmitted to a natural language model (2320) of an artificial intelligence processing unit (2300) to generate descriptive information about CTI features received by the CTI device (2000) or analyzed by multiple modules within the framework (2200).

[0288] The natural language model (2320) of the artificial intelligence processing unit (2300) can generate an answer to the received CTI query and transmit it to the client device (100) or the first CTI device (1000).

[0289] An embodiment of this drawing discloses a case where the first CTI device (1000) does not have a natural language model (2320). When the first CTI device (1000) includes a natural language model (2320), the first CTI device (1000) can generate answers to CTI queries or natural language-based explanatory information using its internal natural language model (2320) and provide them to the user without needing to query the second CTI device (2000) for CTI information detected from network traffic.

[0290]

[0291] FIG. 6 discloses another example in which a first CTI device and a second CTI device are interconnected as an example of cyber threat information processing devices according to an embodiment.

[0292] The application programming interface (API) (2100) of the second CTI device (2000) can receive a file, a request for cyber threat information (CTI) analysis related to the file, or a CTI query related to the CTI from the client system (10).

[0293] The functions of the modules (2220, 2230) within the framework (2200) of the application programming interface (API) (2100) and the crawling function of the server (2800) are as described above.

[0294] The database (2700) can classify and store results analyzed by the framework (2200) of the second CTI device (2000), such as assembly code functions that appear during the process of analyzing files, whether the functions are malicious, hash codes, instruction sequences, static analysis, dynamic analysis, mild-dynamic analysis, predictive analysis results, whether the partial tags of web pages are malicious, attack techniques corresponding to MITRE ATT&CK, information about attack behaviors and attack groups, attack campaigns related to files, attack countries, attack industries, etc.

[0295] Meanwhile, the query module (2230) of the framework (2200) transmits the CTI natural language query to an artificial intelligence-based natural language model (2320) when the client system (10) makes a request for analysis of cyber threat information (CTI). The natural language model (2320) may be a natural language model (NLP) or a large language model (LLM), or it may be a smaller large language model (sLLM) related to cyber threats or security.

[0296] A request for CTI analysis or prediction related to a file of the client system (10) may be made, or a general natural language CTI query unrelated to the file may be requested. Accordingly, the query module (2230) generates a CTI query or supplementary query based on the cyber threat information (CTI) analyzed by the framework (2200) and transmits it to the natural language model (2320).

[0297] If the client system (10) requests a CTI query unrelated to a file, the query module (2230) transmits the CTI query to the natural language model (2320).

[0298] The CTI query language processing unit (2321) can analyze the CTI query using the parsing technique included in the CTI query.

[0299] An example of how the CTI query analysis unit (2325) detects and confirms the topic of a CTI-related question through the iterative processing of the CTI query decomposition unit (2324) and the CTI query analysis unit (2325) was exemplified above.

[0300] The CTI question-answer generation unit (2326) can generate all possible answer candidates from structured or unstructured resources based on CTI questions and question classification information. The CTI question-answer generation unit (2326) may include a CTI answer candidate group generation unit (2327), a CTI answer verification unit (2328), and a CTI answer provision unit (2329).

[0301] The CTI answer candidate generation unit (2327) can perform indexing and search functions from a database (2700) containing cyber threat information (CTI) and generate candidate answers based on the search results. The CTI answer candidate generation unit (2327) generates all possible answer candidates from the database containing cyber threat information (CTI) based on question and question classification information.

[0302] Here, the database containing cyber threat information (CTI) includes the database (2700) of the second CTI device (2000).

[0303] The CTI answer candidate generation unit (2327) may collect evidence for answer candidates from a database (2700) in which cyber threat information (CTI) is stored.

[0304] The CTI answer candidate generation unit (2327) performs indexing and search functions for multiple document files. The CTI answer candidate generation unit (2327) generates candidate answers from an input query using search results from various knowledge databases including the database (2700).

[0305] The CTI answer candidate generation unit (2327) generates all possible answer candidates from various resources, including the database (2700), based on question and question classification information. The CTI answer candidate generation unit (2327) then selects candidate answers based on evidence collected from the resources, using deductive or inductive evidence of the answer type and / or self-evident principles that may constrain the answer. That is, the CTI answer candidate generation unit (2327) can generate answers by verifying answer candidates by collecting evidence for answers from resources including the database (2700) and verifying self-evident principles regarding the context. In this way, the CTI answer candidate generation unit (2327) can search for answers to CTI queries and collect digital evidence or grounds for CTI query answers in the database (2700).

[0306] Since the database (2700) classifies and stores already analyzed cyber threat information (CTI), it can provide search data for generating a candidate group of answers when the CTI answer candidate group generation unit (2327) generates a candidate group of answers. Additionally, the database (2700) can provide evidence or grounds for the answer candidate based on the stored cyber threat information (CTI) when the CTI answer candidate group generation unit (2327) selects an answer candidate from the candidate group of answers.

[0307] The CTI Answer Verification Unit (2328) performs the functions of the answer inference and generation module and can determine and generate the best answer. The CTI Answer Verification Unit (2328) measures the reliability of the answer candidates and determines the ranking of the answer candidates by characterizing the filtered answer candidates and the inferred answer candidates.

[0308] The CTI Answer Verification Unit (2328) can filter answer candidates using inductive, deductive, or abductive reasoning based on the similarity between the query and the answer candidates. The CTI Answer Verification Unit (2328) can then select the optimal CTI answer by re-ranking the answer candidates by comparing the confidence ratio of the answer candidates with a threshold value.

[0309] The CTI answer providing unit (2329) transmits the CTI answer verified by the CTI answer verification unit (2328) to the second CTI device (2000) to provide natural language explanation information for the CTI question answer.

[0310] An example of the second CTI device (2000) providing natural language descriptive information for the requested CTI analysis result and CTI query answer, or providing natural language descriptive information for the CTI query, was disclosed above.

[0311] A physical device (2000), which is a computing device providing a second CTI device (2000), may include a database (2700) and a server (2800) including a processor.

[0312] A second CTI device (2000) can receive a request for cyber threat information (CTI) analysis regarding data related to a file.

[0313] A processor driving the second CTI device (2000) can analyze the requested cyber threat information (CTI) and search the database (2700) for a set of candidate answers for the first CTI query generated based on the analyzed cyber threat information (CTI).

[0314] Based on the above search results, the processor can determine a group of candidates for the above answers and provide a natural language description for the above 1 cyber threat intelligence (CTI) query based on a first candidate (optimal candidate) among the determined group of candidates.

[0315] When a processor driving a second CTI device (2000) receives a second cyber threat information (CTI) query from a client system (10), it can search for a set of candidate answers for the cyber threat information (CTI) query from the cyber threat information (CTI) database. The processor can also provide descriptive information for the cyber threat information (CTI) query generated by the natural language model.

[0316] As in the disclosed embodiment, the second CTI device (2000) can be implemented as a physical device including a database that stores cyber threat information and a computing server that processes data.

[0317] The processor of the computing server can process a set of instructions including instructions for receiving a request for analysis or query regarding cyber threat information related to data included in network traffic from a first cyber threat intelligence (CTI) device, generating detailed analysis results and explanatory information regarding cyber threat information in accordance with said query request, and providing said generated detailed analysis results and explanatory information to a client system.

[0318] It may also be implemented as software that stores executable instructions for a computer that performs the same operations as those performed by a physical device.

[0319]

[0320] FIG. 7 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0321] One embodiment disclosed may include a first CTI device (1000) and a second CTI device (2000).

[0322] The first CTI device (1000) may include a high-speed packet collection engine (1150), a protocol data analysis unit (1250), a threat detection unit (1350), and a threat information management unit (1380). Here, the high-speed packet collection engine (1150) may be described as an example of the collection unit (1100) of the first CTI device (1000), the protocol data analysis unit (1250) may be described as an example of the analysis unit (1200), and the threat detection unit (1350) may be described as an example of the detection unit (1300).

[0323] The high-speed packet collection engine (1150) can collect packet data included in network traffic between a source (SRC) and a destination (DST). In one embodiment, the high-speed packet collection engine (1150) can collect packet data at high speed by polling packets received on a network interface card without using a kernel, using a high-speed processing library.

[0324] The protocol data analysis unit (1250) can analyze packet data and extract flow information for the packet data. In one embodiment, the protocol data analysis unit (1250) can analyze the data included in the packet data according to the protocol or application. In one embodiment, the protocol data analysis unit (1250) can generate metadata corresponding to the protocol or application. In one embodiment, the flow information may include at least one IP and port among the source and destination, a protocol for network traffic, an application, and at least one of the metadata.

[0325] In one embodiment, the flow information may further include at least one of host information and operating system information. Here, the host information may include identification information and version information for a host (e.g., computer, server, device, etc.) including a source and a destination connected to a network. For example, the server may include a server operated by an organization corresponding to the source or destination, for example, a server corresponding to an internal asset.

[0326] In one embodiment, the protocol data analysis unit (1250) can check the status of the port to determine the open ports at the source and destination and the status of the port. Here, the open port can be used for network communication and can be used to determine which application-based service is running.

[0327] In one embodiment, the protocol data analysis unit (1250) can identify which application-based service or protocol is running at the source and destination through the open ports. In one embodiment, the protocol data analysis unit (1250) can determine which server is operating in which version based on flow information.

[0328] In one embodiment, the monitoring targets of the ASM may include internal assets and information about the internal assets, for example, IP and applications when services are operated directly by a server corresponding to the internal assets. In one embodiment, the protocol data analysis unit (1250) may identify vulnerabilities in assets operated by an organization corresponding to at least one of the source and destination, or assets that are not owned by the organization but belong to the organization's infrastructure or supply chain (e.g., cloud). In one embodiment, the monitoring targets of the ASM may include applications when services are operated directly by a server.

[0329] The threat detection unit (1350) can generate vulnerability information corresponding to flow information based on vulnerability-related information included in a predefined vulnerability database. In one embodiment, the threat detection unit (1350) can generate vulnerability information by performing Attack Surface Management (ASM) using flow information based on network traffic collection. According to the present invention, attackable ports or vulnerabilities can be discovered and managed through ASM.

[0330] In one embodiment, the vulnerability information may include at least one of information on whether a vulnerability exists in the flow information, the type of vulnerability, and the content of the vulnerability. For example, the threat detection unit (1350) may determine whether a vulnerability exists in the corresponding port included in the flow information.

[0331] In one embodiment, the threat detection unit (1350) may generate vulnerability information by comparing flow information with an external vulnerability database or a vulnerability database stored in the first CTI device (1000). In one embodiment, the vulnerability database may comply with standards such as CVE (Common Vulnerabilities and Exposures) and may include various vulnerability-related information. In one embodiment, the vulnerability-related information included in the vulnerability database may include a CVE (Common Vulnerabilities and Exposures)-ID (identifier) ​​for the vulnerability type, vulnerability details, and vulnerability severity information. In one embodiment, the vulnerability-related information included in the vulnerability database may include information on various vulnerabilities and risks, including assets that are leaked and exploited.

[0332] The threat information management unit (1380) can input vulnerability information into the natural language model (2320) of the artificial intelligence processing unit (2300) of the second CTI device (2000) to provide vulnerability description information to the user in the form of natural language. In one embodiment, the second CTI device (2000) may be configured separately outside the first CTI device (1000) or may be integrated and configured within the first CTI device (1000). For a detailed description of the second CTI device (2000), refer to the above description.

[0333] In one embodiment, the threat information management unit (1380) may input at least one of vulnerability information and a vulnerability analysis query prompt based on vulnerability information into a natural language model (2320) to provide vulnerability description information to the user in the form of natural language. In one embodiment, the vulnerability analysis query prompt may include content requesting a vulnerability description related to the vulnerability information. In one embodiment, the vulnerability analysis query prompt may include content requesting the creation of an analysis report based on the results of analyzing the vulnerability information and the format of the analysis report. For example, the format of the analysis report may include at least one of an analysis overview, vulnerability status, vulnerability analysis content, source information, destination information, payload analysis content, analysis conclusion, and recommendations for security enhancement.

[0334] In one embodiment, the threat information management unit (1380) may transmit a CTI query including vulnerability information and a vulnerability analysis query prompt to the second CTI device (2000). The CTI query language processing unit of the natural language model (2320) included in the second CTI device (2000) may analyze the vulnerability information and the vulnerability analysis query prompt included in the CTI query using syntactic analysis technology.

[0335] CTI queries processed by the CTI Query Language Processing Unit can be transmitted to the CTI Query Interpretation Unit, which can distinguish questions based on the sentence structure and semantics of the vulnerability analysis query prompts processed by the CTI Query Language Processing Unit and classify the types of the distinguished questions. Additionally, depending on the classified type of the question, the CTI Query Language Processing Unit can identify the core of the question based on the reliability of words or phrases that can be replaced by candidate answers.

[0336] The CTI question-answer generation unit can generate all possible vulnerability answer candidates from structured or unstructured resources based on vulnerability information, vulnerability analysis query prompts, and question classification information. Additionally, the CTI question-answer generation unit can determine and generate the best answer containing vulnerability description information by featuring filtered answer candidates and inferred answer candidates among all vulnerability answer candidates. Additionally, the CTI question-answer generation unit can provide the CTI answer containing the generated vulnerability description information in natural language form to the first CTI device (1000).

[0337] In one embodiment, the threat information management unit (1380) can provide information regarding the vulnerability content and vulnerability countermeasures to the user by visualizing it through the monitoring unit (1800).

[0338] For example, the natural language model (2320) may include various technology-based language models such as natural language processing (NLP), large language model (LLM), and Transformer.

[0339] In one embodiment, when simply providing confirmed vulnerability information, there is a problem in that it is difficult to intuitively determine how the actual vulnerability information affects the vulnerability and how dangerous it is. Therefore, according to the present invention, such vulnerability information is input into a natural language model (2320) so that the user can be provided with a natural language description of how dangerous the vulnerability is and what measures are required.

[0340] In one embodiment, the threat information management unit (1380) can train a natural language model (2320) using vulnerability information and a Question-Answer Instruction generated using the vulnerability information. Here, the Question-Answer Instruction refers to data for the natural language model (2320) to learn, and this data consists of various questions and various natural language answers to those questions based on vulnerability information. In one embodiment, the Question-Answer Instruction may include a dataset for the natural language model (2320) to generate answers to questions.

[0341] In one embodiment, the first CTI device (1000) can use data and applications stored in a separate storage / database under the control of a computing server responsible for data processing. Here, the storage mainly uses a hard disk or SSD to store data, and the database manages structured data and can perform operations such as searching and modifying. At this time, the functions performed by the first CTI device (1000) can be performed by the processor of the computing server.

[0342] According to one embodiment of the present disclosure, vulnerabilities in assets can be identified through the execution of ASM, and security controls can be applied and cybersecurity strategies and policies for the assets can be reinforced. In one embodiment, vulnerabilities in assets can be identified and security controls (e.g., operating system or software patches, etc.) can be applied, security standards for unknown or unmanaged assets can be established and decommissioned, and cybersecurity strategies and policies for the assets can be reinforced, such as removing malicious assets.

[0343]

[0344] FIG. 8 discloses an example of active ASM execution based on network traffic collection according to an embodiment.

[0345] In one embodiment, the first CTI device (1000) can acquire network traffic transmitted by a source and a DST. Additionally, the first CTI device (1000) can generate flow information by performing Depth Packet Inspection (DPI) analysis on packet data included in the acquired network traffic.

[0346] In this case, DPI can deeply analyze packet data in the network to identify protocols or applications related to the packet data or collect metadata. Based on the protocols, applications, or metadata collected through DPI, security vulnerabilities, malicious activity, abnormal traffic, and the like can be detected. For more details, please refer to the above description.

[0347] In one embodiment, flow information may include at least one IP and port among the source and destination, and at least one of the protocol, application, and metadata for the network traffic. Accordingly, the first CTI device (1000) can identify which source and destination use which protocol and application based on which IP and port through the flow information.

[0348] In one embodiment, the first CTI device (1000) can provide vulnerability information corresponding to flow information. That is, it can identify elements with a high probability of external attack and pre-examine and manage vulnerabilities regarding said elements.

[0349] In other words, existing ASMs manually scan ports and IPs by inputting all network IP ranges to check which ports are open and what vulnerabilities exist. However, this approach consumes significant network resources, and incorrect IP range input can lead to scanning third-party networks. Furthermore, scanning may fail if actual ports or services are not open at the time of the scan. Additionally, the scan may not be performed properly due to firewalls or network configurations.

[0350] Accordingly, according to the present disclosure, without the need to perform a network scan, a first CTI device (1000) can receive network traffic and identify flows, protocols, and applications from packet data, and can accumulate and manage what ports each IP has, protocols, and services.

[0351] In one embodiment, when the first handshake between the source and destination—that is, when this session is first established—checks a specific number of the first few bytes, the version of the server (the server providing content services) on the corresponding port serving the application via the relevant protocol can be identified. In one embodiment, the bytes may include bytes of packets transmitted and received during the handshake for communication between the source and destination. In this case, during the handshake process for creating a session, a specific pattern of the bytes can be determined by analyzing the first specific number (e.g., N bytes) of bytes (or packets), and flow information, such as a server or service corresponding to the specific pattern, can be identified. In one embodiment, at least one of server identification information and a server version can be identified by analyzing a specific number of the first few bytes included in a service-specific banner message or Hello message transmitted and received between the source and destination. In this case, information about the counterpart server can be obtained during the handshake process. When such metadata is extracted, it can be compared with a vulnerability database to identify and provide information on which vulnerabilities exist in the actual service servers.

[0352] In this way, according to the present disclosure, even without performing a network scan, the effect of a network scan can be achieved by verifying transmitted and received packet data and using the packet data.

[0353] In one embodiment, after identifying a system or service with a high probability of attack, DPI is performed to deeply analyze traffic to said system or service, thereby detecting and responding to security threats. Additionally, security vulnerabilities in high-probability attack areas identified through ASM can be verified and remedied through actual traffic analysis via DPI. In one embodiment, threats to points identified by ASM can be analyzed and detected by DPI, and explanations regarding points detected after analysis by DPI can be supplemented by ASM.

[0354]

[0355] FIG. 9 discloses an example of providing vulnerability details and measures identified by ASM technology according to an embodiment.

[0356] The natural language model (2320) of this drawing may receive a JSON file containing vulnerability information configured in a text format. Here, the JSON file is a format capable of containing structured data and can be used to represent various information. In one embodiment, the JSON file may include a field containing text, a question, and an object describing additional context.

[0357] In one embodiment, the JSON file may include vulnerability information including a standardized CVE-ID. For example, the JSON file may include at least one of session information, source open ports, service information, operating system, vulnerability ID information (e.g., CVE-ID), destination open ports, service information, operating system, vulnerability ID information, session event information, collected file analysis information, malware match information (e.g., Yara rules), antivirus detection information, and artificial intelligence-based external threat detection information. In the present disclosure, the format of the file input to the natural language model (2320) may be configured in various ways and is not limited.

[0358] In one embodiment, a natural language model (2320) that has a JSON file containing vulnerability information input can output vulnerability description information described in text form. For example, if CVE-2023-44487 vulnerability information for 10.10.4.111 TCP 443 included in flow information is input into the natural language model (2320), a description of the CVE-2023-44487 vulnerability and measures to address the vulnerability can be output in natural language.

[0359] In one embodiment, a predefined question-and-answer command (e.g., input sentence) may be used to train an AI-based natural language model (2320) that outputs vulnerability description information. In one embodiment, the question-and-answer command may include a description and a question regarding a specific vulnerability issue that enables the natural language model (2320) to understand and explain a specific type of vulnerability. In one embodiment, the question-and-answer command may include at least one of a sentence representing an actual vulnerability and a format of an actual vulnerability report. By training the natural language model (2320) using these question-and-answer commands, the natural language model (2320) may recognize various types of vulnerabilities and provide vulnerability description information.

[0360]

[0361] FIG. 10 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0362] Packet data included in network traffic between a source and a destination is collected (S310). In one embodiment, packet data can be collected at high speed by polling packets received by a network interface card without using a kernel using a high-speed processing library. For a detailed explanation of this, refer to the details described in FIG. 7.

[0363] Packet data is analyzed to generate vulnerability information corresponding to flow information for said packet data (S320). In one embodiment, the vulnerability information may include a standardized CVE (Common Vulnerabilities and Exposure)-ID (identifier) ​​corresponding to the flow information. For a detailed explanation of this, refer to the details described in FIGS. 7 to 9.

[0364] Vulnerability information is input into a natural language model to provide vulnerability description information (S330). In one embodiment, the vulnerability description information may include at least one of vulnerability content information in natural language form and vulnerability remediation information. In one embodiment, prior to step S330, the natural language model may be trained based on at least one of vulnerability information and a Question-Answer Instruction generated using said vulnerability information. For a detailed explanation of this, refer to the details described in FIGS. 7 to 9.

[0365]

[0366] FIG. 11 is a drawing disclosing another embodiment of a cyber threat information processing device according to an embodiment.

[0367] The cyber threat information processing device (10000) can receive feed data provided by the cyber threat intelligence system in real time.

[0368] To this end, the cyber threat information processing device (10000) may include an Application Programming Interface (API) (1100) and a framework (2200).

[0369] At this time, the cyber threat information processing device (10000) does not require an essential connection with the API (1100) to receive feed data provided by the cyber threat intelligence system in real time. That is, the cyber threat information processing device (10000) can collect feed data by directly connecting to a storage device (e.g., a database (2700)) to receive feed data in real time.

[0370] Additionally, the framework (2200) may include an analysis and prediction module and an artificial intelligence (AI) model (2240). The cyber threat information processing device (10000) may utilize already classified malicious code contained in the database (2700), or pattern codes of stored malicious code, etc. For example, the database (2700) may store sample data, functions, or result information, and may store ASM files and JSON files converted from input files.

[0371] In one embodiment, the database (2700) may store samples of feed data in the form of pre-generated metadata. That is, the cyber threat information processing device (10000) may store feed data provided by the cyber threat intelligence system in the database (2700) as soon as it receives it in real time.

[0372] In one embodiment, the cyber threat information processing device (10000) can collect feed data provided by the cyber threat intelligence system and analyze it through an analysis module to build a dataset.

[0373] More specifically, the analysis module can build a training dataset for an AI model (2240) using feed data collected according to a preset period. For example, the cyber threat information processing device (10000) can collect first feed data from March 1, 2024 to March 31, 2024, and build a training dataset for an AI model using the first feed data on April 1, 2024. At this time, the target AI model corresponds not only to an AI model included within the cyber threat information processing device (10000) but also to an AI model for an external system.

[0374] In one embodiment, the constructed dataset is transferred to a learning module (2241) within an AI model (2240), and the AI ​​model (2240) can learn using the dataset.

[0375] At this time, in order for the AI ​​model (2240) to learn normally, it is necessary to maintain a balance of the number of data per label. For example, training data in which the ratio of normal file data to malicious file data is balanced at 5:5 may be the most ideal.

[0376] Furthermore, training data should ideally include various data types, and redundant data must be eliminated. In this process, the distribution of data types must be even. For example, if there are features A, B, and C, and most of the data consists of features A and B, it implies that the distribution of data types is uneven. This means the training data is skewed toward specific features, which can act as noise during model learning. Therefore, the data must be structured so that each feature is evenly distributed.

[0377] In addition, duplicate data may include not only files with identical hash values ​​but also files with different hash values ​​but identical content composition.

[0378] More specifically, in the case of duplicate data, if duplicate data is distinguished solely by hash values, there is a limitation in that files with identical content structures may be included even if their hash values ​​are not the same.

[0379] Therefore, a sampling method is required to remove duplicate data and ensure an even distribution of the data.

[0380]

[0381]

[0382] *332 FIG. 12 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0383] This drawing describes a sampling method for the cyber threat information processing device described above to remove duplicate data and evenly distribute the data.

[0384] In one embodiment, the cyber threat information processing method may primarily use a method to remove sample data having the same hash value in order to remove duplicate data for the purpose of constructing training data. However, in this case, while completely identical samples can be determined as duplicate data and removed, there is a disadvantage in that identical sample data that has been slightly modified to evade detection cannot be removed.

[0385] In addition, in one embodiment, the cyber threat information processing method may use a fuzzy hash-based method for removing sample data to construct training data excluding duplicate data. This corresponds to a method for removing duplicate data by calculating a fuzzy hash based on byte values ​​in binary samples and removing sample data with high similarity. However, in this case, I / O operations for downloading sample data are required, a high amount of computation is required for fuzzy hash calculation, and there is a disadvantage in that it cannot fully reflect the characteristics of the attack campaign information provided by the cyber intelligence system.

[0386] A cyber threat information processing method that improves upon these drawbacks can remove duplicate data from collected feed data through an embedding model and quantization and encoding processes. More specifically, the feed data provided by the cyber threat information processing method from a cyber intelligence system may include at least one of information regarding threat types (e.g., backdoors, ransomware, etc.), information regarding attack techniques used, information regarding attack groups, and information regarding attack industries.

[0387] In one embodiment, the cyber threat information processing method can convert first feed data received from a cyber intelligence system into a first vector representation. More specifically, one feed data (metadata) has information stored in JSON data format.

[0388] In one embodiment, the cyber threat information processing method can convert one JSON data into one vector representation by utilizing an embedding model.

[0389] More specifically, the cyber threat information processing method can convert feed data into a vector representation using an embedding model generated by learning feed data provided by a cyber threat intelligence system. In one embodiment, the embedding model used is characterized by being generated by learning cyber threat information analysis data for a file. In this case, the cyber threat information analysis data for a file learned by the embedding model corresponds to data obtained by analyzing feed data collected by cyber threat intelligence. Additionally, the embedding model used may include a transformer-based model.

[0390] For example, a cyber threat information processing method can utilize an embedding model to convert the first to n JSON metadata into the first to n vectors. Here, the converted vector corresponds to a vector composed of n-dimensional real numbers.

[0391] In one embodiment, a cyber threat information processing method may apply a quantization technique to a transformed vector to quantize it and encode it into a string to extract a signature string. At this time, since the encoded string is generated based on the embedding vector, similar JSON metadata will have similar vectors. That is, vectors corresponding to similar JSON metadata are converted into the same signature string after undergoing the quantization and string encoding processes. For example, a hash value can be extracted from the signature string through vector quantization and string encoding.

[0392] By utilizing the transformed signature string, the cyber threat information processing method can determine that all sample groups having the same signature string are similar and consider them as duplicate data. Therefore, the cyber threat information processing method can select one representative sample data and remove the remaining sample data.

[0393] In one embodiment, the cyber threat information processing method can remove sample data determined to be duplicate data and organize all sample data remaining into a dataset.

[0394]

[0395] FIG. 13 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0396] This drawing describes a vector quantization technique used in a sampling method performed by a cyber threat information processing device to remove redundant data.

[0397] In one embodiment, the cyber threat information processing device may perform vector quantization on the transformed vector. In one embodiment, the cyber threat information processing device may perform vector quantization that transforms a vector containing continuous feature values ​​(e.g., real numbers) into a vector containing discrete feature values ​​(e.g., integers). Here, the vector containing discrete feature values ​​on which vector quantization has been performed may represent a representative vector.

[0398] In addition, the cyber threat information processing device can perform encoding that converts the feature values ​​of a vector containing discrete feature values ​​into hash values.

[0399] For example, similar function vectors (e.g., FV1 to FV4) included in a similar set of vectors are vector quantized to be converted into the same representative vector (e.g., RV1), and the same hash value 7ABBB9 can be produced. On the other hand, different function vectors (e.g., FV5) are vector quantized to be converted into a representative vector (e.g., RV2), and the hash value 0AAFF9 can be produced.

[0400] In other words, it can be confirmed that the representative vector of a different vector differs from the representative vector of a similar vector, and consequently, the hash values ​​are also different. That is to say, a cyber threat information processing device can convert similar vectors into the same representative vector through vector quantization. In this case, the same representative vector can have the same hash value.

[0401] In other words, a cyber threat information processing device can quantize a continuous vector into a finite set of representative vectors through vector quantization. That is, the representative vector can refer to the quantized vectors of the functions (i.e., the function to be analyzed and the function to be compared).

[0402] Quantized vectors, also known as codebook vectors or centroids, serve as representative vectors among vectors, enabling a compressed representation of data while expressing a certain amount of information.

[0403] Therefore, the model training / update process is not required, and a search query can be defined to retrieve only functions with the same representative vector. Finally, a hash value can be defined.

[0404] The existing classification model, which tags based on labels and finds candidate functions with the same tag, had high computational complexity due to the large number of candidate groups to compare; however, the method according to the present invention has low computational complexity because it only needs to compare candidate groups of functions that have similar vectors, thereby enabling efficient similarity search.

[0405]

[0406] FIG. 14 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0407] This drawing describes a vector quantization technique used in a sampling method performed by a cyber threat information processing device to remove redundant data.

[0408] In one embodiment, the cyber threat information processing device can perform vector quantization on the vector. Specifically, the cyber threat information processing device can obtain a vector containing continuous feature values ​​(S89110).

[0409] A cyber threat information processing device can perform dimensionality reduction by reducing the dimensionality of a multidimensional dataset composed of continuous feature values ​​included in a vector and reconstructing it into a vector having reduced dimensions (S89120). For example, a 401-dimensional vector can be reduced to 16 dimensions through PCA.

[0410] Additionally, the cyber threat information processing device can perform scaling to extend each feature value of the dimensionally reduced vector to a certain range (S89130). For example, each feature value can be extended to the range [0,15] (16 intervals).

[0411] Additionally, the cyber threat information processing device can perform rounding for each feature value of the vector that has been scaled (S89140). For example, the feature value 1.43697265 can be rounded and converted to 1.

[0412] Additionally, the cyber threat information processing device can convert the feature value of a vector that has been rounded from a continuous type to a discrete type (S89150). For example, the feature value of a real number type can be converted into a feature value of an integer type of 1 byte.

[0413] Additionally, the cyber threat information processing device can convert each feature value of the vector converted into an integer type into a hash value (S89160). In one embodiment, the hash value may be represented as a string. For example, the hash value may include b123f21e12f12123.

[0414] In the experimental data set, 274,572 hash values ​​can be derived from a total of 885,131 functions. Accordingly, a reduction rate of 68.97% can be confirmed.

[0415] In one embodiment, similarity between vectors can be calculated and statistical values ​​extracted in the cluster with the most identical hash values. For example, similarity may include the cosine distance between vectors.

[0416] In one embodiment, referring to , the minimum, maximum, and quartile values ​​are all the same, which may all represent the same function. Accordingly, for example, when a search is performed based on hash values ​​according to vector quantization according to the present invention, it can be confirmed that the number of functions to be compared is reduced from 6,567 to 17 (0.259%).

[0417] Stat.0%25%50%75%100%MeanStd.Value6.66e-166.66e-166.66e-166.66e-166.66e-166.66e-160.0

[0418] FIG. 15 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0419] In one embodiment, the cyber threat information processing method can collect feed data provided by the cyber threat intelligence system (S410).

[0420] In one embodiment, the cyber threat information processing method can analyze the collected feed data to construct a training dataset for an artificial intelligence model (S420). In one embodiment, the cyber threat information processing method can convert the feed data into a vector through an embedding model in order to construct a training dataset. At this time, the feed data includes first to n JSON metadata, wherein n is an integer. In one embodiment, when converting the feed data into a vector through an embedding model, the first to n JSON metadata is converted into first to n vectors.

[0421] In one embodiment, the cyber threat information processing method can convert the converted vector into a signature string through a vector quantization process. At this time, the vector quantization process may include an encoding process that converts the converted vector into a hash value containing discrete feature values ​​obtained by vector quantization.

[0422] In one embodiment, the cyber threat information processing method can remove duplicate data based on the transformed signature string. At this time, all sample groups having the same signature string are determined to be similar, representative sample data is selected from the determined similar sample groups, and the remaining data excluding the representative sample data from the similar sample groups can be removed from the training dataset.

[0423] A cyber threat information processing method can construct a dataset that does not contain duplicate data by vectorizing collected feed data through an embedding model and quantizing it. In existing methods, data is judged to be different if only the hash value is different, so practically, a large amount of duplicate data is inevitably included in the dataset. According to the present invention, there is an advantage in being able to construct a dataset containing high-quality sample data by removing duplicate data using a more rigorous method.

[0424]

[0425] Below, examples are disclosed that allow for easy acquisition of malware analysis data and the acquisition of specifically required or technically necessary datasets to respond to cyber threats as described above.

[0426] The following embodiments can generate and provide AI training datasets based on Advanced Persistent Threat (APT) intelligence data to respond to various malicious activities. Based on this, customers or companies can strengthen AI-based cyber threat response and provide high-quality data.

[0427] A detailed example is disclosed of generating and refining such datasets to provide reliable datasets to customers.

[0428]

[0429] FIG. 16 discloses another example of a cyber threat information processing device that generates artificial intelligence training data capable of responding to cyber threats.

[0430] The disclosed embodiment illustrates an example of a cyber threat information processing device that can be operated by a database (2700) and a computing server (2800). The computing server (2800) may be a virtualized cloud server or an on-premises server and may include one or more nodes or one or more processors.

[0431] The disclosed example is a platform based on an Application Programming Interface (API) (2100) that can transmit or receive requests related to cyber threat information from at least one client (101, 103, 105).

[0432] Clients (101, 103, 105) can obtain analyzed or predicted results from network intelligence (CTI) devices (102, 104, 106) capable of analyzing cyber threat information transmitted from a network.

[0433] In the following, for components identical to the example above, the description of the embodiment disclosed above may be applied as is.

[0434] Network intelligence (CTI) devices (102, 104, 106) may follow the example of the first CTI device (1000) disclosed above.

[0435] The intelligence platform (CTI) (2201) can process cyber threat information in a platform format and provide the results as another embodiment of the framework (2200) included in the second CTI device (2000) disclosed above. A detailed description of this has been disclosed above.

[0436] For example, an intelligence platform (CTI) (2201) can receive various cyber threat information, files, or queries from a client (101, 103, 105) or a network intelligence (CTI) device (102, 104, 106) through an application programming interface (API) (2100). The intelligence platform (CTI) (2201) can process the received cyber threat information, files, or queries and provide the results to the client (101, 103, 105) or the network intelligence (CTI) device (102, 104, 106).

[0437]

[0438] In this example, the intelligence platform (CTI) (2201) may include an artificial intelligence (AI)-based information processing module (2202) and several modules capable of analyzing cyber threat information in various ways. Although the intelligence platform (CTI) (2201) has several analysis modules and each module can analyze or predict various cyber threat information, it is referred to here as the Nth module (2220). Since the description of the Nth module (2220) has been disclosed above, a description thereof is omitted.

[0439] The artificial intelligence (AI)-based information processing module (2202) may be the AI ​​processing unit (2300) disclosed above. The natural language model (2320) included in the AI ​​processing unit (2300) disclosed above may be a separate AI agent (2600) separated from the intelligence platform (CTI) (2201) in this drawing. That is, the natural language model (2320) disclosed above may be the AI ​​agent (2600) in this drawing.

[0440] The prompt hub (2350) includes a dataset of various queries or optimized questions and answers optimized for processing cyber threat information, and can use this to provide various explanations of the results of processing cyber threat information to the AI ​​agent (2600) to the clients (101, 103, 105).

[0441] For example, the prompt hub (2350) can generate a set of natural language questions and answers based on the results of cyber threat information analyzed, generated, or processed by the post-processing framework (2500) or intelligence platform (CTI) (2201) described below.

[0442] The prompt hub (2350) may include the functions of the query module (2230) disclosed above. In this drawing, the prompt hub (2350) within the second CTI device (2000) is depicted as a separate module from the intelligence platform (CTI) (2201), but the intelligence platform (CTI) (2201) may include the prompt hub (2350) as the query module (2230) disclosed above.

[0443] The post-processing framework (2500) can post-process cyber threat information analyzed or generated by the prompt hub (2350), AI agent (2600), and intelligence platform (CTI) (2201) and provide it to clients (101, 103, 105).

[0444] In this example, the post-processing framework (2500) may include several modules (2510, 2520) according to the post-processing method and processing function of cyber threat information, and are represented in this drawing as the first module (2510) to the Kth module (2520).

[0445] For example, the post-processing framework (2500) can generate various cyber threat information analyzed by the intelligence platform (CTI) (2201), such as visualization information on specific malicious behavior or advanced persistent threats (APT), and provide it through the intelligence platform (CTI) (2201).

[0446] Information related to the query of the post-processing framework (2500) is transmitted to the prompt hub (2350), and the prompt hub (2350) can generate a query for cyber threat information using the generated question-and-answer database and transmit it to the AI ​​agent (2600).

[0447] In this embodiment, the post-processing framework (2500) is described as a separate framework from the intelligence platform (CTI) (2201), but multiple modules within a single framework may be configured to perform the respective described modules.

[0448] That is, the post-processing framework (2500) may be composed of the intelligence platform (CTI) (2201) and the framework (2200) disclosed above as a single framework, or it may be divided into several separate frameworks depending on the function of the module.

[0449] In this example, for convenience, the intelligence platform (CTI) (2201) is represented as a platform for detecting malware or malicious behavior, and the post-processing framework (2500) is represented as a set of modules that generate a dataset by applying metadata to the results detected by the intelligence platform (CTI) (2201) or provide the detected results to a client after performing certain post-processing.

[0450] Accordingly, as in the example disclosed above, the post-processing framework (2500) may include a module that performs vector quantization used in a sampling method for removing duplicate data and making the data distribution uniform among the data processed by the intelligence platform (CTI) (2201). A detailed embodiment thereof has been disclosed above.

[0451] The AI ​​agent (2600) can provide a natural language description along with relevant information to the client (101, 103, 105) directly or through the intelligence platform (CTI) (2201) in response to a cyber threat information query received from the prompt hub (2350).

[0452] Alternatively, the AI ​​agent (2600) may provide this information or explanation back to the post-processing framework (2500) so that the post-processing framework (2500) can regenerate the relevant information.

[0453]

[0454] Meanwhile, the intelligence platform (CTI) (2201) can perform malware or malicious behavior analysis based on at least one of various files, queries, hash values ​​of files, or metadata of cyber threat information received from clients (101, 103, 105) or network intelligence (CTI) devices (102, 104, 106) or collected by crawling itself.

[0455] The intelligence platform (CTI) (2201) can generate various metadata related to the analyzed malware or detected malicious activity.

[0456] The metadata of the analyzed cyber threat information may include the date the analysis was completed or updated, the date the malicious activity was collected, the date the malicious activity was first detected, the date the malicious activity was last detected, and the hash values ​​of various related hash functions.

[0457] In addition, the metadata of the analyzed cyber threat information may include the size of the file associated with the malicious activity, the file type, and tag information associated with the malicious activity.

[0458] The metadata of the analyzed cyber threat information may include the detection name of the malicious activity, the file name, the type of threat, an attack identifier related to the attack technique, a name assigned to the attack activity, a tactic related to the attack activity, and site information that can explain it.

[0459] And, if cyber threat information is analyzed from a specific file, cyber threat information of files similar to the specific file and campaign-related Indicators of Compromise, that is, indicators of compromise that may be common to the cyber threat incident, may be included in the cyber threat information.

[0460] Regarding the cyber threat information analyzed by the intelligence platform (CTI) (2201) and the metadata generated, the post-processing framework (2500) may store the generated dataset in the database (2700) together with the metadata or by adding labeling to the metadata. Depending on the collected cyber threat information, normal data may be stored, or known malicious behavior data or new malicious data may be stored.

[0461] An example has been disclosed in which the framework (2200) or post-processing framework (2500) disclosed above receives data analyzed by the intelligence platform (CTI) (2201) as feeding data to build an artificial intelligence dataset.

[0462] The intelligence platform (CTI) (2201) transmits the client's request or the detection of the network intelligence (CTI) device (102, 104, 106) or its own detection results to the post-processing framework (2500).

[0463] The first module (2510) of the post-processing framework (2500) can label the transmitted detection result dataset, store it in the database (2700), and provide it when requested by a client.

[0464] For example, when the first module (2510) of the post-processing framework (2500) receives a condition including metadata of cyber threat information from the first client (101), it can extract a dataset matching the condition from the database and provide it to the first client (101).

[0465] As another example, the first module (2510) of the post-processing framework (2500) can automatically provide the dataset to an intelligence platform (CTI) (2201) or other websites according to certain conditions or labeling, regardless of the user.

[0466] A processor in the computing server (2800) can store a dataset labeled according to metadata related to cyber threat information of the input data in the database (2700).

[0467] A processor in the computing server (2800) can generate a dataset corresponding to metadata or query information requested by a client from a dataset stored in the database (2700).

[0468] A processor in the computing server (2800) can provide the stored or generated dataset to the client according to the purchase method selected by the client.

[0469]

[0470] FIG. 17 is a drawing disclosing an example of providing a malicious code dataset according to an embodiment.

[0471] The post-processing framework (2500) of the second CTI device (2000) can automatically generate and store a dataset related to cyber threat information for a specific period, for example, every month, and provide it to a client at a specific time. For example, the post-processing framework (2500) can generate and store a dataset on the 1st of every month, according to a set labeling, of new malicious or normal data collected during the previous month.

[0472] The post-processing framework (2500) can provide clients with a dataset that is automatically labeled and stored according to a specific period on an interface or website. For example, the post-processing framework (2500) can automatically store a dataset labeled with relevant metadata for data detected by an intelligence platform (CTI) on a monthly basis.

[0473] For example, the above metadata may include at least one of the following: the date the analysis of the detected data was completed or updated, the date the malicious activity of the input and detected data was collected, the date the malicious activity of the input and detected data was first detected, the date the malicious activity of the input data was last detected, and the hash value of a hash function associated with the input data.

[0474] Additionally, the metadata may include at least one of the following: the size of a file related to the malicious activity of the input data, the type of the file, tag information related to the malicious activity, the detection name of the malicious activity, the name of the file, the type of threat of the malicious activity, an attack identifier related to the attack technique of the malicious activity, a name assigned to the attack activity of the malicious activity, a tactic related to the attack activity of the malicious activity, and site information describing the attack technique.

[0475] It may include at least one of the date the analysis of the input data was completed or updated, the date the malicious activity of the input data was collected, the date the malicious activity of the input data was first detected, the date the malicious activity of the input data was last detected, and the hash value of the hash function associated with the input data.

[0476] Datasets based on labeling may include normal datasets or malicious datasets.

[0477] For example, the post-processing framework (2500) can automatically generate metadata of malicious files and normal files that were first collected in the previous month. And the post-processing framework (2500) can provide a file, such as a specific filename (e.g., YYYYMM.tar.gz), to a client through a website, etc.

[0478] The post-processing framework (2500) can provide list information of automatically generated datasets when the client (101) connects. The post-processing framework (2500) may allow the client (101) to download the relevant dataset when the client (101) purchases the selected dataset, or send an email containing a link to download the selected dataset to the client's (101) registered email address. If the dataset is large, it may be divided into multiple compressed files and provided.

[0479] Meanwhile, the client (101) can obtain a dataset labeled and stored in the database (2700) by the post-processing framework (2500) according to desired conditions.

[0480]

[0481] In this diagram, the dataset in the first column provides metadata related to the generated dataset. That is, for the dataset in the first column, metadata such as related labels (tag), collection period (Period: From / To), collection date or threat observation date (Date (or Seen)), hash value (Hash), related file information (file info), threat type, and attack tactics, techniques, and methods (Attack TTP) can be provided.

[0482] The client can view the metadata of the dataset built by the post-processing framework (2500) and purchase or subscribe to the relevant dataset to use it.

[0483] The dataset in the second column is an interface that allows the client to obtain the dataset by inputting metadata for the dataset required by the post-processing framework (2500).

[0484] The client can obtain related datasets, such as malicious datasets, by purchasing or subscribing to them by inputting metadata related to cyber threat information of the desired dataset. That is, the embodiment can generate a customized dataset corresponding to a combination of metadata selected or input by the client and provide it to the client.

[0485] The datasets in the third and fourth columns of this drawing disclose an example of providing datasets according to the conditions of specific client queries.

[0486] Clients can search for data types and construct queries. For example, keywords included in the query can contain core keywords of the metadata.

[0487] Metadata may include at least one of the date the analysis of the input data was completed or updated, the date the malicious activity of the input data was collected, the date the malicious activity of the input data was first detected, the date the malicious activity of the input data was last detected, and the hash value of a hash function associated with the input data.

[0488] As another example, metadata may include at least one of the following: the size of a file related to the malicious activity of the input data, the type of the file, tag information related to the malicious activity, the detection name of the detected activity, the name of the file, the type of threat of the malicious activity, an attack identifier related to the attack technique of the malicious activity, a name assigned to the attack activity of the malicious activity, a tactic related to the attack activity of the malicious activity, and site information describing the attack technique.

[0489] For example, the dataset in the fourth column indicates that it was generated through a query of keyword combinations of metadata such as "file_type : exe* OR file_type : pdf AND victim_country : KR".

[0490] In this way, when a client inputs a combination of metadata or a query of a certain format, the post-processing framework (2500) can provide a dataset corresponding to the input metadata combination or query as a search result using the labeling stored in the database. When the client selects a dataset provided according to the search result, the post-processing framework (2500) can provide a dataset corresponding to the metadata to the client.

[0491] In the example of this diagram, purchasing and subscription were illustrated regarding how a client obtains a dataset.

[0492] When a client purchases a dataset, the post-processing framework (2500) may allow the client to download the relevant dataset a set number of times.

[0493] When a client subscribes to a dataset, the post-processing framework (2500) can generate a dataset that matches the query requested by the client and provide it to the user via email or the like at a specific time.

[0494]

[0495] FIG. 18 discloses an example of processing cyber threat information that can provide a dataset.

[0496] The post-processing framework (2500) of the intelligence system stores a dataset labeled according to metadata related to cyber threat information of the input data in the database (2700) (S510).

[0497] Examples of generating a dataset are illustrated in FIGS. 11 to 17.

[0498] The post-processing framework (2500) can generate a dataset corresponding to metadata or query information requested by the client from a dataset stored in the database (2700) (S530).

[0499] If the dataset requested by the client does not match the stored dataset, the requested dataset can be generated from the stored dataset using the metadata or query information requested by the client. An example of this is disclosed in FIGS. 16 and 17. This step may be omitted if the dataset requested by the client is already stored in the database.

[0500] The post-processing framework (2500) provides the client with the stored or generated dataset according to the purchase method selected by the client (S550). A detailed example of this is disclosed in FIG. 17.

[0501] According to the embodiment, malware analysis data can be easily obtained to respond to cyber threats as described above, and specific or technically necessary datasets can be acquired.

[0502] According to the embodiment, an AI training dataset can be generated and provided based on Advanced Persistent Threat (APT) intelligence data to respond to various malicious activities. Based on this, customers or companies can strengthen AI-based cyber threat response and provide high-quality data.

[0503] According to the embodiment, by generating and refining such a dataset, a reliable dataset can be provided to customers.

[0504]

[0505] FIG. 19 is a conceptual diagram for conceptually explaining an embodiment disclosed.

[0506] If there is a false positive result in the detection results of the cyber threat information processing system, an inquiry email regarding this may be received from the user.

[0507] Generally, the current processing of cyber threat information processing systems involves an administrator analyzing the inquiry email, and if the request included in the email concerns the handling of a system false positive, taking measures to restore the related detection results to normal (indicated as As Is).

[0508] However, as with the problem described above, these measures are not efficient and leave room for errors due to manual work by managers.

[0509] Accordingly, the following discloses examples of how to process user inquiry emails by analyzing them through a natural language-based AI agent and automatically perform measures against false detection (indicated as To Be).

[0510]

[0511] FIG. 20 is a diagram illustrating the procedure for responding to over-detection in cyber threat information processing using a natural language model according to an embodiment disclosed.

[0512] The intelligence platform (CTI) (2201) can receive executable or non-executable files from a client and analyze and detect whether they contain cyber threat information. A detailed description of this was disclosed in the previous embodiment.

[0513] The intelligence platform (CTI) (2201) may detect cyber threat information based on files or queries received from the client, but as described above, it may also detect cyber threat information based on information received from various devices installed on the client-side device.

[0514] In the event of a false positive occurring during the detection process described above, the client may be unable to process specific emails, files, or information because they are filtered. In this case, the client may request confirmation regarding the false positive processing or request that the file or information be legitimate through the system's mail system.

[0515]

[0516] Mail storage (2212) can store mail containing a request for correction regarding the over-detection of the intelligence platform (CTI) (2201).

[0517] The mail processing unit (2610) of the natural language model agent (LLM agent) or AI agent (2600) according to the embodiment loads a mail containing a request for correction regarding a false positive processing stored in the mail storage (2212). The mail processing unit (2610) can filter and parse the loaded mails to parse the request details within the mail.

[0518] The query analysis unit (2630) of the AI ​​agent (2600) receives the request parsed by the mail processing unit (2610), combines the system prompt received from the prompt hub (2350) with the request to query the natural language model and perform natural language analysis on the request.

[0519] The query analysis unit (2630) can extract necessary data from the analysis results and verify the content of the request. The query analysis unit (2630) can also generate the content of the request into a file of a specific format, such as a JSON file.

[0520]

[0521] The query response unit (2650) can verify whether the data is over-detected based on the data analyzed by the query analysis unit (2630) using a natural language model. In this case, the query analysis unit (2630) can classify cases where it has incorrectly detected data and transmit them back to the intelligence platform (2201). Therefore, the intelligence platform (2201) can subsequently detect the incorrectly detected data as normal by reflecting the analysis results of the query analysis unit (2630).

[0522] And the query analysis unit (2630) can provide the result report to the user in various apps or tools.

[0523] Detailed examples of each component are disclosed below with reference to the drawings.

[0524]

[0525] FIG. 21 is a diagram illustrating the mail processing procedure of a natural language model agent (LLM agent) according to an embodiment.

[0526] The mail processing unit (2610) of the natural language model agent (LLM agent) (2600) includes a mail loading unit (2611), a mail filtering unit (2613), and a mail parsing unit (2615).

[0527] The mail loading unit (2611) loads mail stored in the mail storage (2212) of the intelligence platform.

[0528] The mail filtering unit (2613) can filter mail from a specific user among the stored mail. For example, the mail filtering unit (2613) can filter mail included in a whitelist based on mail sender information. The whitelist used by the mail filtering unit (2613) is adjustable, and can be used to filter a minimum number of mails to identify mail requests required for over-detection processing.

[0529] The mail parsing unit (2615) parses the mail filtering unit (2613) that has filtered the mail. For example, the mail parsing unit (2615) can parse the subject, body, etc. included in the mail and remove parts that are unnecessary for request processing.

[0530] The mail data refined as a result of parsing by the mail parsing unit (2615) can be used to generate a query for the AI ​​agent later.

[0531] In this way, the mail processing unit (2610) of the AI ​​agent (LLM agent) (2600) can selectively filter only the mail necessary for responding to the detection and extract only the relevant requests from the user.

[0532]

[0533] FIG. 22 is a diagram illustrating the query analysis procedure of a natural language model agent (LLM agent) included in an embodiment.

[0534] The query analysis unit (2630) of the natural language model agent (LLM agent) (2600) includes a system prompt (2631), a query parsing unit (2633), a model result generation unit (2635), and an analysis result generation unit (2637).

[0535] The query analysis unit (2630) generates a query in a natural language model regarding the content of the email processed by the email processing unit (2610), and inquires with the natural language model using the generated query to generate a natural language model result.

[0536] Specifically, the system prompt (2631) can generate an optimal query prompt for the natural language model by combining the system prompt (2631) that processes prompts generated by the prompt hub (2350) and the mail data provided by the mail processing unit (2610).

[0537] The system prompt (2631) can generate rules, guidelines, or context information related to a query that can respond to a super-detection. The prompts that form the basis of the query for responding to a super-detection generated by the system prompt (2631) and the requests for super-detection within the email data received by the query analysis unit (2633) can generate input data for responding to a super-detection.

[0538]

[0539] *489 That is, the query parsing unit (2633) combines mail data transmitted in a specific file format such as JSON and a prompt generated by the system prompt (2631) to generate input data suitable for processing by the natural language model.

[0540] The prompt of the system prompt (2631) provides a prompt guide so that the natural language model can understand the purpose of the query and accurately return the necessary information.

[0541]

[0542] The model result generation unit (2635) generates natural language processing results for the query prompt generated by the system prompt (2631) and the query analysis unit (2633). The model result generation unit (2635) may be a natural language model or agents having functions connected to a natural language model.

[0543] Accordingly, the model result generation unit (2635) can identify the key keywords and the core of the request included in the mail data and provide the natural language result. The natural language result may include inquiries regarding cyber threat information related to the key keywords or request included in the mail data.

[0544]

[0545] The analysis result generation unit (2637) can provide a result in which processing results for cyber threat information are added to the natural language result generated by the model result generation unit (2635). In this example, an example is disclosed in which the natural language processing result of mail data is returned in a JSON file format.

[0546] For example, the result data provided by the model result generation unit (2635) may include whether specific link information (URL) is included in the mail data and status information (no status, normal, or malicious) of cyber threat information related to the link information (URL).

[0547] If the email data contains link information (URL), it can indicate whether the link is legitimate or malicious. This can also be used to determine the priority of user requests.

[0548] In this way, the analysis result generation unit (2637) can generate an analysis result for the cyber threat information included in the email.

[0549]

[0550] FIG. 23 is a diagram illustrating the query response procedure of a natural language model agent (LLM agent) included in an embodiment.

[0551] The query response unit (2650) of the natural language model agent (LLM agent) (2600) includes a mail filtering unit (2651), a detection filtering unit (2652), a malicious threat filtering unit (2658), and a response reporting unit (2659).

[0552] The mail filtering unit (2651) of the query response unit (2650) can filter out necessary request emails by filtering emails related to the over-detection response requests analyzed by the query analysis unit (2630) at regular intervals. For example, the mail filtering unit (2651) can select emails at regular intervals by considering the natural language results analyzed by the query analysis unit (2630), link information included in the email, or cyber threat information detected in relation to the link information.

[0553] The detection filtering unit (2652) can check whether there is an over-detection (over-detection of cyber threat information), a false detection (incorrect detection of cyber threat information), or a failure to detect (detection of cyber threat information) regarding the link information included in the selected emails or the cyber threat information detected in relation to the link information. Based on the result of the check, the detection filtering unit (2652) can classify and output 1) emails that do not have files and links within the email, 2) emails that have files or links within the email but do not have malicious cyber threat information (Benign), and 3) emails that have files or links within the email and do not have malicious information in the files or links (Malicious).

[0554] The malicious threat filtering unit (2658) can provide the intelligence platform (2201) with emails containing normal files or links (URLs) that do not contain threat information, or emails containing cyber threat information, among the emails classified by the detection filtering unit (2652). Then, the intelligence platform (2201) can take measures to correct the incorrectly detected results in the future. That is, the intelligence platform (2201) reflects the incorrectly detected results so that normal detection can be performed in the future.

[0555] The response reporting unit (2659) can transmit the processing results for emails classified by the malicious threat filtering unit (2658) to an administrator or user. This may include the number of emails processed, the number of incorrectly detected information, and the analysis content included in the information.

[0556]

[0557] FIG. 24 is a diagram illustrating the result of automatically processing a false positive response inquiry of a cyber threat information processing system according to the disclosed example.

[0558] This diagram shows, according to the example disclosed, the number of request emails related to detection results, the number of emails corresponding to the natural language model, the number of undetected links (URLs) among the emails corresponding to the natural language model, the number of overdetected links (URLs) among the emails corresponding to the natural language model, and link (URL) information of the request emails processed by the natural language model.

[0559]

[0560] According to the example disclosed above, a client's request can be delivered via email to the detection results of an intelligence platform, which is a cyber threat information processing device.

[0561] Mail count indicates that there are 2 related mails processed by the mail processing unit (2610). That is, it indicates the number of mails transmitted to the mail parsing unit (2615) through the mail loading unit (2611) and mail filtering unit (2613) of the mail processing unit (2610).

[0562]

[0563] The number of emails corresponding to the natural language model (success gpt response generate count) represents the number of emails automatically corresponding to these request emails using the natural language model, indicating that the number of emails parsed by the query parsing unit (2633) is 2. That is, it means the number of emails delivered to the model result generation unit (2635) among the emails parsed by the system prompt (2631) and the query parsing unit (2633).

[0564]

[0565]

[0566] False negative (malware->normal) response among emails responded to by a natural language model indicates the number of links (URLs) that failed to detect cyber threat information as a result of verification, among the cases where the natural language model responded to the above request email.

[0567] In other words, this refers to a case where the intelligence platform detected an email containing malware, but the verification result identified it as a link that does not contain threat information.

[0568] For example, if an email contains one or more URLs, the natural language model can identify the number of URLs within the email and determine whether they are malicious or legitimate based on their contextual structure. Here, the number of missed links refers to the number of URLs that were detected as malware but are actually legitimate.

[0569] The number of False Positive (normal->malware) responses among emails responded to by a natural language model represents the number of links (URLs) included in emails where cyber threat information was detected among cases where the above request emails were responded to by a natural language model.

[0570] In other words, this case indicates that there are 3 links that the intelligence platform detected as normal data in the email but were found to be related to malware upon verification.

[0571] In the example, the number of emails is 2, but it can be indicated that there are 3 URLs inside the emails.

[0572]

[0573] Also, the link (URL) information of the request email processed by the natural language model indicates detailed information about the links included in the email requested above.

[0574]

[0575] FIG. 25 is a diagram showing an example of a system overdetection response according to an embodiment of a method for processing cyber threat information.

[0576] A processor of a server included in a cyber threat information processing system receives a request for processing threat information of the cyber threat information processing system (S610).

[0577] The processor of a server included in a cyber threat information processing system can load requests stored in storage devices, such as mail storage, at regular intervals.

[0578] The received request may be received via email or similar means related to a false positive or false negative of the cyber threat information processing system. An example of receiving a request for the processing of threat information is disclosed in detail in FIGS. 20 and FIGS. 21.

[0579] The server's processor obtains a file or link related to the received request and uses a prompt to generate a natural language processing result included in the received request (S630).

[0580] Examples of natural language processing and analysis of related files or links in relation to a request are illustrated in FIGS. 20 and FIGS. 22.

[0581] The server processor verifies the detection result of the file or link corresponding to the above request and reflects the result corresponding to the above request in the cyber threat information system (S650).

[0582] Examples of modifying the detection results of cyber threat information regarding a request, reflecting them in the cyber threat information system, and reporting the results to users and administrators are illustrated in FIGS. 20, 23, and 24.

[0583] According to the disclosed example, the occurrence of errors can be reduced while efficiently and quickly processing false positive results of a cyber threat information processing system.

[0584]

[0585] FIG. 26 illustrates a cyber threat information processing device using a natural language model according to the disclosed example.

[0586] Here, examples of cyber threat information processing devices may include the intelligence platform (2201), post-processing framework (2500), and AI agent (not shown) exemplified above.

[0587] The intelligence platform (2201) can generate various types of cyber threat information using various executable or non-executable files, data on the internet or files entered by a user, hash values ​​or queries.

[0588] The post-processing framework (2500) includes various data processing modules, and here discloses an example including a statistical insight module (3000).

[0589] The statistical insight module (3000) includes a statistical data collection unit (3100) and a threat insight generation unit (3200), and the threat insight generation unit (3200) may again have insight generation units (3210, 3220) that generate various insight information.

[0590] The statistical data collection unit (3100) collects or extracts various types of cyber threat information provided by the intelligence platform (2201), for example, can generate or extract the number of campaigns (APT) by attack group and the frequency of indicators of compromise (IoC) used in attacks. Here, an attack campaign refers to a set of consecutive cyber attacks.

[0591] The statistical data collection unit (3100) can generate or extract statistical data such as the frequency of specific IPs used in attacks, the frequency of specific domains, the frequency of specific links (URLs), and the frequency of hash values.

[0592] The threat insight generation unit (3200) can generate insight information related to cyber threat information.

[0593] The first insight generation unit (3210) can, for example, generate insights related to attack groups among the statistical data collected by the statistical data collection unit (3100).

[0594] The second insight generation unit (3210) can generate insights related to statistical data of infringement indicators among the statistical data collected by the statistical data collection unit (3100).

[0595] Here, examples of generating insights related to attack groups or indicators of compromise are disclosed, but insights can also be provided by processing statistical data from other cyber threat information.

[0596]

[0597] FIG. 27 illustrates a procedure in which a first insight generation unit among the disclosed cyber threat information processing devices generates insight information.

[0598] The first insight generation unit (3210) can generate data that can provide insights into the characteristics of an attack group or attack behaviors from statistical data related to an attack group.

[0599] The first insight generation unit (3210) may include an anomaly detection module (3211) and a first statistics module (3213). For example, the anomaly detection module (3211) can find anomaly signs in the statistics data regarding the attack group's campaign among the statistics data collected by the statistics data collection unit (3100).

[0600] Various data related to the attack group's campaign, such as the attack group name, inflow path information, target country information, target industry information, etc., may be included, and the anomaly detection module (3211) can set this as an anomaly if there is a change in any of the information related to the attack group's campaign or if there is an anomaly such as an amount in specific information.

[0601] The first statistical module (3213) generates statistical data that can provide insights into abnormal data related to an attack group's campaigns captured by the anomaly detection module (3211). For example, the first statistical module (3213) can generate statistical data regarding data on affected industries, data on affected countries, data on threat classification, data on attack techniques, etc., related to the attack group's campaigns.

[0602] Statistical data regarding these attack group campaigns can be generated based on the attack group and can be generated for various threat information included in the attack group campaign.

[0603] The prompt hub (2350) receives statistical data included in the campaigns of attack groups generated by the first statistics module (3213) and generates a prompt that can generate news for each attack group based on this.

[0604] For example, the prompt hub (2350) can generate rules, instructions, or context information related to queries for each campaign from statistical data included in the campaigns of attack groups. By combining the rules, instructions, or context information related to these queries with the content of the statistical data for each campaign, the AI ​​agent (2350), which is a natural language model, generates input data that is easy for the agent to process. A detailed example of this will be described later.

[0605] In this way, the prompt hub (2350) can generate a prompt that allows the AI ​​agent (2350) to generate news about attack groups or campaigns of attack groups.

[0606] The AI ​​agent (2350) can perform a natural language model using the prompts of attack groups or campaigns generated by the prompt hub (2350) and generate natural language news information that explains the relevant information in natural language.

[0607] Then, even if the user cannot grasp or interpret detailed information about the attack groups' campaigns or changes in data, they can obtain insights by attack group or campaign from the natural language information generated by the AI ​​agent (2350).

[0608]

[0609] FIG. 28 discloses an example in which the first insight generation unit exemplified above detects anomalies regarding an attack group's campaign.

[0610] The first insight generation unit can check for changes in each item in the statistical data of an attack group's campaign. If an item included in the statistical data exceeds a certain threshold or statistically falls outside a specific range regarding these changes, it can be set as an anomaly.

[0611] This diagram illustrates statistical data for the attack group Barium.

[0612] The intelligence platform (2201) can continuously generate information such as file types, IP addresses, related domains, and related URLs for the attack group's campaigns.

[0613] Here, examples are provided by visualizing the attack group Barium and its related attack groups, along with information such as various file types, IP addresses, associated domains, and associated URLs used in this attack group campaign.

[0614] The anomaly detection module (3211) of the first insight generation unit can detect anomalies by receiving campaign statistics data for each attack group as exemplified. Looking at the associated campaign statistics (last 90 days) data shown in the lower part of this diagram, it usually occurs 2 to 3 times or 5 to 6 times, but on October 10, an anomaly can be seen that the number of occurrences increased sharply to 24 times.

[0615] In this way, based on date data, data up to the previous day can be aggregated or statistics can be used to detect anomalies using a threshold value, and an alarm can be provided.

[0616] Then, the first statistical module of the first insight generation unit can generate statistical data by category, such as affected industries, affected countries, threat classifications, and attack techniques related to the campaign, based on such abnormal sign alarms.

[0617] Using the statistical data generated in this way, the AI ​​agent can generate news by attack group.

[0618] For example, an AI agent can generate headline news with the following sentence.

[0619]

[0620] Recently, there has been a surge in F threat attacks targeting D industry in country C by attack group A using B attack techniques.

[0621]

[0622] In this way, by using the statistical data on campaigns by attack group from the first insight generation unit, natural language news about items related to that campaign can be generated.

[0623]

[0624] FIG. 29 discloses an example of prompt generation that can generate information using the natural language model exemplified above.

[0625] In this way, based on statistical data regarding campaigns by attack group from the first insight generation unit, PromptHub generates prompts for queries in a natural language model.

[0626] This diagram is an example of a prompt that generates natural language news related to an attack group campaign.

[0627] Prompts associated with attack group campaigns include a request part, a headline part, and a related data part extracted from statistical data.

[0628] For example, the request part of the prompt includes a description of the provided data and a specific request for news generation. Here, the provided data is campaign insight statistics collected in response to anomalies, and an example is provided of a specific request to generate relevant news using this data.

[0629] The headline part of the prompt provides the format of a news headline. The headline part may include items of statistical data related to an attack group campaign. In this example, the items of statistical data include the attack group, attack method, target country, target industry, and threat type.

[0630] The data part of the prompt can be configured to include values ​​for items of statistical data calculated by the actual first statistical module. In this example, data for the attack group Barium, the attack technique T1224, the target countries of the United States and Japan, the target industries of education and healthcare, and the threat type Ransomware were exemplified.

[0631]

[0632] FIG. 30 illustrates a procedure for generating insight information of the second insight generation unit among the disclosed cyber threat information processing device.

[0633] The second insight generation unit (3210) may include an Indicator of Compromise (IoC) filter module (3221) and a second statistics module (3223).

[0634] The Indicator of Infringement (IoC) filter module (3221) generates statistical data by filtering data related to infringement indicators among the statistical data collected by the statistical data collection unit (3100).

[0635] For example, the Indicator of Compromise (IoC) filter module (3221) can collect information on each indicator of compromise from statistical data.

[0636] The Indicator of Compromise (IoC) filter module (3221) can generate a list of the top N IPs, a list of the top N domains, a list of the top N URLs, a list of the top N hash data, etc.

[0637] The second statistical module (3223) generates statistical data for each indicator of intrusion so that it can have insights into each indicator of intrusion generated by the indicator of intrusion (IoC) filter module (3221). The second statistical module (3223) can generate statistical data for the top indicators of intrusion, such as, for example, related IoC-related affected industries, affected countries, threat classifications, and attack technique count values.

[0638] In other words, natural language news associated with each indicator of compromise can be generated based on statistical data such as affected industries, affected countries, threat classifications, or attack techniques.

[0639] For example, the format of news provided based on statistical data related to infringement indicators is as follows.

[0640]

[0641] Recently, threat attacks targeting industries F, G, and H in countries C, D, and E using B1 and B2 attack techniques from the 10.10.10.10 IP have been surging.

[0642]

[0643] To generate news that can provide insights based on such infringement indicators, the prompt hub (2350) generates prompts based on statistical data for each infringement indicator generated by the second statistical module (3223). For example, the prompt hub (2350) generates rules, guidelines, or context information for queries related to statistical data for each infringement indicator. Detailed examples thereof will be described later.

[0644] Then, the AI ​​agent (2350) can perform a natural language model using statistical data by infringement indicator generated by the prompt hub (2350) and generate natural language news information that explains related information in natural language.

[0645]

[0646] FIG. 31 discloses another example of prompt generation that can generate information using the natural language model exemplified above.

[0647] This diagram illustrates a prompt for generating natural language news related to indicators of compromise. Similar to the prompt for generating news related to attack group campaigns exemplified above, the prompt related to indicators of compromise includes a request part, a headline part, and a related data part extracted from statistical data.

[0648] For example, the request part of the prompt includes a specific request to generate news using the provided data, which is statistical data on infringement indicators. This example illustrates a specific request to generate related news using the provided data, which is insight statistical data on collected infringement indicators.

[0649] The headline part of the prompt provides the format of a news headline. The headline part may include items of statistical data related to indicators of compromise. In this example, the items of statistical data include attack method, target country, target industry, and threat type.

[0650] The data part of the prompt may include values ​​for items of statistical data produced by the second statistical module. In this example, the data part is exemplified as data for Indicator of Compromise (IoC) 10.10.10.10, attack technique T1224 or T1222, target countries United States and Japan, target industries education and healthcare, and threat type ransomware.

[0651]

[0652] Figure 32 illustrates a headline news generated using the natural language model exemplified above.

[0653] According to the disclosed example, news that can provide insights to users can be generated in natural language based on statistical data related to cyber threat information.

[0654] The example disclosed above can generate natural language news from statistical data of various threat information detected by an intelligence platform.

[0655] As examples of statistical data, statistical data obtained from attack group campaigns and indicators of compromise were exemplified, but similar news can be generated using other statistical data.

[0656] The prompt generated based on statistical data includes a request part for news generation, a headline part for delivering headline news, and a data part derived from actual statistical data.

[0657] As an example of news produced in this manner, the news in this diagram illustrates news about an attack group attacking an industry within a specific country (a), specific attack techniques used by the attack group (b), relevant URLs used in the attack techniques (c), content related to the attack damage (d), and information about the attack group (e).

[0658] And, news about headlines (f) related to (a) to (e) above can be provided.

[0659]

[0660] FIG. 33 discloses an example of processing cyber threat information in which news can be automatically provided using statistical data insights.

[0661] A cyber threat information processing device including a storage device and a processor collects data on cyber threat information detected (S710). Examples of data that can provide insights to the user from the perspective of the collected data include attack group campaigns or threat compromise indicators, but the same examples can be applied to other data.

[0662] The method by which a cyber threat device detects and collects cyber threat information is exemplified in FIGS. 1 to 15.

[0663]

[0664] Statistical data can be calculated from the collected data according to the type of the data (S730). Statistical data may be calculated differently depending on the data type. For example, in the case of an attack group campaign, abnormal signs exceeding a threshold can be determined, or in the case of an infringement indicator, statistics based on the indicator can be calculated. Examples of calculating statistical data from the collected data according to the type of the data are illustrated in FIGS. 26, 29 to 30.

[0665] In the case of a campaign based on an attack group, statistical data on various attack methods included in the campaign's attack can be calculated, and in the case of threat compromise indicators, statistical data on the elements included in the compromise indicators can be calculated respectively.

[0666]

[0667] Based on the generated statistical data, a prompt is generated, and a natural language model is used to generate and provide news based on the generated prompt (S750).

[0668] Examples of prompt requests for news generation, headline formats of news information considering statistical data of cyber threat information, and collected statistical data were provided.

[0669] By utilizing such cyber threat-generating news requests, the headlines of those news, and collected statistical data, users can obtain real-time, user-friendly cyber attack news.

[0670] The steps disclosed above may also be performed by a program processed by a processor of a cyber threat information processing device.

[0671] According to the disclosed example, users can gain intuitive insights into cyber threat information processed data and easily obtain natural language-based insights through interpretation information inherent in a vast amount of cyber threat information.

[0672]

[0673] FIG. 34 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0674] A cyber threat information processing device according to the present disclosure may include a post-processing framework (2500).

[0675] In one embodiment, the post-processing framework (2500) may include a data collection unit (3010), a data pre-processing unit (3020), a threat analysis unit (3030), and a threat information provision unit (3040).

[0676] The data collection unit (3010) can obtain cyber threat-related information from open source (3000). For example, the cyber threat-related information may include analysis reports related to cyber threats. In one embodiment, the data collection unit (3010) can crawl and collect cyber threat-related information from open source (3000), such as web pages, reports, news articles, and social media.

[0677] The data preprocessing unit (3020) can convert cyber threat-related information into input data in the form of natural language. That is, the data preprocessing unit (3020) can convert cyber threat-related information included in data crawled from open source (3000) into input data in the form of natural language to be used as input values ​​for the AI ​​agent (2600). In one embodiment, the data preprocessing unit (3020) can analyze the crawled data to extract key information units, perform data refinement to remove or filter unnecessary information, and convert it into input data in the form of natural language suitable for the AI ​​agent (2600) corresponding to the multi-agent.

[0678] The threat analysis unit (3030) can generate threat analysis information by inputting input data into each of at least one agent model included in the AI ​​agent (2600).

[0679] In one embodiment, the threat analysis unit (3030) can automatically generate threat analysis information by extracting threat indicator-related information (e.g., hash value, IP, domain, URL, etc.) within cyber threat-related information, analyzing the overall content and context of the cyber threat-related information together, and mapping context-related information (e.g., keywords, attack type, social issue, etc.). Here, the threat indicator may be referred to as an IoC (Indicator Of Compromise) or a term having an equivalent technical meaning.

[0680] That is, according to the present disclosure, by utilizing a multi-agent based on a natural language model (e.g., LLM (Large Language Model)), information related to cyber threats in open source (3000) can be contextually understood and threat indicators and context-related information can be accurately extracted.

[0681] The threat information providing unit (3040) can provide threat analysis information to a user. In one embodiment, the threat information providing unit (3040) can transmit threat analysis information to at least one of the network CTI (1001) and the platform CTI (2200). In this case, the threat analysis information can be displayed and provided to a user through the monitoring unit (not shown) of the network CTI (1001) and the platform CTI (2200). In one embodiment, the threat analysis information can be used for malicious and normal analysis, attack technique analysis, report evaluation, etc.

[0682] According to the present disclosure, IoC and related open source intelligence information can be efficiently and automatically collected from various open sources (3000).

[0683] In addition, according to the present disclosure, accurate data reflecting context can be extracted by utilizing a natural language data structure for multi-agents.

[0684] In addition, according to the present disclosure, structured data (i.e., threat analysis information) in a specific format (e.g., JSON) can be generated and provided so that it can be utilized in a security system.

[0685] In one embodiment, the post-processing framework (2500) may utilize data and applications stored in a separate storage / database under the control of a cloud / on-premises server responsible for data processing. Here, the storage primarily uses a hard disk or SSD to store data, and the database manages structured data and can perform operations such as searching and modifying. At this time, the functions performed by the post-processing framework (2500) may be performed by a processor of the cloud / on-premises server.

[0686]

[0687] FIG. 35 discloses an example of a data collection preparation process according to an embodiment.

[0688] An open source target for cyber threat-related information is determined (S10110). In one embodiment, the target source of cyber threat-related information can be identified. That is, a target to crawl data can be determined. In one embodiment, the open source target may include various forms of open source (3000), such as websites, social media, public documents, public databases, and statistical data. For example, an open source target to collect cyber threat-related information can be determined, such as information alerts, news, security-related RSS (Really Simple Syndication) feeds, and security sites.

[0689] Determining whether access is possible for each open source target (S10120). In one embodiment, the accessibility of each source is checked, and settings can be performed for crawling cyber threat-related information. In one embodiment, the crawling scope of data to be collected and the public nature of the data can be checked. In one embodiment, it can be determined whether the data is public (e.g., whether a login is required for access, whether access is restricted to specific persons) and whether means are applied to prevent specific data from being crawled without user consent. That is, it can be checked whether the data to be crawled is publicly accessible.

[0690] Set up API or web page rendering for each open source target (S10130). In one embodiment, optimized data collection can be performed by setting up an API or web page rendering method for the open source (3000).

[0691] A data collection cycle for cyber threat-related information is set (S10140). In one embodiment, a collection cycle suitable for each open source target can be set to maintain the up-to-dateness of periodically updated information. In one embodiment, the data collection cycle can be set according to the data update frequency of the corresponding open source (3000). In one embodiment, the data collection cycle can be set based on the server load of the open source (3000) due to crawling.

[0692]

[0693] FIG. 36 discloses an example of a data collection and extraction process according to an embodiment.

[0694] Data is crawled from open source via a web page or API (S10210). In one embodiment, data can be collected from web pages and APIs to obtain information from various open sources. In one embodiment, the crawled data may include various information such as open source metadata and advertisements, along with information related to cyber threats.

[0695] In one embodiment, the crawled data may consist of various data formats, such as text, images, and videos. That is, according to the present disclosure, through crawling, data in various formats such as images and videos, as well as text, can be collected to obtain more comprehensive information.

[0696] Data collected by crawling is parsed (S10220). In one embodiment, the collected data can be parsed according to a format and converted into an analyzable form. Here, the analyzable form may consist of text data. In one embodiment, identification information including at least one of a specific tag, attribute, class, key, and ID is identified from the data structure of the crawled data, and data corresponding to the identification information can be extracted.

[0697] Additional related data regarding the collected data is obtained (S10230). In one embodiment, related data and sub-URLs regarding the collected data can be extracted. For example, additional related data or sub-links can be automatically extracted from the crawled data to expand the scope of collection of related data.

[0698]

[0699] FIG. 37 discloses an example of a data preprocessing and integration process according to an embodiment.

[0700] Text cleaning is performed on the parsed text data (S10310). In one embodiment, preprocessing can be performed to remove unnecessary information from the parsed text data and clean it so that only necessary information remains. In one embodiment, predefined patterns such as special characters, tags, numbers, spaces, stopwords, and punctuation included in the test data can be removed. In one embodiment, tokenization can be performed on the text data to distinguish it into basic units of text language.

[0701] Preprocessing is performed on at least one of image data and video data (S10320). In one embodiment, text data can be extracted through Optical Character Recognition (OCR) on image data, and key scene images of video data can be captured and image data for the corresponding images converted into text data to be preprocessed so that they can be utilized as useful information.

[0702] The structure of the data is integrated based on at least one data format among text data, image data, and video data (S10330). In one embodiment, data of various formats, such as text, images, and videos, can be integrated into a consistent structure to generate input data in the form of natural language (i.e., text in the form of natural language).

[0703] In one embodiment, keywords or units of information (e.g., values ​​for key items) can be extracted from text data. Subsequently, input data in the form of natural language can be generated by performing syntactic analysis and grammatical transformation on the extracted information to convert it according to natural language grammar.

[0704] Metadata regarding the data is collected and stored (S10340). In one embodiment, metadata including the source, creation date, crawling time, etc., regarding the data can be collected together and the data can be stored systematically.

[0705]

[0706] FIG. 38 discloses an example of a process for generating threat analysis information based on a multi-agent according to an embodiment.

[0707] A multi-agent according to the present disclosure may include at least one agent assigned a different role. Here, each agent may analyze input data in the form of natural language according to its role and produce output data corresponding to each role.

[0708] In one embodiment, the type of output data produced by each agent may be predetermined according to the type of metadata required by actual analysts. That is, according to the present disclosure, information regarding threats that require automatic response can be automatically collected by overcoming linguistic limitations that analysts do not understand (e.g., language expressing information related to cyber threats).

[0709] In one embodiment, at least one agent may include a first agent model (3110) that outputs context-related information regarding cyber threat-related information and a second agent model (3120) that outputs threat indicator-related information regarding cyber threat-related information.

[0710] For example, the first agent model (3110) may include at least one of a document extraction agent, an attack information extraction agent, a keyword extraction agent, and a social issue extraction agent.

[0711]

[0712] *661 Here, the document summarization agent can output document summary information by summarizing the entire document corresponding to cyber threat-related information. The attack information extraction agent can output attack-related information, such as attack types, targets, and methods, regarding cyber threat-related information. The keyword extraction agent can automatically identify major topics by outputting keyword information regarding cyber threat-related information. The social issue extraction agent can output social issue information mentioned in cyber threat-related information.

[0713] Additionally, for example, the second agent model (3120) may include at least one of a hash value extraction agent and a network indicator extraction agent. Here, the hash value extraction agent may output a hash value (e.g., SHA256, MD5, SHA1, etc.) for cyber threat-related information and structure it in JSON form. The threat indicator extraction agent may output a network threat indicator (e.g., IP, domain, URL, etc.) for cyber threat-related information and structure it in JSON form.

[0714] In one embodiment, output data generated by each agent may be integrated to generate threat analysis information. For example, the threat analysis information may be structured in JSON format. In this case, document summary information generated by each agent may be displayed as “document_summary,” attack-related information as “attack_infor,” keyword information as “keywords,” and social issue information as “social_issues.” Additionally, threat indicators may be displayed as “ioc_infor,” where the hash value is displayed as “hash,” and values ​​such as sha256, md5, and sha1 may be displayed. Furthermore, IP may be displayed as “ip,” domain as “domain,” and URL as “url.”

[0715]

[0716] FIG. 39 discloses an example of the output value and threat analysis information of each agent according to the embodiment.

[0717] In one embodiment, each agent included in the multi-agent can analyze input data in the form of natural language based on cyber threat-related information according to its role and produce output data corresponding to each role.

[0718] For example, the output data may include attack information, keyword information, and IoC information. For example, attack information may include attack information in natural language form regarding which attack group carried out cyber threat attacks using which attack methods, which is included in cyber threat-related information obtained from open sources, such as “The Lazarus Group uses a combination of techniques to gain access to a system, including spearphishing attacks, exploitation of vulnerabilities ~”. Additionally, keyword information may include key keywords included in cyber threat-related information, such as 'Lazarus Group', 'Bluenoroff', and 'SWIFT Alliance software'. Furthermore, IoC information may include threat indicator information including hash values ​​such as sha256, sha1, and md5.

[0719]

[0720] FIG. 40 discloses an example of the output value and threat analysis information of each agent according to an embodiment.

[0721] In one embodiment, threat analysis information may be generated based on output data produced by each agent. For example, the threat analysis information may include information integrating attack information, keyword information, and IoC information, which are output values ​​from each agent.

[0722] According to the present disclosure, regarding threat indicator-related information, threat indicator-related information (e.g., IoC) can be extracted by reflecting context-related information beyond the pattern matching limitations of regular expressions.

[0723] For example, a multi-agent according to the present disclosure can understand a sentence such as "was malicious but is not currently malicious" and extract threat indicator-related information that reflects meaning rather than a simple pattern.

[0724] In addition, according to the present disclosure, extraction accuracy and speed can be improved through a multi-agent structure. For example, through a multi-agent structure, the extraction accuracy of threat indicator-related information and context-related information can be improved by about 20%. In addition, for example, the average processing time per document can be reduced by about 100 seconds.

[0725] In addition, according to the present disclosure, various data can be structured through automatic generation in JSON format and provided so that they can be utilized in a security system.

[0726]

[0727] FIG. 41 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0728] Cyber ​​threat-related information is obtained from open source (S11010). In one embodiment, prior to step S11010, an open source target for cyber threat-related information is determined, and data collection setting information for the open source corresponding to the open source target can be set. In one embodiment, cyber threat-related information can be obtained based on the data collection setting information for the open source.

[0729] In one embodiment, the data collection setting information may include at least one of the data collection cycle of information on whether access to open source is possible, API or web page rendering information, and cyber threat-related information.

[0730] In one embodiment, cyber threat-related information and format information of said cyber threat-related information are obtained from an open source, and said cyber threat-related information is parsed according to the format information to generate text data from said cyber threat-related information. For a detailed explanation of this, refer to the details described in FIGS. 34 to 36.

[0731]

[0732] *680 Cyber ​​threat-related information is converted into input data in the form of natural language (S11020). In one embodiment, text cleaning is performed on the parsed text data to preprocess the text data, and the preprocessed text data can be converted into input data in the form of natural language. For a detailed explanation of this, refer to the details described in FIGS. 34 and 37.

[0733] Input data is input to each of at least one agent model to generate and provide threat analysis information (S11030). In one embodiment, input data is input to each of the first agent model and the second agent model to obtain context-related information output from the first agent model and threat indicator-related information output from the second agent model, and threat analysis information based on the context-related information and threat indicator-related information can be generated and provided. For a detailed explanation of this, refer to the details described in FIGS. 34, 38 to 40.

[0734]

[0735] FIG. 42 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0736] A cyber threat information processing device according to the present disclosure may include a platform CTI device (2201).

[0737] In one embodiment, the platform CTI device (2201) may include a data collection unit (3210), a network threat detection unit (3220), an intrusion indicator threat detection unit (3230), and a threat information providing unit (3240).

[0738] The data collection unit (3210) can acquire packet data from the client (101). In one embodiment, the packet data may include a PCAP (packet capture) file. In one embodiment, the data collection unit (3210) can acquire packet data from network traffic. In one embodiment, the data collection unit (3210) can acquire packet data from the network CTI device (1001). In one embodiment, the data collection unit (3210) can acquire packet data of a file uploaded according to user input of the client (101).

[0739] The network threat detection unit (3220) can generate network threat analysis information by performing packet analysis on packet data. In one embodiment, the packet analysis may include Deep Packet Inspection (DPI) analysis, which examines and analyzes not only the header of the packet data but also the payload or the actual data within the packet.

[0740] In one embodiment, the network threat detection unit (3220) can perform packet analysis to extract at least one of metadata and binary data from the packet data.

[0741] In one embodiment, the network threat detection unit (3220) can extract metadata from packet data. For example, the metadata may include at least one of an IP, domain, URL, filename, hash value, port, application, protocol, and communication information for the packet data.

[0742] In one embodiment, the network threat detection unit (3220) may analyze metadata to generate information regarding whether the packet data is malicious and related threat information. In one embodiment, the related threat information may include whether the information is malicious (e.g., detection name in the case of a file, whether it is malicious in the case of an IP and a domain) and the last activity date related to the infringement indicator. Additionally, the related threat information for a file may include at least one of a file type, file size, hash value, file feature information (e.g., Tag), a list of filenames used, an attack technique used, a list of similar files (e.g., a list of files using similar functions), an associated IP, a domain, and a URL. Additionally, the related threat information for an IP may include at least one of the country of IP usage, IP usage status (e.g., last communication date, port, protocol, application), DNS information, a communication file, and an associated URL. Additionally, the related threat information for a domain may include at least one of a host IP, a communication file, and an associated IP. In addition, the related threat information may include at least one of the associated attack group (e.g., attack group name, attack base country) and the associated campaign information (e.g., campaign activity period, target country, target industry, threat type of the campaign, and associated indicators of compromise including other indicators of compromise used together during the campaign).

[0743] In one embodiment, the network threat detection unit (3220) may obtain analysis information generated by the network CTI through DPI analysis from the network CTI. In this case, the analysis information of the network CTI may include at least one of identification information of IP, port, protocol and application, communication information and sensitive information (e.g., personal information, credit card information).

[0744] In one embodiment, the network threat detection unit (3220) can extract binary data from packet data. In one embodiment, the network threat detection unit (3220) can perform metadata analysis if an analysis result for the binary data exists. In one embodiment, the network threat detection unit (3220) can perform an analysis of the binary data and generate metadata as a result if no analysis result for the binary data exists.

[0745] In one embodiment, the network threat detection unit (3220) can examine at least one of metadata and binary data using a detection rule stored in advance through a network black box (not shown) included in the second CTI device (2000). For example, the detection rule may include one of a Snort rule or a Suricata rule. In this case, the platform CTI device (2201) can store packet data per session and provide at least one of the stored packet data, metadata, and binary data to the network black box.

[0746] In one embodiment, the network threat detection unit (3220) can examine all sessions within the packet data through a network black box and detect specific sessions that may be a threat, such as abnormal signs, using detection rules. Here, the specific sessions may include sessions that are likely to be a threat or sessions that are currently a threat.

[0747] In one embodiment, the currently occurring (detected) threat does not occur as a single threat, but generally occurs together with preliminary work associated with the threat (e.g., preliminary penetration work to identify vulnerabilities such as port scanning). Accordingly, the network threat detection unit (3220) can backtrack the threat by analyzing the entire previously collected traffic in reverse using the threat detected through the network black box, determining when and in which session the threat started and through which path it led to the current threat. That is, the network threat detection unit (3220) can track the threat using the entire data of the collected packet data, rather than using specific input values ​​through the network black box.

[0748] In one embodiment, the network black box can perform regression analysis to trace back threats using the entire data of collected packet data, rather than using specific input values.

[0749] At this point, the regression analysis is explained in detail as follows.

[0750] First, the network black box can inspect packet data collected in real time using rules. Subsequently, when a rule is registered at a specific point in time, the network black box can use the registered rule to trace back and inspect all previously stored packet data. At this time, the network black box can trace back when threat activity (or abnormal signs corresponding to abnormal activity, etc.) began regarding the previously stored packet data.

[0751] In addition, the network black box can assign additional meaning to detected sessions by examining packet data using rules. For example, the network black box can identify the first session as a threat activity. In this case, if similar threat activities are detected multiple times, the network black box can identify external / internal ports, WEB vulnerability scans, etc., and verify whether there is an attack to the outside through an internal infection.

[0752] For example, the network black box can identify when a scan occurs from an external IP to an internal IP range via internal ports 22 (SSH) or 3389 (RDP). In other words, the network black box can identify if such sessions are infected with malware and are scanning internal assets externally.

[0753] In addition, in one embodiment, the network black box can analyze threat information using information such as IP, protocol, application, binary, URL, and event data.

[0754] The network threat detection unit (3220) can detect threats based on threat information analyzed and output through the network black box. In one embodiment, the network threat detection unit (3220) can generate vulnerability information based on threats detected through the network black box.

[0755] In one embodiment, the network threat detection unit (3220) can generate network threat analysis information by analyzing at least one of packet data, metadata, and binary data using detection rules as described above. That is, the platform CTI device (2201) can detect whether all session information regarding packet data, for example, PCAP files, is malicious.

[0756] In one embodiment, the network threat analysis information may include at least one of malware-related information, sensitive information-related information, and network-related information.

[0757] For example, malware-related information may include at least one of the following: whether the packet data is malicious, data file type, hash value (e.g., MD5, SHA-1, SHA-256, etc.), cyber threat type, whether there is communication with a C2 (Command & Control) server where an attacker can remotely control and issue commands to the malware, vulnerability information (e.g., CVE-ID), attack technique, attack group, and campaign information including malicious indicators for analysis (e.g., indicator file, IP, domain, URL, etc.).

[0758] In addition, for example, information related to sensitive information may include at least one of user information related to packet data, a credit card number, and whether the said sensitive information has been leaked.

[0759] In addition, for example, network-related information may include at least one of unauthorized protocol and port information, whether there is ARP (Address Resolution Protocol) traffic, infringement indicator information, IP information, and domain information.

[0760] Here, unauthorized protocol and port information may indicate whether there is a connection to a protocol or port that is not typically used (e.g., used less than a certain number of times over a certain period). Additionally, compromise indicator information may include at least one compromise indicator related to the packet data (e.g., at least one of an IP, domain, URL, and hash value). Additionally, IP information may include at least one of the maliciousness of the IP related to the packet data, country, malicious basis, attacker group, and campaign information. Additionally, domain information may include at least one of the Domain Generation Algorithm (DGA) detection status of the domain related to the packet data, maliciousness, malicious basis, attack group, and campaign information. In one embodiment, DGA detection status and malware detection may be detected using an AI agent (2600).

[0761] In one embodiment, the network threat detection unit (3220) may detect at least one of the infringement indicators included in the infringement indicator information for packet data as a malicious infringement indicator. In one embodiment, the network threat detection unit (3220) may transmit only the network malicious analysis information regarding the remaining infringement indicators that were not detected as malicious infringement indicators among the multiple infringement indicators to the infringement indicator threat detection unit (3230). That is, according to the present disclosure, malicious analysis is performed first to generate network malicious analysis information, and malicious analysis is performed secondarily on infringement indicators that were not judged to be malicious in the first malicious analysis to improve the accuracy of malicious judgment and system efficiency.

[0762] The infringement indicator threat detection unit (3230) can generate malicious indicator threat analysis information for packet data based on network malicious analysis information. In one embodiment, the infringement indicator threat detection unit (3230) can generate malicious indicator threat analysis information for packet data based on network malicious analysis information using a pre-stored cyber threat intelligence database (2205). For example, in one embodiment, the infringement indicator threat detection unit (3230) can perform a second malicious analysis by comparing an infringement indicator that was not judged as malicious in the preceding first malicious analysis with an infringement indicator that was pre-classified as malicious in the cyber threat intelligence database (2205), and if the infringement indicators match, re-judge the infringement indicator that was not judged as malicious in the first malicious analysis as malicious.

[0763] In one embodiment, if the infringement indicator associated with the packet data is a file, the infringement indicator threat detection unit (3230) can determine whether it is malicious through an AI agent (2600). The infringement indicator threat detection unit (3230) can determine whether the file is malicious by using an AI agent (2600) that has been trained on the pattern of a malicious file, and can provide a detection name (e.g., exe.ransomware.generic).

[0764] In one embodiment, when the infringement indicators associated with packet data are IP and domain, the infringement indicator threat detection unit (3230) can determine whether it is malicious by using various information such as the history of communication with malicious files, the history of attack group usage, the history of campaign usage, and vulnerability server attacks.

[0765] In one embodiment, when at least one of a file hash value, IP, and domain is input through an API (2000) provided by the platform CTI (2201), a secondary malicious analysis result may be generated in a specific format (e.g., JSON). Here, the secondary malicious analysis result may include whether it is malicious and related threat information.

[0766] That is, according to the present disclosure, a secondary malicious analysis of individual infringement indicators related to packet data can be performed by querying a cyber threat intelligence database (2205).

[0767] The threat information providing unit (3240) may provide threat analysis information based on network threat analysis information and malicious indicator threat analysis information. In one embodiment, the threat information providing unit (3240) may input at least one of the network threat analysis information and malicious indicator threat analysis information into a natural language model (2320) included in the AI ​​agent (2600), and obtain threat analysis information in the form of natural language output by the natural language model (2320). In one embodiment, the threat analysis information may include at least one of malicious information related to packet data, cyber threat information, and supplementary information regarding the threat. Here, the threat analysis information may be provided in a summary format in the form of natural language. For example, threat analysis information may include content such as “On October 3, 2024 at 14:34:25, an internal IP 10.10.0.4 connected to the malicious IP 156.34.22.43 and downloaded malware. The malware is a file mainly used by the North Korea-based Lazarus Group attack group~”.

[0768] In one embodiment, threat analysis information may be provided through various cyber threat information processing screens of a client (101) or platform CTI (2201). In one embodiment, the cyber threat information processing screen may include at least one of a file upload screen, a file analysis result screen, a session list screen, a session details screen, a file list screen, a file analysis information screen, and a file analysis details screen. In this case, each embodiment of the various cyber threat information processing screens is described in detail below.

[0769] According to the present disclosure, the reliability of the analysis can be improved by performing primary packet analysis and secondary database query analysis on packet data.

[0770] In addition, according to the present disclosure, threats that have not been previously detected can be identified by performing a detailed analysis on packet data included in suspected network traffic. For example, threats identified according to the present disclosure may include the detection of network threats and malware, such as the use of unauthorized applications, leakage of personal information, leakage of sensitive information such as credit cards, and ARP spoofing (Address Resolution Protocol spoofing).

[0771] In one embodiment, the platform CTI device (2201) can utilize data and applications stored in a separate storage / database under the control of a cloud / on-premises server responsible for data processing. Here, the storage primarily uses a hard disk or SSD to store data, and the database manages structured data and can perform operations such as searching and modifying. At this time, the functions performed by the platform CTI device (2201) can be performed by a processor of the cloud / on-premises server.

[0772]

[0773] FIG. 43 discloses an example of a file upload screen according to an embodiment.

[0774] In one embodiment, the file upload screen (3251) may include a file input window and a list of uploaded files.

[0775] Here, the file input window may include an upload input interface for a user to upload a file. In one embodiment, the uploaded file may include a PCAP file and a compressed file (e.g., zip, gz). In one embodiment, when a file to be analyzed is uploaded through the file input window, a list of uploaded files may be displayed at the bottom. Here, the file list may include at least one of a filename and a file size. For example, the filename may be pcap_2024120101.pcap and the file size may be 5,134KB.

[0776] In one embodiment, since PCAP files are often saved as multiple PCAP files rather than a single file, analysis can be requested through the analysis request button after multiple files are uploaded by the user.

[0777]

[0778] FIG. 44 discloses an example of a file analysis result screen according to an embodiment.

[0779] In one embodiment, the file analysis result screen (3252) may include one of the file analysis result analyzed by an AI agent included in the threat analysis information, the file analysis result analyzed by a network black box, and the malicious file analysis result.

[0780] Here, the file analysis result analyzed by the AI ​​agent may represent the result of determining whether the connected domain associated with the packet data is a DGA domain. In one embodiment, the file analysis result analyzed by the AI ​​agent may include at least one of a ticket ID, ticket occurrence time, source (SRC) IP and port, destination (DST) IP and port, protocol, application, and DGA detection domain for the domain.

[0781] For example, a total of 2,989 file analysis results can be generated by the AI ​​agent, and one of the 2,989 file analysis results can be represented as follows: Ticket ID 5602, ticket occurrence time 2024-12-11 20:40:18, Source (SRC) IP and port 164.124.101.2:53, Destination (DST) IP and port 10.10.0.98:52537, protocol dns, and DGA detection domain SANS_DNS_Query to a *.pw domain - Likely Hostile.

[0782] Additionally, the file analysis result analyzed by the network black box may include session information detected by rules through the network black box. In one embodiment, the file analysis result analyzed by the network black box may include at least one of an analysis result number, a start date, an end date, a source (SRC) IP and port, a destination (DST) IP and port, a protocol, an application, and a signature.

[0783] For example, a total of 13,732 file analysis results can be generated by the network black box, and one of the 13,732 file analysis results can be represented as follows: analysis result number 13732, start date 2021-12-11 20:39:19, end date 2024-12-11 20:39:19, source (SRC) IP and port 10.10.0.15:60546, destination (DST) IP and port 203.248.252.2:53, protocol dns, signature SANS_DNS_Query to a *.pw domain - Likely Hostile.

[0784] In one embodiment, the target file of the file analysis result analyzed by the network black box may include a PCAP file. Additionally, in one embodiment, the file analysis result analyzed by the network black box may include the traffic analysis result analyzed by the network black box.

[0785] Additionally, the malicious file analysis result may include malicious file information related to packet data. In one embodiment, the malicious file analysis result may include at least one of an analysis result number, collection time, source (SRC) IP and port, destination (DST) IP and port, protocol, hash value, and download URL.

[0786] For example, a total of 5,273 malicious file analysis results may be generated, and one of the 5,273 malicious file analysis results may be represented as follows: analysis result number 5273, collection time 2024-12-11 16:44:14, source (SRC) IP and port 10.10.0.138:37650, destination (DST) IP and port 61.111.9.23:9020, protocol http, hash value (e.g., SHA-256) D12125ABCFE47EACF953FB3A573… (etc.), download URL storage.malwares.com:9020 / ctx-file-binary / D1 / 21 / 25 / D12125ABCFE47E… (etc.).

[0787]

[0788] FIG. 45 discloses an example of a session list screen according to an embodiment.

[0789] In one embodiment, the session list screen (3253) may include one of statistical information on packet data included in threat analysis information and session-specific communication information.

[0790] Here, the statistical information may include at least one of a source (SRC) IP and the number of corresponding sessions, a destination (DST) IP and the number of corresponding sessions, a protocol and the number of corresponding sessions, and an application and the number of corresponding sessions. For example, the source (SRC) IP may be 10.10.0.12, the number of sessions for the source IP may be 21,906, the destination (DST) IP may be 8.8.8.8, the number of sessions for the destination IP may be 7,169, the protocol may be ssl, the number of sessions associated with the protocol may be 145,457, the application may be A, and the number of sessions associated with the application may be 69,754. In one embodiment, each item included in the statistical information may be sorted in descending or ascending order according to the number of occurrences.

[0791] Additionally, session-specific communication information may include at least one of a start date, an end date, a source (SRC) IP and port, a destination (DST) IP and port, a protocol, an application, a send count, a receive count, a total count, a send byte, a receive byte, and a total byte. For example, the start date may be 2024-12-10 19:48:09, the end date may be 2024-12-10 19:48:09, the source (SRC) IP may be 10.10.0.11, the source port may be 64822, the destination (DST) IP may be 142.250.198.42, the destination port may be 443, the protocol may be quic, the application may be A, the send count may be 10, the receive count may be 14, the total count may be 24, the send byte may be 5,015 bytes, the receive byte may be 6,178 bytes, and the total byte may be 11,193 bytes. In one embodiment, each item included in the session-specific communication information may be sorted in descending or ascending order according to the number of items.

[0792] In one embodiment, the session list screen (3253) may include a search function for packet data. Here, at least one of a search start date and a search end date may be set for the search function.

[0793]

[0794] FIG. 46 discloses an example of a session list detail screen according to an embodiment.

[0795] In one embodiment, the session list detail screen (3254) may include one of the source (SRC) IP and port, destination (DST) IP and port, session information, and protocol information included in the threat analysis information.

[0796] Here, the source (SRC) IP and port and the destination (DST) IP and port may represent the source IP and port and the destination IP and port for the session associated with the packet data. For example, the source IP and port for the session associated with the PCAP file may be 172.16.133.132:53742, and the destination IP and port may be 204.93.223.148:80.

[0797] Additionally, session information may include at least one of a start date, end date, protocol, application, source transmission count, transmitted bytes, data bytes, destination reception count, received bytes, data bytes, source Ethernet information, destination Ethernet information, source and destination payload information, and TCP flags.

[0798] For example, the start date for the session is 2024-12-11 19:37:45, the end date is 2024-12-11 19:37:56, the protocol is http, the application is L, the source send count is 6, the send bytes are 807, and the data bytes are 403, the destination receive count is 5, the receive bytes are 513, and the data bytes are 171, the MAC address among the source Ethernet information is 00:50:43:01:4d:d4, the Organizationally Unique Identifier (OUI) is Semiconductor, Inc., the destination Ethernet information is 00:90:7f:3e:02:d0, and the OUI is Technologies, Inc., the source payload information is 474554202f312f63, the destination payload information is 485454502f312e31, and the TCP flag is syn 1, It can be represented as syn-ack 1, ack 5, fin 2, rst 0, urg 0, srcZeor 0, dstZero 0.

[0799] Additionally, protocol information may include at least one of the Method, status code, host, user agent, request header, client version, and response header for the communication protocol used in the session.

[0800] For example, the communication protocol used in the session may be represented as GET (1), the status code as 204, the host as Beacon-3.L.com, the user agent as Mozilla / 5.0, the request headers as accept, accept-encoding, host, connection accept-language, user-agent, referer, the client version as 1.1, and the response headers as content-type, expires, set-cookie.

[0801]

[0802] FIG. 47 discloses an example of a file list screen according to an embodiment.

[0803] In one embodiment, the file list screen (3255) may include one of statistical information on binary data extracted from PCAP files during a specific period included in threat analysis information and session-specific file collection information.

[0804] Here, statistical information may include at least one of the source (SRC) IP and the number of sessions for the corresponding binary data, the destination (DST) IP and the number of sessions, the file format and the number of sessions for the detected file, and the number of sessions for the detected file format.

[0805] For example, the source (SRC) IP for the corresponding binary data of the statistical information is 10.10.0.138, the number of sessions for the source IP is 38,309, the destination (DST) IP is 61.111.9.23, the number of sessions for the destination IP is 38,309, the file format for the detection file 0078972F3FF73FF1B783B8E725… (etc.) is PDF, the number of sessions is 1, and the number of sessions for the detection file format PDF is 6,830.

[0806] Additionally, session-specific file collection information may include at least one of the following: the time of collection of binary data for the corresponding session, the source (SRC) IP and port, the destination (DST) IP and port, the protocol, the hash value, the download URL, the file format, and the detection name via the AI ​​agent. In other words, session-specific file collection information may indicate which session the file was collected from.

[0807] For example, the collection time of the binary data is 2024-12-11 16:44:16, the source (SRC) IP is 10.10.0.138, the source port is 56204, the destination (DST) IP is 61.111.9.23, the destination port is 9020, the protocol is http, the hash value (e.g., SGA-256) is 4728DEA5C9FFEB538… (omitted), the download URL is storage.malwares.com9020 / ctx-file-binary… (omitted), the file format is PDF, and the detection name through the AI ​​agent can be indicated as normal. In one embodiment, session details and file details can be checked through the network and file buttons in the list.

[0808] In one embodiment, the file list screen (3255) may include a search function for a binary list. Here, at least one of a search start date and a search end date may be set for the search function.

[0809]

[0810] FIG. 48 discloses an example of a file analysis information screen according to an embodiment.

[0811] In one embodiment, the file analysis information screen (3256) may include one of file information and intrusion indicator intelligence information regarding packet data included in the threat analysis information.

[0812] Here, file information may include at least one of a detected file, a detection name through an AI agent, the date of initial collection of the file, the date of last activity verification, a file type, a file size, a filename, a hash value, cross-check information through another cyber threat intelligence system, a threat type, and an attack technique.

[0813] For example, the detected file is F2085B5E3DDFF36029BAE… (etc.), the detection name via the AI ​​agent is exe.adware.softcnapp, the file's initial collection date is 2022-03-09 03:59:04, the last activity confirmed date is 2024-12-11 09:54:35, the file type is exe_32bit, the file size is 2,634,856 bytes, the filename is JKCovUtl.exe, and the hash value (e.g., MD5) is F962E751C92FB222643… (...), cross-checked information from other cyber threat intelligence systems can be identified as Gen:Variant.Midie.109594, Win.Trohan.Generic-9959023-0, threat type as Virus, and attack techniques as T1012, T1036, T1057, T1071, etc.

[0814] Indicator of compromise intelligence information may include at least one of an indicator of compromise association chart, IP information associated with the file, domain information, and similar file information. For example, the indicator of compromise association chart may visualize indicators of compromise associated with the file by classifying them by type (e.g., file, IP, domain).

[0815] In addition, IP information associated with the file may include at least one of the last activity date of the IP, the IP, the result of malicious detection through an AI agent, the attack group, the target country, the target industry, and the campaign. For example, the last activity date of the IP may be 2024-10-02 00:26:50, the IP may be 23.216.147.64, and the result of malicious detection through an AI agent may be normal.

[0816] The domain information associated with the file may include at least one of the last activity date of the domain, the domain, the result of malicious detection by an AI agent, the attack group, the target country, the target industry, and the campaign. For example, the last activity date of the domain may be 2024-01-24 21:05:07, the domain may be 82.250.63.168.in-addr.arpa, and the result of malicious detection by an AI agent may be normal.

[0817] Similar file information associated with the file may include at least one of the following: the initial collection date of the similar file, hash value, file type, detection name via AI agent, attack technique, attack group, target country, target industry, and campaign. For example, the initial collection date of the similar file may be 2023-12-29 03:07:34, the hash value (e.g., SHA-256) may be 3ED25D1968D1260… (etc.), the file type may be EXE, the detection name via AI agent may be exe.adware.softcna… (etc.), and the attack technique may be T1112.

[0818]

[0819] FIG. 49 discloses an example of a file analysis detail screen according to an embodiment.

[0820] In one embodiment, the file analysis detail screen (3257) may include one of the Advanced Persistent Threat Intelligence information and the attack technique matrix included in the threat analysis information.

[0821] Here, the intelligent persistent threat intelligence information may include at least one of campaign statistics related to cyber threat analysis, associated campaign charts, specific campaign information, and associated campaign information. For example, campaign statistics related to cyber threat analysis may include at least one of an attack group (e.g., TA505), a target country (e.g., United Arab Emirates, Czech Republic, etc.), and a target industry (e.g., arts, entertainment, consulting, etc.).

[0822] In addition, the associated campaign chart may include a chart that visually illustrates the relationships between attack groups, threat types, campaigns, target countries, target industries, attack techniques, and files.

[0823] Additionally, specific campaign information may include at least one of an attack group, threat type, target country, target industry, attack technique, CVE, and indicators of compromise for a specific campaign. For example, for a specific campaign threat—d807b527-2… (omitted), the attack group may be TA505, the threat type may be makoob, the target country may be Iran, the target industry may be consulting, the attack technique may be T1071, and the indicators of compromise may include at least one of a file, IP, domain, and URL address.

[0824] In addition, the associated campaign information may indicate at least one of the following: the campaign activity period associated with the campaign, the campaign ID, the attack group, the target country, the target industry, the threat type, and the number of indicators of compromise by type. For example, the campaign activity period of the campaign may be 2024-12-05 14:00:51~2024-12-07 05:45:51, the campaign ID may be threat―d807b527-2… (omitted), the attack group may be TA505, the target country may be Iran, the target industry may be consulting, the threat type may be makoob, and the number of indicators of compromise may be 10 files.

[0825] In addition, an attack technique matrix may represent a framework or matrix that organizes attack techniques for cyber threat analysis and response. For example, the MITRE ATT&CK attack technique matrix may include various matrices such as Execution, representing execution using a command interpreter (PowerShell); Persistence, representing user account modification and maintenance; and Privilege Escalation, representing privilege escalation through malicious code injection into a process.

[0826]

[0827] FIG. 50 discloses an embodiment of a cyber threat information processing method according to an embodiment.

[0828] Packet data is obtained from the client (S12010). In one embodiment, the packet data may include a PCAP (packet capture) file. For a detailed explanation of this, refer to the details described in FIGS. 42 and 43.

[0829] Network threat analysis information is generated by performing packet analysis on packet data (S12020). In one embodiment, at least one of metadata and binary data for packet data is generated by performing packet analysis on packet data, and network threat analysis information for packet data can be generated based on at least one of metadata and binary data according to a pre-stored detection rule. In one embodiment, the network threat analysis information may include at least one of malware-related information, sensitive information-related information, and network-related information for packet data. For a detailed explanation of this, refer to the details described in FIGS. 42 to 49.

[0830] Based on network malicious analysis information, malicious indicator threat analysis information for the packet data is generated (S11030). In one embodiment, malicious indicator threat analysis information for the packet data can be generated based on infringement indicator information included in the network threat analysis information. For a detailed explanation of this, refer to the details described in FIGS. 42 to 49.

[0831] Threat analysis information based on network threat analysis information and malicious indicator threat analysis information is provided (S12040). In one embodiment, network threat analysis information and malicious indicator threat analysis information are input into a natural language model to generate threat analysis information in the form of natural language, and threat analysis information can be provided to a client. For a detailed explanation of this, refer to the details described in FIGS. 42 to 49.

Claims

1. A step of obtaining packet data from the client; A step of generating network threat analysis information by performing packet analysis on the above packet data; A step of generating malicious indicator threat analysis information for the packet data based on the above network malicious analysis information; and A step of providing threat analysis information based on the above network threat analysis information and malicious indicator threat analysis information; including, Cyber ​​threat information processing method.

2. In Paragraph 1, The step of analyzing the above network threat analysis information is, A step of performing packet analysis on the above packet data to generate at least one of metadata and binary data for the above packet data; A step of generating network threat analysis information for the packet data based on at least one of the metadata and binary data according to a pre-stored detection rule; including, Cyber ​​threat information processing method.

3. In Paragraph 2, The above network threat analysis information includes at least one of malware-related information, sensitive information-related information, and network-related information regarding the above packet data. Cyber ​​threat information processing method.

4. In Paragraph 1, The step of analyzing the above malicious indicator threat analysis information is, A step of generating malicious indicator threat analysis information for the packet data based on infringement indicator information included in the network threat analysis information; including, Cyber ​​threat information processing method.

5. In Paragraph 1, The steps provided above are, A step of generating threat analysis information in the form of natural language by inputting the above network threat analysis information and malicious indicator threat analysis information into a natural language model; and A step of providing the threat analysis information to the above client; including, Cyber ​​threat information processing method.

6. A storage device for storing data; and In-memory storing a set of library engines related to software; and A processor that executes the above software; including, The above processor is, Obtain packet data from the client, and Generate network threat analysis information by performing packet analysis on the above packet data, and Based on the above network malicious analysis information, malicious indicator threat analysis information for the above packet data is generated, and Providing threat analysis information based on the above network threat analysis information and malicious indicator threat analysis information, Cyber ​​threat information processing device.

7. In Paragraph 6, The above processor is, Perform packet analysis on the above packet data to generate at least one of metadata and binary data for the above packet data, and Generating network threat analysis information for the packet data based on at least one of the metadata and binary data according to a pre-stored detection rule, Cyber ​​threat information processing device.

8. In Paragraph 7, The above network threat analysis information includes at least one of malware-related information, sensitive information-related information, and network-related information regarding the above packet data. Cyber ​​threat information processing device.

9. In Paragraph 6, The above processor is, Generating malicious indicator threat analysis information for the packet data based on infringement indicator information included in the above network threat analysis information, Cyber ​​threat information processing device.

10. In Paragraph 6, The above processor is, The above network threat analysis information and malicious indicator threat analysis information are input into a natural language model to generate threat analysis information in the form of natural language, and Providing the threat analysis information to the above client, Cyber ​​threat information processing device.

11. Obtain packet data from the client, and Generate network threat analysis information by performing packet analysis on the above packet data, and Based on the above network malicious analysis information, malicious indicator threat analysis information for the above packet data is generated, and A storage medium storing computer-executable software that performs the step of providing threat analysis information based on the above-mentioned network threat analysis information and malicious indicator threat analysis information.