Cyber threat information processing device, cyber threat information processing method, and storage medium storing computer-executable program for processing cyber threat information

The cyber threat information processing device and method address inefficiencies in existing systems by implementing AI-driven natural language processing and real-time threat detection, enabling comprehensive analysis and automated error reduction for cyber threat management.

WO2026141735A1PCT designated stage Publication Date: 2026-07-02SANDS LAB INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SANDS LAB INC
Filing Date
2024-12-26
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing cyber threat detection systems struggle to comprehensively analyze network-based traffic, fail to detect threats to OT assets and IoT devices, require manual processing of false positives, and lack high-quality AI training datasets, leading to inefficiencies and potential errors.

Method used

A cyber threat information processing device and method that utilizes network-based traffic analysis, real-time threat detection, and AI-driven natural language processing to identify and explain cyber threats, generate intuitive insights, and automate the processing of false positives, while providing high-quality data sets for threat analysis.

Benefits of technology

Enables comprehensive and real-time analysis of cyber threats across various assets, reduces manual processing errors, and provides intuitive insights into cyber threat information, facilitating efficient and accurate response strategies.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure KR2024021190_02072026_PF_FP_ABST
    Figure KR2024021190_02072026_PF_FP_ABST
Patent Text Reader

Abstract

An embodiment according to the present disclosure provides a cyber threat information processing method comprising the steps of: collecting data on cyber threat information; calculating statistical data according to a type of the collected data; and generating natural language news on the collected data according to a prompt generated on the basis of the calculated statistical data.
Need to check novelty before this filing date? Find Prior Art

Description

Cyber ​​threat information processing device, cyber threat information processing method, and storage medium storing a computer-executable program for processing cyber threat information

[0001] The disclosed embodiments relate to a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information.

[0002] The damage caused by increasingly sophisticated cyber security threats, centered on new or variant malware, is growing. To mitigate such damage and enable early response, we are simultaneously advancing our response technologies through multi-dimensional pattern construction and various complex analyses.

[0003] Until now, companies have focused on perimeter-based security to detect and block traffic between the internal and external environments using technologies such as Virtual Private Networks (VPNs), firewalls, and Intrusion Detection Systems (IDS) / Intrusion Prevention Systems (IPS). However, they are facing difficulties with security measures due to the complexity of technology, the diversity of attacks, and the increasing number of attack points.

[0004] To respond to cyber threats through network-based traffic, network layer or transport layer-based traffic analysis had the problem of being unable to comprehensively and visually analyze threat information.

[0005] Therefore, there was a problem in that the detection of cyber threats through network-based traffic targeted only information technology (IT) assets and could not detect or identify threats to operation technology (OT) assets or Internet of Things (IoT) devices.

[0006] There was a problem in that the analysis of cyber threats through network-based traffic was fragmentary and mostly only possible after a breach, making it difficult to analyze large amounts of network traffic in real time and respond to cyber threats.

[0007] Furthermore, malicious activities based on cyber threat information were analyzed using various inconsistent techniques or information that could not be accurately described by anyone other than an expert, making it difficult to easily understand their mechanisms and the basis for analysis.

[0008] Meanwhile, while such analysis requires high-quality AI training datasets to respond to cyber threats, these datasets have been difficult to find. Even companies seeking to develop technologies to counter cyber threats using AI face the problem of struggling to locate appropriate data or malware samples.

[0009] Against this backdrop, while the demand for high-quality AI training datasets has recently surged, there have been technical and practical difficulties in acquiring the desired malware analysis data.

[0010] When the results detected by the cyber threat information processing system included false positives, the process of resolving them had to be done manually by humans. For example, when a complaint email regarding the processing results of the cyber threat information processing system is received, the administrator of the cyber threat information processing system or the mail system checks and classifies the email, extracts necessary information, and performs reclassification.

[0011] However, this manual processing method was repetitive and inefficient, consuming a significant amount of time and effort, and also had the problem of increasing the fatigue of the processor, which could lead to the possibility of errors.

[0012] In particular, there was a problem where data omissions or incorrect processing that could occur during the handling of complaint emails could have a negative impact on customer satisfaction.

[0013] Cyber ​​threat information comes in various forms and types, and there are many ways to represent it. For example, there are various cyber threats categorized by attack groups, attack techniques, and threat classifications. Therefore, without insight into this, even if such information is extracted, it is difficult to interpret accurately, and there was a problem in that analysis required a significant amount of time.

[0014] The purpose of the embodiments disclosed below is to solve the above problems by providing a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information, which can comprehensively and visually analyze cyber threat information through network-based traffic.

[0015] Another objective of the embodiment is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information, which can detect or identify cyber threats to various assets such as IT assets, as well as operational technology (OT) assets or IoT devices.

[0016] Another objective of the embodiment is to provide a cyber threat information processing device capable of analyzing network traffic in real time and responding to cyber threats, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information.

[0017] Another objective of the embodiment is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information, so that even a non-expert user can easily understand the mechanism and basis of analysis of detected or analyzed cyber threat information.

[0018] The present invention provides a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information, which can easily obtain malware analysis data and acquire specifically required or technically necessary data sets to respond to cyber threats.

[0019] Another objective of the examples disclosed below is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing the program, which can efficiently and quickly process false positive results of a cyber threat information processing system while reducing the occurrence of errors.

[0020] Another objective of the examples disclosed below is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing the program, which allow a user to have intuitive insights into cyber threat information processed data.

[0021] One disclosed embodiment provides a method for processing cyber threat information, comprising: a step of collecting data regarding cyber threat information; a step of calculating statistical data according to the type of the collected data; and a step of generating natural language news regarding the collected data according to a prompt generated based on the calculated statistical data.

[0022] If the collected data above is related to a cyber threat attack group campaign (APT), the embodiment can generate the natural language news based on abnormal data of the attack means used by the campaign.

[0023] If the collected data is included in the Indicator of Compromise of cyber threat information, the natural language news can be generated based on the data corresponding to the Indicator of Compromise.

[0024] The above prompt may include a request part that requests the generation of the above natural language news, a headline part that provides a headline format among the above natural language news, and a data part that includes the above-described statistical data.

[0025] An embodiment disclosed from another perspective includes a database for storing data; and a processor for processing said data, wherein the processor may execute a set of instructions including instructions for collecting data regarding cyber threat information; calculating statistical data according to the type of said collected data; and generating natural language news about said collected data according to a prompt generated based on said calculated statistical data.

[0026] In another aspect, an embodiment provides a storage medium storing a computer-executable program for processing cyber threat information, which includes a set of instructions comprising: collecting data regarding cyber threat information; calculating statistical data according to the type of the collected data; and generating natural language news regarding the collected data according to a prompt generated based on the calculated statistical data.

[0027] According to the disclosed embodiment, comprehensive and visible analysis of cyber threat information through network-based traffic is possible.

[0028] According to the disclosed embodiments, cyber threats to various assets, such as IT assets as well as operational technology (OT) assets or IoT devices, can be detected or identified.

[0029] According to the disclosed embodiment, network traffic can be analyzed in real time and cyber threats can be responded to.

[0030] According to the disclosed embodiments, even if the user is not an expert, they can easily understand the mechanism and basis of analysis of detected or analyzed cyber threat information.

[0031] According to the disclosed embodiment, through the execution of ASM, vulnerabilities of assets can be identified to apply security controls and reinforce cybersecurity strategies and policies for assets.

[0032] According to the disclosed example, malware analysis data can be easily obtained to respond to cyber threats, and specifically required or technically necessary data sets can be acquired.

[0033] According to the disclosed example, the occurrence of errors can be reduced while efficiently and quickly processing false positive results of a cyber threat information processing system.

[0034] According to the disclosed example, users can gain intuitive insights into cyber threat information processed data and easily obtain natural language-based insights through interpretation information inherent in a vast amount of cyber threat information.

[0035] FIG. 1 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0036] FIG. 2 is a drawing disclosing another embodiment of a cyber threat information processing method according to an embodiment.

[0037] FIG. 3 is a drawing disclosing embodiments of a cyber threat information processing device according to an embodiment.

[0038] FIG. 4 is a drawing illustrating a first CTI device as an embodiment of a cyber threat information processing device according to an embodiment.

[0039] FIG. 5 is a drawing disclosing an example in which a first CTI device and a second CTI device are interconnected as an embodiment of cyber threat information processing devices according to an embodiment.

[0040] FIG. 6 is a drawing disclosing another example in which a first CTI device and a second CTI device are interconnected as an embodiment of cyber threat information processing devices according to an embodiment.

[0041] FIG. 7 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0042] FIG. 8 discloses an example of active ASM execution based on network traffic collection according to an embodiment.

[0043] FIG. 9 discloses an example of providing vulnerability details and measures identified by ASM technology according to an embodiment.

[0044] FIG. 10 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0045] FIG. 11 is a drawing disclosing another embodiment of a cyber threat information processing device according to an embodiment.

[0046] FIG. 12 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0047] FIG. 13 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0048] FIG. 14 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0049] FIG. 15 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0050] FIG. 16 discloses another example of a cyber threat information processing device that generates artificial intelligence training data capable of responding to cyber threats.

[0051] FIG. 17 is a drawing disclosing an example of providing a malicious code dataset according to an embodiment.

[0052] FIG. 18 discloses an example of processing cyber threat information that can provide a dataset.

[0053] FIG. 19 is a conceptual diagram for conceptually explaining an embodiment disclosed.

[0054] FIG. 20 is a diagram illustrating the procedure for responding to over-detection in cyber threat information processing using a natural language model according to an embodiment disclosed.

[0055] FIG. 21 is a diagram illustrating the mail processing procedure of a natural language model agent (LLM agent) according to an embodiment.

[0056] FIG. 22 is a diagram illustrating the query analysis procedure of a natural language model agent (LLM agent) included in an embodiment.

[0057] FIG. 23 is a diagram illustrating the query response procedure of a natural language model agent (LLM agent) included in an embodiment.

[0058] FIG. 24 is a diagram illustrating the result of automatically processing a false positive response inquiry of a cyber threat information processing system according to the disclosed example.

[0059] FIG. 25 is a diagram showing an example of a system overdetection response according to an embodiment of a method for processing cyber threat information.

[0060] FIG. 26 illustrates a cyber threat information processing device using a natural language model according to the disclosed example.

[0061] FIG. 27 illustrates a procedure in which a first insight generation unit among the disclosed cyber threat information processing devices generates insight information.

[0062] FIG. 28 discloses an example in which the first insight generation unit exemplified above detects anomalies regarding an attack group's campaign.

[0063] FIG. 29 discloses an example of prompt generation that can generate information using the natural language model exemplified above.

[0064] FIG. 30 illustrates a procedure for generating insight information of the second insight generation unit among the disclosed cyber threat information processing device.

[0065] FIG. 31 discloses another example of prompt generation that can generate information using the natural language model exemplified above.

[0066] Figure 32 illustrates a headline news generated using the natural language model exemplified above.

[0067] FIG. 33 discloses an example of processing cyber threat information in which news can be automatically provided using statistical data insights.

[0068] The best mode for carrying out the invention is disclosed together with the mode for carrying out the invention.

[0069] Hereinafter, embodiments will be described in detail with reference to the attached drawings.

[0070] In the embodiments, the engine, various analysis tools, modules, etc., may be implemented as a physical device, a device combined with the physical device, or software.

[0071] When an embodiment is implemented as software, it may be stored on a non-volatile storage medium executable by a computer and installed on a computer, etc., and executed by a processor.

[0072] Examples of cyber threat information processing devices and cyber threat information processing methods are disclosed in detail as follows.

[0073] In wired and wireless network communication between two or more devices, various types of cyber threat information at different network levels can cause complex abnormal behaviors in said devices simultaneously or at different times. These complex cyber threats and abnormal behaviors are referred to as cyber threat campaigns below.

[0074] In the disclosed embodiment, two or more different types of cyber threat information processing devices may be included. Therefore, for convenience in the embodiment, N cyber threat information processing devices are referred to as the Nth cyber threat intelligence (CTI) device.

[0075] The disclosed embodiment detects and analyzes cyber threat information included in network communication, and based on this, can analyze cyber threat information in more detail through inter-device cooperation, or explain the analyzed results to the user in a very easy way, or enable response or prediction.

[0076] The CTI device of the following embodiment may be implemented as a physical device connected to a wired or wireless communication network, or may be implemented according to the same characteristics and principles in a network-connected device such as an artificial satellite or a spacecraft. Additionally, it may be a directly connected device equipped with a small storage device and connected to a network, such as a network black box or a camera device.

[0077]

[0078] FIG. 1 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0079] One embodiment of the disclosed cyber threat information processing method can collect, analyze, and detect data based on communication network traffic, manage cyber threat information from the results, and respond to cyber threats.

[0080] In this embodiment, the first CTI device is assumed to be a device included in a client system that detects and analyzes cyber threat information on network communication, and the second CTI device is exemplified as a device that provides platform-based services based on a computing server and a database in which cyber threat information is analyzed.

[0081] A first CTI device in the client system analyzes data or application data according to a protocol included in network traffic (S110).

[0082] The first CTI device can collect network traffic, classify layered data according to OSI layers, and analyze whether there is cyber threat information based on protocols or applications.

[0083] The first CTI device transmits a query request for cyber threat information related to the analyzed data to the second CTI device (S120).

[0084] The first CTI device can obtain additional detailed cyber threat information by making a query request to the second CTI device regarding the cyber threat information analyzed primarily as above.

[0085] The second CTI device can further analyze cyber threat information based on a query request analyzed by the first CTI device, or generate explanatory information based on artificial intelligence natural language processing regarding the analyzed cyber threat information.

[0086] The client system receives additional analysis results and explanatory information regarding cyber threat information in response to the query request from the second CTI device and provides them to the user (S130).

[0087] The client system can obtain analysis results and explanatory information regarding cyber threat information analyzed by the first CTI device or additionally analyzed by the second CTI device.

[0088] The client system may provide the user with additional analysis results and explanatory information regarding the received cyber threat information. When providing this information to the user, the user may be provided with cyber threat information related to abnormal behavior, malicious behavior, attack behavior, etc., through the monitoring unit of the client system.

[0089] The client system can obtain detailed analysis results and natural language-based explanatory information regarding what the cyber threat information is from the second CTI device.

[0090] Accordingly, users can take response or preventive measures regarding the analyzed cyber threat information.

[0091] Examples of a first CTI device and a second CTI device for collecting network traffic and analyzing cyber threat information are described below.

[0092]

[0093] FIG. 2 discloses another embodiment of a cyber threat information processing method according to an embodiment.

[0094] The second CTI device receives a request for analysis or a request for query regarding cyber threat information related to data included in network traffic from the first CTI device (S210).

[0095] Here, the second CTI device is a natural language model that explains a query for CTI information in natural language and provides the basis for the explanation, or may include a natural language model. Detailed examples of natural language models are disclosed below.

[0096] If the second CTI device is a data platform including a natural language model exemplified below, the second CTI device may receive a request for analysis or a query request for cyber threat information from a client system or from a first CTI device included in the client system via an API.

[0097] The first CTI device can transmit additional analysis information or query requests regarding cyber threat information analyzed directly from network traffic.

[0098] The second CTI device can generate detailed analysis results and explanatory information regarding cyber threat information in accordance with the analysis request or query request (S220).

[0099] The second CTI device can convert files of various formats requested for analysis into binary data or analyze attack activities, attackers, etc., regarding cyber threat campaigns through feature analysis.

[0100] The second CTI device can classify cyber threat information, such as attack behaviors and attackers, or generate detailed analysis results through an artificial intelligence (AI) engine based on the characteristics of the analyzed data.

[0101] The second CTI device can generate natural language description information for a query request of the analyzed cyber threat information based on an internal natural language model.

[0102] The second CTI device can provide the detailed analysis results and explanatory information generated above to a client system or user (S230).

[0103] An example of a second CTI device for cyber threat information through artificial intelligence processing is described below.

[0104]

[0105] FIG. 3 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0106] One embodiment disclosed may include a client system (10) and a second CTI device (2000).

[0107] The client system (10) may include a client device (100) and a first CTI device (1000).

[0108] The client system (10) can receive network traffic from the Internet through the first CTI device (1000).

[0109] The second CTI device (2000) may include a computing server (2800) and a database (2700), a framework (2200) that provides an application programming interface, and an artificial intelligence processing unit (23000).

[0110] Unless the client system (10) and the second CTI device (2000) are connected via an intranet or the like, the second CTI device (2000) may include a separate first CTI device (1010) and receive network traffic from the Internet through it.

[0111] The first CTI device (1000) in the client system (10) can analyze layered protocol data and application data transmitted through the network traffic to obtain metadata related to the protocol and executable or non-executable files within the payload.

[0112] The first CTI device (1000) can analyze the data or files received as described above to detect or extract cyber threat information and provide it to a monitoring system (not shown) connected to the client device (100).

[0113] Alternatively, the first CTI device (1000) may request the second CTI device (2000) to perform additional analysis of the cyber threat information analyzed as above, or request a query related to the cyber threat information.

[0114] Separately, the client device (100) may request the second CTI device (2000) to perform additional analysis on cyber threat information of files or metadata transmitted from an external network through the first CTI device (1000), or request a query related to said cyber threat information.

[0115] The second CTI device (2000) can receive a file, metadata, or cyber threat information (CTI) or a query (hereinafter CTI query) about cyber threat information transmitted by the client device (100) or the first CTI device (1000) through an application programming interface (API).

[0116] Here, the query for cyber threat information may include, for example, whether it is malicious, the hash value of the file, assembly code or information on functions included in the assembly code, and other information related to the file.

[0117] Files received from multiple modules can be analyzed within the framework (2200) of the second CTI device (2000). Here, the multiple modules are simplified and represented as the M module (2210) or the N module (2220).

[0118] For example, the M module (2210) or the N module (2220) of the framework (2200) can perform malicious behavior analysis on network layer metadata, executable files, non-executable files, or web data collected from the internet.

[0119] Meanwhile, the query module (2230) of the framework (2200) transmits CTI queries related to files and metadata transmitted by the client system (10) to the natural language model (2320) of the artificial intelligence processing unit (2300), and CTI feature analysis requests to the AI ​​engine (2310) of the artificial intelligence processing unit (2300).

[0120] The M module (2210) or the N module (2220) can transmit to the query module (2230) analysis information regarding cyber threat information (CTI) or files related to the CTI query queried by the client (100). For example, the M module (2210) or the N module (2220) can transmit to the query module (2230) information regarding whether the analyzed file or data is malicious, attack behavior, attack technique, attack group, or attack campaigns in which multiple attack behaviors are linked.

[0121] The query module (2230) can generate a CTI query for a file, metadata, or related cyber threat information (CTI) submitted by the client (100) or the first CTI device (1000).

[0122] The query module (2230) can generate appropriate CTI queries related to network protocol or application data, files, metadata, and cyber threat information analyzed from the files or data (e.g., information on whether it is malicious, attack behavior, attack technique, attack group, or attack campaign involving multiple attack behaviors) analyzed by the M module (2210) or the N module (2220).

[0123] Then, the natural language model (2320) of the artificial intelligence processing unit (2300) can generate a natural language answer to a CTI query based on network protocol / application data, files, and various cyber threat information (CTI) analyzed by the M module (2210) or the N module (2220), etc.

[0124] The first CTI device (1000) transmits protocol data or application data in network traffic, files in the payload, metadata and related CTI queries to the second CTI device (2000).

[0125] The client device (100) transmits a file transmitted from the first CTI device (1000) and a CTI query related to the file, etc. to the second CTI device (2000).

[0126] The framework (2200) of the second CTI device (2000) can perform cyber threat information (CTI) analysis on protocol data or application data, files, metadata, etc., and extract CTI features.

[0127] The CTI features analyzed and extracted from the framework (2200) generate more accurate CTI analysis results or CTI prediction results in the AI ​​engine (2310) of the artificial intelligence processing unit (2300). The generated CTI analysis results or CTI prediction results are transmitted to the client system (10) through an application programming interface (API).

[0128] A CTI query related to a CTI feature output from the framework (2200) is generated as a result of natural language analysis by the natural language model (2320) of the artificial intelligence processing unit (2300). The natural language query response to the generated CTI query is transmitted to the client system (10) through an application programming interface (API).

[0129] Along with the CTI analysis results generated in this way, the natural language model (2320) provides the natural language CTI query answer generated by the natural language CTI query answer to the client system (10).

[0130] The CTI inquiry response includes a natural language description of whether the cyber threat information (CTI) inquired about is malicious, the attack behavior, the attack technique, the attack group, or the attack campaign in which multiple attack behaviors are linked, in relation to data analyzed and extracted from network traffic by the client system (10), particularly the first CTI device (1000).

[0131] In addition, regarding binary data such as assembly code of a file included in network traffic and functions included in that data, explanation information on whether it is malicious can be provided based on the results analyzed by the second CTI device (2000).

[0132] The framework (2200) of the second CTI device (2000) can provide various analysis information about a file or information that has been analyzed and stored in the database (2700), and can also generate or suggest various additional CTI queries related to the CTI query of the client system (10) to the user.

[0133] And, the natural language model (2320) of the second CTI device (2000) generates natural language descriptions for CTI queries submitted by the client system (10) and additional CTI queries provided by the framework (2200), and provides natural language answers related to the CTI queries to the user of the client system (10).

[0134] In the embodiment, since the second CTI device (2000) analyzes or provides previously analyzed information along with natural language, according to the embodiment, even if the user is a non-expert, easy and accurate information delivery and response to cyber threat information is possible.

[0135] When a server (2800) providing a second CTI device (2000) is connected to the Internet through a separate first CTI device (1010), the function of the first CTI device (1010) is the same as the function of the disclosed first CTI device (1000).

[0136] Embodiments of a second CTI device (2000) including a first CTI device (1000 or 1010) and a natural language model (2320) are disclosed in detail below.

[0137]

[0138] FIG. 4 illustrates a first CTI device as an embodiment of a cyber threat information processing device according to an embodiment.

[0139] An embodiment of the first CTI device (1000) disclosed includes a collection unit (1100) that collects transmission data from mirrored network traffic, an analysis unit (1200) that analyzes data according to a protocol or application within the collected data, and a detection unit (1300) that detects cyber threat information from the analyzed data.

[0140] The data analyzed by the first CTI device (1000) is provided to the user of the client system (10) through the monitoring unit (1800), and the user can monitor abnormal behavior or threat behavior on the network.

[0141] The collection unit (1100) includes a packet collector (1120), and the packet collector (1120) can collect various metadata for security enhancement from the collected packet data without loss.

[0142] By the packet collector (1120) of the collection unit (1100) collecting various metadata from the packets, the analysis unit (1200) can efficiently process multiple packets and pre-allocate the packets in memory so that unnecessary waiting time is not taken.

[0143] Generally, the packet processing speed of an operating system often fails to keep up with the processing speed of a Network Interface Card (NIC). In one embodiment, packet processing performed in the kernel of an operating system can be processed in user space using a high-speed processing library.

[0144] One embodiment of the collection unit (1100) allows a process of the operating system to use a high-speed processing library to poll packets received on a network interface card without using the kernel, thereby reading data at high speed. Accordingly, the embodiment of the collection unit (1100) can reduce the idle time that occurs during the process in which the kernel of the operating system reads packets received on the network interface card and transmits them to the process of the operating system.

[0145] In the disclosed embodiment, the collection unit (1100) may include a network interface card (not shown) and a packet collector (1120).

[0146] A network interface card (not shown) can receive network packets at high speed.

[0147] The packet collector (1120) of the collection unit (1100) reads the received packets by polling without passing through the kernel and transmits them to the processor.

[0148] The receiving core within the packet collector (1120) of the collection unit (1100) can store packets received at high speed via polling in a large-capacity memory. The receiving core of the collection unit (1100) uses a dedicated core of the processor to perform isolated tasks so as to protect against malicious software, etc., included in the high-speed received packets affecting related processes.

[0149] The receiving core of the collection unit (1100) can protect the computer operating system and enhance security functions even when receiving packets at high speed. The memory within the collection unit (1100) can use large-capacity memory that can reduce management overhead, such as memory faults, depending on the management function. The memory of the collection unit (1100) may or may not be set as large-capacity memory depending on the settings of the operating system.

[0150] The copy core of the collection unit (1100) can copy and output packets stored in memory.

[0151] The analysis unit (1200) can inspect, manage, and filter the packets collected by the collection unit (1100) using the Deep Packet Inspection (DPI) method. The DPI method can examine the contents of the data within the collected packets in detail up to the OSI Layer 7 level. Through this, the DPI method can not only identify the overall characteristics of the network data but also control potential and malicious traffic.

[0152] A DPI engine (1210) that performs the DPI method monitors the collected packets between the source and the destination, reassembles them, and inputs them into a separate buffer. The DPI engine (1210) can form a single session with the input packets and generate metadata based on it.

[0153] The packets analyzed by the DPI engine (1210) of the analysis unit (1200) are stored in the queue storage unit (1220) according to their content and then output.

[0154] The queue storage unit (1220) of the analysis unit (1200) can synchronize memory access by completing the calls in fixed time units when data is called simultaneously by multiple threads.

[0155] The analysis unit (1200) can solve problems related to system synchronization by using a multi-threaded environment when the data being analyzed is in various cases, such as metadata, files, or PCAP packet files, by utilizing multiple queue storage units (1220).

[0156] The DPI engine (1210) of the analysis unit (1200) pre-allocates the internal memory (1211) of the analysis unit (1200) to store data and stores the data output by the collection unit (1100).

[0157] The DPI engine (1210) of the analysis unit (1200) can analyze the syntax in detail in real time for the data stored in the internal memory (1211) and extract the file.

[0158] The DPI engine (1210) of the analysis unit (1200) can extract metadata of data across all layers, including layers L2 to L4 as well as the application layer of layer 7. The data extracted by the DPI engine (1210) of the analysis unit (1200) is as follows.

[0159] For example, the DPI engine (1210) of the analysis unit (1200) can obtain not only transport layer information such as the Internet protocol or TCP / UDP of the Source IP and Destination IP from the packet header, but also application layer information within the packet payload.

[0160] The DPI engine (1210) of the analysis unit (1200) can extract metadata of application protocols required for network threat detection, such as HTTP, SSL, SSH, FTP, SMB, DNS, and metadata related to content, such as web pages, filenames, User Agent Strings, JavaScript, and images.

[0161] And the DPI engine (1210) of the analysis unit (1200) can also extract metadata for OT protocols such as industrial application protocols or engineering protocols.

[0162] For example, metadata can also be generated for MODBUS, a network communication encapsulated in the TCP payload; DNP3, widely used in the energy sector; and BACnet and KNX protocols, primarily used in smart buildings.

[0163] The core engine (1212) within the analysis unit (1200) can separate data transmitted according to a layer or industry protocol within a packet according to type and transmit it to the queue storage unit (1220) according to the type of data. The queue storage unit (1220) within the analysis unit (1200) is arranged in parallel according to the data, so that horizontal scalability is possible according to the data type and scale.

[0164] In this way, the analysis unit (1200) can secure visibility into data of all layers according to the packet structure according to the characteristics of the network protocol.

[0165] The analysis unit (1200) can extract data and metadata on the protocols of the IT network and the OT network.

[0166] The analysis unit (1200) can classify data according to the protocol and classify metadata and files accordingly, and can convert data of the same source / destination into a file of a single session PCAP packet and then generate and store metadata according to the PCAP packet.

[0167] The detection unit (1300) can detect threat elements based on metadata added by the analysis unit (1200), files within the packet, and relocated PCAP packet files. The detection unit (1300) can detect event characteristics such as file types, attack behavior types, and OT types as abnormal behavior through profiling.

[0168] For example, the detection unit (1300) can detect malware using at least one of the Indicator of Compromise (IoC), rule-based malware detection tools such as YARA rules, and machine learning.

[0169] The detection unit (1300) can detect abnormal behavior through various rules and AI-based behavior analysis that can identify the attacker's tactics, techniques, and procedures (TTP) according to the attack lifecycle.

[0170] The detection unit (1300) can detect anomalies in an operational technology (OT) environment within a corporate network.

[0171] The detection unit (1300) can detect risk factors by loading extracted event features and applying an AI algorithm to generate a behavior profile, and by calculating a risk score using weights based on the confidence score of the generated behavior profile model.

[0172] For example, the detection unit (1300) can identify threat elements by accumulating scores based on correlation and statistical analysis regarding each threat element of metadata, files, and PCAP packet files, and store them in respective databases (1310, 1320, 1330).

[0173] Through this stepwise and comprehensive method, the detection unit (1300) can reduce the false positive rate and enable the user to efficiently investigate and respond to risk factors.

[0174]

[0175] The detection unit (1300) can identify and detect threat elements from the data analyzed by the protocol data analysis unit (1200) and provide the results to the monitoring unit (1800).

[0176] When the detection unit (1300) identifies a threat element from abnormal events in the metadata extracted or generated as above and the data within the packet's payload, it can perform highly reliable threat element detection by evaluating contextual information regarding whether there is a threat and the threat level based on correlation analysis.

[0177] The detection unit (1300) can perform malware detection, behavior analysis-based detection, and OT anomaly detection from input network traffic.

[0178]

[0179] (a) Malware detection

[0180] The detection unit (1300) can detect malware as known malware by using Indicators of Compromise (IoC) from network traffic. The detection unit (1300) can also detect unknown malware using machine learning techniques.

[0181] The detection unit (1300) can identify attackers and attack activities regarding unknown files by disassembling and converting files within network traffic into binary data and learning malicious files and normal files through machine learning. The detection unit (1300) can detect malicious code using a learning model based on a Random Forest algorithm based on the characteristics of the file's binary data.

[0182] And advanced persistent threats (APTs) can be identified based on defined rules such as YARA rules.

[0183] The detection unit (1300) can classify a predefined signature as malicious code based on a defined rule-based string or a binary pattern (Hex string). The threat detection unit (1350) can identify malicious code by specifying a specific entry point value or by using pattern matching based on regular expressions such as a file offset or a virtual memory address.

[0184]

[0185] (b) Behavior analysis-based detection

[0186] The detection unit (1300) can detect attack tactics, techniques, and procedures (TTPs) based on behavioral analysis of data included in network traffic.

[0187] The detection unit (1300) can detect attack behavior based on behavioral analysis through threat detection according to multiple behavioral rules. The detection unit (1300) applies various AI-based anomaly detection techniques to many features extracted from network traffic. The threat detection unit (1350) can evaluate whether network traffic is anomaly by generating hundreds of profiled anomaly models through entity modeling by device / peer group / network level.

[0188] The detection unit (1300) can evaluate whether there is an anomaly by comparing the extracted features with the device's past patterns (device modeling), evaluating distinctiveness within a cluster (peer group modeling), investigating sparsity throughout the network, and calculating an anomaly score using an anomaly model.

[0189] The detection unit (1300) can detect threat elements in an abnormal model by calculating a threat score for one or more abnormal events through a threat detector.

[0190]

[0191] (c) OT Anomaly Detection

[0192] The detection unit (1300) exists to manage operations, particularly physical operations in various industrial sectors that have benefited from automation and mechanization, in an OT environment designed to maintain safety, uptime, and productivity. The detection unit (1300) can detect threat factors for anomaly detection in an OT environment designed to maintain safety, uptime, and productivity using whitelist-based anomaly detection technology and ML-based anomaly detection technology for process values ​​of time series.

[0193] The detection unit (1300) can detect threat elements based on a whitelist and a time series of sensors.

[0194] When the detection unit (1300) analyzes whitelist-based data, it can detect threats of communication in a malformed format or application misuse by extracting the command field of the protocol included in the data. Additionally, the detection unit (1300) can understand the specialized meaning for each OT protocol, map the detailed message field for each command and the request and response messages into a pair of sessions, analyze them, and select allowed packets based on statistics.

[0195] When the detection unit (1300) detects a time-series-based threat element of the sensor, it can determine whether there is an anomaly by configuring a specific process value extracted from the packet into a time series and comparing it with a model trained by machine learning.

[0196] The detection unit (1300) allows the manager to selectively perform a preliminary test on a model created by machine learning a specific process to verify the accuracy performance.

[0197]

[0198] (d) Correlation analysis

[0199] When the detection unit (1300) detects a threat element by performing malware detection, behavioral analysis-based detection, and OT anomaly detection on the metadata of the packet and the data of the payload, it can identify whether the detected threat element is an actual threat technology through correlation analysis.

[0200] The detection unit (1300) may include multiple threat detectors for correlation analysis. The multiple threat detectors can perform multiple artificial intelligence (AI)-based anomaly detections and identify threat technologies by performing correlation analysis on various contexts using defined rules.

[0201] Meanwhile, an embodiment of the first CTI device may further include an intelligence processing unit (1400).

[0202] The intelligence processing unit (1400) can receive an executable or non-executable file included in the payload of a packet from the analysis unit (1200). The intelligence processing unit (1400) can also receive files, metadata, applications, etc. analyzed from network traffic from the detection unit (1300).

[0203] The intelligence processing unit (1400) can transmit the executable file or non-executable file, or the analyzed metadata thereof, to the cyber threat intelligence system when it intends to detect and identify detailed cyber threat information based on an executable file or a non-executable file.

[0204] One embodiment of the intelligence processing unit (1400) may include analysis modules included in the framework of the second CTI device embodiment disclosed above and an AI engine of the artificial intelligence processing unit. In this case, the intelligence processing unit (1400) may analyze attack behaviors, attackers, campaigns, etc. included in metadata, files, and PCAP packets, etc. detected by the detection unit (1300) and classify them using the AI ​​engine.

[0205] A cyber threat intelligence system can identify attack tactics, techniques, and procedures (TTPs) for a received file and provide profiling results such as the attacker of an Advanced Persistent Threat (APT) and identifiers of attack behaviors (including attack behavior identifiers based on the MITER ATT&CK Matrix).

[0206] Alternatively, an embodiment of the first CTI device may request that the collected and analyzed data be transmitted to the second CTI device for processing in connection with the intelligence profiling processing described above.

[0207] Another embodiment of the intelligence processing unit (1400) includes analysis modules included in the framework of the embodiment of the second CTI device disclosed above and an AI engine of the artificial intelligence processing unit, and may further include a natural language model.

[0208] In such cases, an embodiment of the first CTI device may generate explanatory information of profiled CTI information or CTI query answers in an internal artificial intelligence-based natural language model without the need to query the second CTI device for the data analyzed in relation to the intelligence profiling processing described above.

[0209] When an embodiment of the first CTI device includes a natural language model, an example in which the natural language model generates explanatory information or CTI query answers for CTI information detected by the first CTI device may follow an example of a natural language model disclosed below.

[0210] Although not shown in this drawing, the first CTI device may further include a threat information management unit (not shown) that provides visualization information about threat information for monitoring threat information detected by the detection unit (1300).

[0211]

[0212] Based on this analysis information and natural language model, the threat information management unit (not shown) of the first CTI device can derive risk factors for assets including network-connected IT assets, OT infrastructure, and IoT devices, and produce protection measures and visualization information.

[0213] The threat information management unit (not shown) of the first CTI device provides a means to build and monitor management information of various assets associated with network traffic.

[0214] For example, the threat information management unit (not shown) of the first CTI device can build a list of managed assets and detailed information related to threat information. The first CTI device can build the IP / MAC address, vendor and type information, model serial information, and firmware information of each asset, and monitor the software version.

[0215] The threat information management unit (not shown) of the first CTI device can build a network map of managed assets and provide visualized information through the monitoring unit (1800).

[0216] The threat information management unit (not shown) of the first CTI device can identify vulnerabilities of each managed asset and provide the corresponding vulnerability information through the monitoring unit (1800).

[0217] In an embodiment of this drawing, the first CTI device may include a database for storing data and a processor for processing network data.

[0218] A processor in the first CTI device can process instructions that analyze data or application data according to a protocol within network traffic, transmit a query request for cyber threat information related to the analyzed data to a natural language model, and receive and provide detailed analysis results and explanatory information regarding cyber threat information according to the query request from the natural language model.

[0219] The first CTI device may be implemented as software that stores and executes computer-executable commands as described above.

[0220]

[0221] FIG. 5 discloses an example in which a first CTI device and a second CTI device are interconnected as an example of cyber threat information processing devices according to an embodiment.

[0222] The application programming interface (API) (2100) of the second CTI device (2000) can receive a file, a request for cyber threat information (CTI) analysis related to the file, or a query related to CTI from the client system (10).

[0223] The framework (2200) of the Application Programming Interface (API) (2100) may include multiple analysis modules or prediction modules. For example, the framework (2200) disclosed above may perform static analysis, dynamic analysis, deep analysis, mild-dynamic analysis, etc., according to an input file using an AI engine. Here, any module that performs such analysis or prediction is indicated as the Nth module (1219).

[0224] When the framework (2200) receives a file from the client system (10), it can obtain binary data at the assembly level through disassembly. Based on this, the framework (2200) can perform analysis of functions related to whether they are malicious, analysis of attack behavior or attack techniques, and analysis of attack groups, and analysis of a sequence of binary data blocks (hereinafter referred to as instruction sequences) according to the call relationships of functions included in the binary data.

[0225] The framework (2200) can analyze whether the input file is a non-executable file such as a document file, whether the file is malicious, the attack act or attack technique, and the attack group.

[0226] The server (2800) collects web pages on the internet by performing crawling, whether on-premises server or cloud server, and the framework (2200) can analyze whether the collected web pages are malicious, attack behavior or attack technique, and attack group.

[0227] The database (2700) can classify and store results analyzed by the framework (2200) of the second CTI device (2000), such as assembly code functions that appear during the process of analyzing files, whether the functions are malicious, hash codes, instruction sequences, static analysis, dynamic analysis, mild-dynamic analysis, predictive analysis results, whether the partial tags of web pages are malicious, attack techniques corresponding to MITRE ATT&CK, information about attack behaviors and attack groups, attack campaigns related to files, attack countries, attack industries, etc.

[0228] Meanwhile, the query module (2230) of the framework (2200) transmits the CTI natural language query to the natural language model (2320) of the artificial intelligence processing unit (2300) when the client system (10) makes a request for analysis of cyber threat information (CTI) regarding a specific file, webpage, etc.

[0229] The natural language model (2320) may be a natural language model (NLP), a large language model (LLM), or a language model based on Transformer technology, or it may be a smaller large language model (sLLM) related to cyber threats or security.

[0230] A request for CTI analysis or prediction related to a file of the client system (10) may be made, or a general natural language CTI query unrelated to the file may be requested. Accordingly, the query module (2230) generates a CTI query or supplementary query based on the cyber threat information (CTI) analyzed by the framework (2200) and transmits it to the natural language model (2320).

[0231] If the client system (10) requests a CTI query unrelated to a file, the query module (2230) transmits the CTI query to the natural language model (2320).

[0232] The CTI query language processing unit (2321) can analyze the CTI query using the parsing technique included in the CTI query.

[0233] The CTI query processed by the CTI query language processing unit (2321) is transmitted to the CTI query interpretation unit (2323).

[0234] The CTI query interpretation unit (2323) can perform the function of distinguishing questions based on the sentence structure and meaning of the CTI query processed by the CTI query language processing unit (2321), and recognizing sub-question types and relationships between sub-questions.

[0235] The CTI query interpretation unit (2323) may include a CTI query decomposition unit (2324) and a CTI query analysis unit (2325).

[0236] The CTI query decomposition unit (2324) can perform the function of distinguishing questions based on the sentence structure and meaning included in the CTI query, classifying sub-question types, and recognizing relationships between classified sub-questions.

[0237] The CTI query analysis unit (2325) can classify the types of the separated sub-questions. And the CTI query analysis unit (2325) can recognize the core of the question based on the reliability of the words or phrases that can be replaced by candidate answers, according to the classified types of the sub-questions.

[0238] If the CTI query analysis unit (2325) has a reliability that cannot recognize the core of the question, the CTI query decomposition unit (2324) may be made to reclassify the sub-question types.

[0239] Through the repeated processing of the CTI query decomposition unit (2324) and the CTI query analysis unit (2325) as described above, the CTI query analysis unit (2325) can detect and verify the topic of the CTI-related question.

[0240] The CTI question and answer generation unit (2326) can generate all possible answer candidates from structured or unstructured resources based on CTI questions and question classification information. The CTI question and answer generation unit (2326) may include a CTI answer candidate group generation unit (2327), a CTI answer verification unit (2328), and a CTI answer provision unit (2329).

[0241] The CTI answer candidate generation unit (2327) can perform indexing and search functions from a database (2700) containing cyber threat information (CTI) and generate candidate answers based on the search results. The CTI answer candidate generation unit (2327) generates all possible answer candidates from a database containing cyber threat information (CTI) based on questions and question classification information. Here, the database containing cyber threat information (CTI) includes the database (2700) of the second CTI device (2000). The CTI answer candidate generation unit (2327) may also collect evidence regarding the answer candidates from the database (2700) containing cyber threat information (CTI). This will be described below.

[0242] The CTI Answer Verification Unit (2328) performs the functions of the answer inference and generation module and can determine and generate the best answer. The CTI Answer Verification Unit (2328) determines the ranking of the answer candidates by measuring the reliability of the answer candidates by characterizing the filtered answer candidates and the inferred answer candidates.

[0243] The CTI Answer Verification Unit (2328) can filter answer candidates using inductive, deductive, or abductive reasoning based on the similarity between the query and the answer candidates. The CTI Answer Verification Unit (2328) can then select the optimal CTI answer by re-ranking the answer candidates by comparing the confidence ratio of the answer candidates with a threshold value.

[0244] The CTI answer providing unit (2329) transmits the CTI answer verified by the CTI answer verification unit (2328) to the second CTI device (2000) to provide natural language explanation information for the CTI question answer.

[0245] When a client system (10) queries cyber threat information (CTI) together with or separately from a request for cyber threat information (CTI) related to a file, the second CTI device (2000) may provide information about information related to the CTI file (whether it is malicious, hash value, attack technique, attack group, attack campaign, etc.), a natural language description thereof, and evidence collected as the basis thereof.

[0246] For example, when a client system (10) queries the result of an analysis request for a specific file, information regarding which MITRE ATT&CK attack technique by which attack group the malicious activity caused by the file is connected to, and which attack campaign (a series of mechanisms of one or more attacks) can be provided as visualization information as exemplified above. In addition, the second CTI device (2000) can provide a natural language explanation generated by a natural language model along with the visualization information, and can provide valid digital analysis evidence for the analysis result and natural language explanation analysis evidence for the digital analysis evidence.

[0247] When a client system (10) queries cyber threat information (CTI) without regard to files, it may provide an answer to the CTI query, a natural language description of the CTI query generated by a natural language model, and evidence collected as the basis therefor.

[0248] The second CTI device (2000) can provide the client system (10) with the cyber threat information (CTI) analyzed or predicted by the framework (2200) and the natural language answer or explanatory information for the query of the cyber threat information (CTI) provided by the natural language model (2320).

[0249] The physical device (2000), which is a computing device and is the second CTI device (2000), may include a database (2700) and a server (2800) including a processor.

[0250] A processor driving the second CTI device (2000) can receive a request for cyber threat information (CTI) analysis regarding data related to a file from a client, analyze the requested cyber threat information (CTI), and transmit a first cyber threat information (CTI) query generated based on the analyzed cyber threat information (CTI) to a natural language model (2320).

[0251] And the processor driving the second CTI device (2000) can provide the analyzed cyber threat information (CTI) and the descriptive information of the analyzed cyber threat information (CTI) generated by the natural language model (2320).

[0252] When a processor driving a second CTI device (2000) receives a second cyber threat information (CTI) query from a client, it can transmit the second cyber threat information (CTI) query to a natural language model and provide explanatory information about the cyber threat information (CTI) query generated by the natural language model.

[0253] The operation performed by the above physical device may also be executed by a program that implements the embodiment in software.

[0254] As disclosed in the example, protocol data, application data, files, metadata, and related CTI queries analyzed by the first CTI device (1000) from network traffic, or files and related CTI queries received by the client device (100) are transmitted to the second CTI device (2000).

[0255] The second CTI device (2000) can analyze the CTI features of a file in several modules within the framework (2200) as exemplified. Separately, the client device (100) can also send a CTI query related to a file or file to the second CTI device (2000).

[0256] The CTI features analyzed in the framework (2200) of the second CTI device (2000) are transmitted to the AI ​​engine (2310) of the artificial intelligence processing unit (2300).

[0257] The AI ​​engine (2310) of the artificial intelligence processing unit (2300) can classify additional features regarding the CTI features analyzed in the framework (2200), such as functions of related assembly code, whether the functions are malicious, hash codes, instruction sequences, whether the partial tags of web pages are malicious, attack techniques corresponding to MITRE ATT&CK, information about attack behaviors and attack groups, attack campaigns related to files, attack countries, and attack industries.

[0258] Meanwhile, a CTI query is transmitted to a natural language model (2320) of an artificial intelligence processing unit (2300) to generate descriptive information about CTI features received by the CTI device (2000) or analyzed by multiple modules within the framework (2200).

[0259] The natural language model (2320) of the artificial intelligence processing unit (2300) can generate an answer to the received CTI query and transmit it to the client device (100) or the first CTI device (1000).

[0260] An embodiment of this drawing discloses a case where the first CTI device (1000) does not have a natural language model (2320). When the first CTI device (1000) includes a natural language model (2320), the first CTI device (1000) can generate answers to CTI queries or natural language-based explanatory information using its internal natural language model (2320) and provide them to the user without needing to query the second CTI device (2000) for CTI information detected from network traffic.

[0261]

[0262] FIG. 6 discloses another example in which a first CTI device and a second CTI device are interconnected as an example of cyber threat information processing devices according to an embodiment.

[0263] The application programming interface (API) (2100) of the second CTI device (2000) can receive a file, a request for cyber threat information (CTI) analysis related to the file, or a CTI query related to the CTI from the client system (10).

[0264] The functions of the modules (2220, 2230) within the framework (2200) of the application programming interface (API) (2100) and the crawling function of the server (2800) are as described above.

[0265] The database (2700) can classify and store results analyzed by the framework (2200) of the second CTI device (2000), such as assembly code functions that appear during the process of analyzing files, whether the functions are malicious, hash codes, instruction sequences, static analysis, dynamic analysis, mild-dynamic analysis, predictive analysis results, whether the partial tags of web pages are malicious, attack techniques corresponding to MITRE ATT&CK, information about attack behaviors and attack groups, attack campaigns related to files, attack countries, attack industries, etc.

[0266] Meanwhile, the query module (2230) of the framework (2200) transmits the CTI natural language query to an artificial intelligence-based natural language model (2320) when the client system (10) makes a request for analysis of cyber threat information (CTI). The natural language model (2320) may be a natural language model (NLP) or a large language model (LLM), or it may be a smaller large language model (sLLM) related to cyber threats or security.

[0267] A request for CTI analysis or prediction related to a file of the client system (10) may be made, or a general natural language CTI query unrelated to the file may be requested. Accordingly, the query module (2230) generates a CTI query or supplementary query based on the cyber threat information (CTI) analyzed by the framework (2200) and transmits it to the natural language model (2320).

[0268] If the client system (10) requests a CTI query unrelated to a file, the query module (2230) transmits the CTI query to the natural language model (2320).

[0269] The CTI query language processing unit (2321) can analyze the CTI query using the parsing technique included in the CTI query.

[0270] An example of how the CTI query analysis unit (2325) detects and confirms the topic of a CTI-related question through the iterative processing of the CTI query decomposition unit (2324) and the CTI query analysis unit (2325) was exemplified above.

[0271] The CTI question-answer generation unit (2326) can generate all possible answer candidates from structured or unstructured resources based on CTI questions and question classification information. The CTI question-answer generation unit (2326) may include a CTI answer candidate group generation unit (2327), a CTI answer verification unit (2328), and a CTI answer provision unit (2329).

[0272] The CTI answer candidate generation unit (2327) can perform indexing and search functions from a database (2700) containing cyber threat information (CTI) and generate candidate answers based on the search results. The CTI answer candidate generation unit (2327) generates all possible answer candidates from the database containing cyber threat information (CTI) based on question and question classification information.

[0273] Here, the database containing cyber threat information (CTI) includes the database (2700) of the second CTI device (2000).

[0274] The CTI answer candidate generation unit (2327) may collect evidence for answer candidates from a database (2700) in which cyber threat information (CTI) is stored.

[0275] The CTI answer candidate generation unit (2327) performs indexing and search functions for multiple document files. The CTI answer candidate generation unit (2327) generates candidate answers from an input query using search results from various knowledge databases including the database (2700).

[0276] The CTI answer candidate generation unit (2327) generates all possible answer candidates from various resources, including the database (2700), based on question and question classification information. The CTI answer candidate generation unit (2327) then selects candidate answers based on evidence collected from the resources, using deductive or inductive evidence of the answer type and / or self-evident principles that may constrain the answer. That is, the CTI answer candidate generation unit (2327) can generate answers by verifying answer candidates by collecting evidence for answers from resources including the database (2700) and verifying self-evident principles regarding the context. In this way, the CTI answer candidate generation unit (2327) can search for answers to CTI queries and collect digital evidence or grounds for CTI query answers in the database (2700).

[0277] Since the database (2700) classifies and stores already analyzed cyber threat information (CTI), it can provide search data for generating a candidate group of answers when the CTI answer candidate group generation unit (2327) generates a candidate group of answers. Additionally, the database (2700) can provide evidence or grounds for the answer candidate based on the stored cyber threat information (CTI) when the CTI answer candidate group generation unit (2327) selects an answer candidate from the candidate group of answers.

[0278] The CTI Answer Verification Unit (2328) performs the functions of the answer inference and generation module and can determine and generate the best answer. The CTI Answer Verification Unit (2328) measures the reliability of the answer candidates and determines the ranking of the answer candidates by characterizing the filtered answer candidates and the inferred answer candidates.

[0279] The CTI Answer Verification Unit (2328) can filter answer candidates using inductive, deductive, or abductive reasoning based on the similarity between the query and the answer candidates. The CTI Answer Verification Unit (2328) can then select the optimal CTI answer by re-ranking the answer candidates by comparing the confidence ratio of the answer candidates with a threshold value.

[0280] The CTI answer providing unit (2329) transmits the CTI answer verified by the CTI answer verification unit (2328) to the second CTI device (2000) to provide natural language explanation information for the CTI question answer.

[0281] An example of the second CTI device (2000) providing natural language descriptive information for the requested CTI analysis result and CTI query answer, or providing natural language descriptive information for the CTI query, was disclosed above.

[0282] A physical device (2000), which is a computing device providing a second CTI device (2000), may include a database (2700) and a server (2800) including a processor.

[0283] A second CTI device (2000) can receive a request for cyber threat information (CTI) analysis regarding data related to a file.

[0284] A processor driving the second CTI device (2000) can analyze the requested cyber threat information (CTI) and search the database (2700) for a set of candidate answers for the first CTI query generated based on the analyzed cyber threat information (CTI).

[0285] Based on the above search results, the processor can determine a group of candidates for the above answers and provide a natural language description for the above 1 cyber threat intelligence (CTI) query based on a first candidate (optimal candidate) among the determined group of candidates.

[0286] When a processor driving a second CTI device (2000) receives a second cyber threat information (CTI) query from a client system (10), it can search for a set of candidate answers for the cyber threat information (CTI) query from the cyber threat information (CTI) database. The processor can also provide descriptive information for the cyber threat information (CTI) query generated by the natural language model.

[0287] As in the disclosed embodiment, the second CTI device (2000) can be implemented as a physical device including a database that stores cyber threat information and a computing server that processes data.

[0288] The processor of the computing server can process a set of instructions including instructions for receiving a request for analysis or query regarding cyber threat information related to data included in network traffic from a first cyber threat intelligence (CTI) device, generating detailed analysis results and explanatory information regarding cyber threat information in accordance with said query request, and providing said generated detailed analysis results and explanatory information to a client system.

[0289] It may also be implemented as software that stores executable instructions for a computer that performs the same operations as those performed by a physical device.

[0290]

[0291] FIG. 7 discloses embodiments of a cyber threat information processing device according to an embodiment.

[0292] One embodiment disclosed may include a first CTI device (1000) and a second CTI device (2000).

[0293] The first CTI device (1000) may include a high-speed packet collection engine (1150), a protocol data analysis unit (1250), a threat detection unit (1350), and a threat information management unit (1380). Here, the high-speed packet collection engine (1150) may be described as an example of the collection unit (1100) of the first CTI device (1000), the protocol data analysis unit (1250) may be described as an example of the analysis unit (1200), and the threat detection unit (1350) may be described as an example of the detection unit (1300).

[0294] The high-speed packet collection engine (1150) can collect packet data included in network traffic between a source (SRC) and a destination (DST). In one embodiment, the high-speed packet collection engine (1150) can collect packet data at high speed by polling packets received on a network interface card without using a kernel, using a high-speed processing library.

[0295] The protocol data analysis unit (1250) can analyze packet data and extract flow information for the packet data. In one embodiment, the protocol data analysis unit (1250) can analyze the data included in the packet data according to the protocol or application. In one embodiment, the protocol data analysis unit (1250) can generate metadata corresponding to the protocol or application. In one embodiment, the flow information may include at least one IP and port among the source and destination, a protocol for network traffic, an application, and at least one of the metadata.

[0296] In one embodiment, the flow information may further include at least one of host information and operating system information. Here, the host information may include identification information and version information for a host (e.g., computer, server, device, etc.) including a source and a destination connected to a network. For example, the server may include a server operated by an organization corresponding to the source or destination, for example, a server corresponding to an internal asset.

[0297] In one embodiment, the protocol data analysis unit (1250) can check the status of the port to determine the open ports at the source and destination and the status of the port. Here, the open port can be used for network communication and can be used to determine which application-based service is running.

[0298] In one embodiment, the protocol data analysis unit (1250) can identify which application-based service or protocol is running at the source and destination through the open ports. In one embodiment, the protocol data analysis unit (1250) can determine which server is operating in which version based on flow information.

[0299] In one embodiment, the monitoring targets of the ASM may include internal assets and information about the internal assets, for example, IP and applications when services are operated directly by a server corresponding to the internal assets. In one embodiment, the protocol data analysis unit (1250) may identify vulnerabilities in assets operated by an organization corresponding to at least one of the source and destination, or assets that are not owned by the organization but belong to the organization's infrastructure or supply chain (e.g., cloud). In one embodiment, the monitoring targets of the ASM may include applications when services are operated directly by a server.

[0300] The threat detection unit (1350) can generate vulnerability information corresponding to flow information based on vulnerability-related information included in a predefined vulnerability database. In one embodiment, the threat detection unit (1350) can generate vulnerability information by performing Attack Surface Management (ASM) using flow information based on network traffic collection. According to the present invention, attackable ports or vulnerabilities can be discovered and managed through ASM.

[0301] In one embodiment, the vulnerability information may include at least one of information on whether a vulnerability exists in the flow information, the type of vulnerability, and the content of the vulnerability. For example, the threat detection unit (1350) may determine whether a vulnerability exists in the corresponding port included in the flow information.

[0302] In one embodiment, the threat detection unit (1350) may generate vulnerability information by comparing flow information with an external vulnerability database or a vulnerability database stored in the first CTI device (1000). In one embodiment, the vulnerability database may comply with standards such as CVE (Common Vulnerabilities and Exposures) and may include various vulnerability-related information. In one embodiment, the vulnerability-related information included in the vulnerability database may include a CVE (Common Vulnerabilities and Exposures)-ID (identifier) ​​for the vulnerability type, vulnerability details, and vulnerability severity information. In one embodiment, the vulnerability-related information included in the vulnerability database may include information on various vulnerabilities and risks, including assets that are leaked and exploited.

[0303] The threat information management unit (1380) can input vulnerability information into the natural language model (2320) of the artificial intelligence processing unit (2300) of the second CTI device (2000) to provide vulnerability description information to the user in the form of natural language. In one embodiment, the second CTI device (2000) may be configured separately outside the first CTI device (1000) or may be integrated and configured within the first CTI device (1000). For a detailed description of the second CTI device (2000), refer to the above description.

[0304] In one embodiment, the threat information management unit (1380) may input at least one of vulnerability information and a vulnerability analysis query prompt based on vulnerability information into a natural language model (2320) to provide vulnerability description information to the user in the form of natural language. In one embodiment, the vulnerability analysis query prompt may include content requesting a vulnerability description related to the vulnerability information. In one embodiment, the vulnerability analysis query prompt may include content requesting the creation of an analysis report based on the results of analyzing the vulnerability information and the format of the analysis report. For example, the format of the analysis report may include at least one of an analysis overview, vulnerability status, vulnerability analysis content, source information, destination information, payload analysis content, analysis conclusion, and recommendations for security enhancement.

[0305] In one embodiment, the threat information management unit (1380) may transmit a CTI query including vulnerability information and a vulnerability analysis query prompt to the second CTI device (2000). The CTI query language processing unit of the natural language model (2320) included in the second CTI device (2000) may analyze the vulnerability information and the vulnerability analysis query prompt included in the CTI query using syntactic analysis technology.

[0306] CTI queries processed by the CTI Query Language Processing Unit can be transmitted to the CTI Query Interpretation Unit, which can distinguish questions based on the sentence structure and semantics of the vulnerability analysis query prompts processed by the CTI Query Language Processing Unit and classify the types of the distinguished questions. Additionally, depending on the classified type of the question, the CTI Query Language Processing Unit can identify the core of the question based on the reliability of words or phrases that can be replaced by candidate answers.

[0307] The CTI question-answer generation unit can generate all possible vulnerability answer candidates from structured or unstructured resources based on vulnerability information, vulnerability analysis query prompts, and question classification information. Additionally, the CTI question-answer generation unit can determine and generate the best answer containing vulnerability description information by featuring filtered answer candidates and inferred answer candidates among all vulnerability answer candidates. Additionally, the CTI question-answer generation unit can provide the CTI answer containing the generated vulnerability description information in natural language form to the first CTI device (1000).

[0308] In one embodiment, the threat information management unit (1380) can provide information regarding the vulnerability content and vulnerability countermeasures to the user by visualizing it through the monitoring unit (1800).

[0309] For example, the natural language model (2320) may include various technology-based language models such as natural language processing (NLP), large language model (LLM), and Transformer.

[0310] In one embodiment, when simply providing confirmed vulnerability information, there is a problem in that it is difficult to intuitively determine how the actual vulnerability information affects the vulnerability and how dangerous it is. Therefore, according to the present invention, such vulnerability information is input into a natural language model (2320) so that the user can be provided with a natural language description of how dangerous the vulnerability is and what measures are required.

[0311] In one embodiment, the threat information management unit (1380) can train a natural language model (2320) using vulnerability information and a Question-Answer Instruction generated using the vulnerability information. Here, the Question-Answer Instruction refers to data for the natural language model (2320) to learn, and this data consists of various questions and various natural language answers to those questions based on vulnerability information. In one embodiment, the Question-Answer Instruction may include a dataset for the natural language model (2320) to generate answers to questions.

[0312] In one embodiment, the first CTI device (1000) can use data and applications stored in a separate storage / database under the control of a computing server responsible for data processing. Here, the storage mainly uses a hard disk or SSD to store data, and the database manages structured data and can perform operations such as searching and modifying. At this time, the functions performed by the first CTI device (1000) can be performed by the processor of the computing server.

[0313] According to one embodiment of the present disclosure, vulnerabilities in assets can be identified through the execution of ASM, and security controls can be applied and cybersecurity strategies and policies for the assets can be reinforced. In one embodiment, vulnerabilities in assets can be identified and security controls (e.g., operating system or software patches, etc.) can be applied, security standards for unknown or unmanaged assets can be established and decommissioned, and cybersecurity strategies and policies for the assets can be reinforced, such as removing malicious assets.

[0314]

[0315] FIG. 8 discloses an example of active ASM execution based on network traffic collection according to an embodiment.

[0316] In one embodiment, the first CTI device (1000) can acquire network traffic transmitted by a source and a DST. Additionally, the first CTI device (1000) can generate flow information by performing Depth Packet Inspection (DPI) analysis on packet data included in the acquired network traffic.

[0317] In this case, DPI can deeply analyze packet data in the network to identify protocols or applications related to the packet data or collect metadata. Based on the protocols, applications, or metadata collected through DPI, security vulnerabilities, malicious activity, abnormal traffic, and the like can be detected. For more details, please refer to the above description.

[0318] In one embodiment, flow information may include at least one IP and port among the source and destination, and at least one of the protocol, application, and metadata for the network traffic. Accordingly, the first CTI device (1000) can identify which source and destination use which protocol and application based on which IP and port through the flow information.

[0319] In one embodiment, the first CTI device (1000) can provide vulnerability information corresponding to flow information. That is, it can identify elements with a high probability of external attack and pre-examine and manage vulnerabilities regarding said elements.

[0320] In other words, existing ASMs manually scan ports and IPs by inputting all network IP ranges to check which ports are open and what vulnerabilities exist. However, this approach consumes significant network resources, and incorrect IP range input can lead to scanning third-party networks. Furthermore, scanning may fail if actual ports or services are not open at the time of the scan. Additionally, the scan may not be performed properly due to firewalls or network configurations.

[0321] Accordingly, according to the present disclosure, without the need to perform a network scan, a first CTI device (1000) can receive network traffic and identify flows, protocols, and applications from packet data, and can accumulate and manage what ports each IP has, protocols, and services.

[0322] In one embodiment, when the first handshake between the source and destination—that is, when this session is first established—checks a specific number of the first few bytes, the version of the server (the server providing content services) on the corresponding port serving the application via the relevant protocol can be identified. In one embodiment, the bytes may include bytes of packets transmitted and received during the handshake for communication between the source and destination. In this case, during the handshake process for creating a session, a specific pattern of the bytes can be determined by analyzing the first specific number (e.g., N bytes) of bytes (or packets), and flow information, such as a server or service corresponding to the specific pattern, can be identified. In one embodiment, at least one of server identification information and a server version can be identified by analyzing a specific number of the first few bytes included in a service-specific banner message or Hello message transmitted and received between the source and destination. In this case, information about the counterpart server can be obtained during the handshake process. When such metadata is extracted, it can be compared with a vulnerability database to identify and provide information on which vulnerabilities exist in the actual service servers.

[0323] In this way, according to the present disclosure, even without performing a network scan, the effect of a network scan can be achieved by verifying transmitted and received packet data and using the packet data.

[0324] In one embodiment, after identifying a system or service with a high probability of attack, DPI is performed to deeply analyze traffic to said system or service, thereby detecting and responding to security threats. Additionally, security vulnerabilities in high-probability attack areas identified through ASM can be verified and remedied through actual traffic analysis via DPI. In one embodiment, threats to points identified by ASM can be analyzed and detected by DPI, and explanations regarding points detected after analysis by DPI can be supplemented by ASM.

[0325]

[0326] FIG. 9 discloses an example of providing vulnerability details and measures identified by ASM technology according to an embodiment.

[0327] The natural language model (2320) of this drawing may receive a JSON file containing vulnerability information configured in a text format. Here, the JSON file is a format capable of containing structured data and can be used to represent various information. In one embodiment, the JSON file may include a field containing text, a question, and an object describing additional context.

[0328] In one embodiment, the JSON file may include vulnerability information including a standardized CVE-ID. For example, the JSON file may include at least one of session information, source open ports, service information, operating system, vulnerability ID information (e.g., CVE-ID), destination open ports, service information, operating system, vulnerability ID information, session event information, collected file analysis information, malware match information (e.g., Yara rules), antivirus detection information, and artificial intelligence-based external threat detection information. In the present disclosure, the format of the file input to the natural language model (2320) may be configured in various ways and is not limited.

[0329] In one embodiment, a natural language model (2320) that has a JSON file containing vulnerability information input can output vulnerability description information described in text form. For example, if CVE-2023-44487 vulnerability information for 10.10.4.111 TCP 443 included in flow information is input into the natural language model (2320), a description of the CVE-2023-44487 vulnerability and measures to address the vulnerability can be output in natural language.

[0330] In one embodiment, a predefined question-and-answer command (e.g., input sentence) may be used to train an AI-based natural language model (2320) that outputs vulnerability description information. In one embodiment, the question-and-answer command may include a description and a question regarding a specific vulnerability issue that enables the natural language model (2320) to understand and explain a specific type of vulnerability. In one embodiment, the question-and-answer command may include at least one of a sentence representing an actual vulnerability and a format of an actual vulnerability report. By training the natural language model (2320) using these question-and-answer commands, the natural language model (2320) may recognize various types of vulnerabilities and provide vulnerability description information.

[0331]

[0332] FIG. 10 discloses an embodiment of a method for processing cyber threat information according to an embodiment.

[0333] Packet data included in network traffic between a source and a destination is collected (S310). In one embodiment, packet data can be collected at high speed by polling packets received by a network interface card without using a kernel using a high-speed processing library. For a detailed explanation of this, refer to the details described in FIG. 7.

[0334] Packet data is analyzed to generate vulnerability information corresponding to flow information for said packet data (S320). In one embodiment, the vulnerability information may include a standardized CVE (Common Vulnerabilities and Exposure)-ID (identifier) ​​corresponding to the flow information. For a detailed explanation of this, refer to the details described in FIGS. 7 to 9.

[0335] Vulnerability information is input into a natural language model to provide vulnerability description information (S330). In one embodiment, the vulnerability description information may include at least one of vulnerability content information in natural language form and vulnerability remediation information. In one embodiment, prior to step S330, the natural language model may be trained based on at least one of vulnerability information and a Question-Answer Instruction generated using said vulnerability information. For a detailed explanation of this, refer to the details described in FIGS. 7 to 9.

[0336]

[0337] FIG. 11 is a drawing disclosing another embodiment of a cyber threat information processing device according to an embodiment.

[0338] The cyber threat information processing device (10000) can receive feed data provided by the cyber threat intelligence system in real time.

[0339] To this end, the cyber threat information processing device (10000) may include an Application Programming Interface (API) (1100) and a framework (2200).

[0340] At this time, the cyber threat information processing device (10000) does not require an essential connection with the API (1100) to receive feed data provided by the cyber threat intelligence system in real time. That is, the cyber threat information processing device (10000) can collect feed data by directly connecting to a storage device (e.g., a database (2700)) to receive feed data in real time.

[0341] Additionally, the framework (2200) may include an analysis and prediction module and an artificial intelligence (AI) model (2240). The cyber threat information processing device (10000) may utilize already classified malicious code contained in the database (2700), or pattern codes of stored malicious code, etc. For example, the database (2700) may store sample data, functions, or result information, and may store ASM files and JSON files converted from input files.

[0342] In one embodiment, the database (2700) may store samples of feed data in the form of pre-generated metadata. That is, the cyber threat information processing device (10000) may store feed data provided by the cyber threat intelligence system in the database (2700) as soon as it receives it in real time.

[0343] In one embodiment, the cyber threat information processing device (10000) can collect feed data provided by the cyber threat intelligence system and analyze it through an analysis module to build a dataset.

[0344] More specifically, the analysis module can build a training dataset for an AI model (2240) using feed data collected according to a preset period. For example, the cyber threat information processing device (10000) can collect first feed data from March 1, 2024 to March 31, 2024, and build a training dataset for an AI model using the first feed data on April 1, 2024. At this time, the target AI model corresponds not only to an AI model included within the cyber threat information processing device (10000) but also to an AI model for an external system.

[0345] In one embodiment, the constructed dataset is transferred to a learning module (2241) within an AI model (2240), and the AI ​​model (2240) can learn using the dataset.

[0346] At this time, in order for the AI ​​model (2240) to learn normally, it is necessary to maintain a balance of the number of data per label. For example, training data in which the ratio of normal file data to malicious file data is balanced at 5:5 may be the most ideal.

[0347] Furthermore, training data should ideally include various data types, and redundant data must be eliminated. In this process, the distribution of data types must be even. For example, if there are features A, B, and C, and most of the data consists of features A and B, it implies that the distribution of data types is uneven. This means the training data is skewed toward specific features, which can act as noise during model learning. Therefore, the data must be structured so that each feature is evenly distributed.

[0348] In addition, duplicate data may include not only files with identical hash values ​​but also files with different hash values ​​but identical content composition.

[0349] More specifically, in the case of duplicate data, if duplicate data is distinguished solely by hash values, there is a limitation in that files with identical content structures may be included even if their hash values ​​are not the same.

[0350] Therefore, a sampling method is required to remove duplicate data and ensure an even distribution of the data.

[0351]

[0352] FIG. 12 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0353] This drawing describes a sampling method for the cyber threat information processing device described above to remove duplicate data and evenly distribute the data.

[0354] In one embodiment, the cyber threat information processing method may primarily use a method to remove sample data having the same hash value in order to remove duplicate data for the purpose of constructing training data. However, in this case, while completely identical samples can be determined as duplicate data and removed, there is a disadvantage in that identical sample data that has been slightly modified to evade detection cannot be removed.

[0355] In addition, in one embodiment, the cyber threat information processing method may use a fuzzy hash-based method for removing sample data to construct training data excluding duplicate data. This corresponds to a method for removing duplicate data by calculating a fuzzy hash based on byte values ​​in binary samples and removing sample data with high similarity. However, in this case, I / O operations for downloading sample data are required, a high amount of computation is required for fuzzy hash calculation, and there is a disadvantage in that it cannot fully reflect the characteristics of the attack campaign information provided by the cyber intelligence system.

[0356] A cyber threat information processing method that improves upon these drawbacks can remove duplicate data from collected feed data through an embedding model and quantization and encoding processes. More specifically, the feed data provided by the cyber threat information processing method from a cyber intelligence system may include at least one of information regarding threat types (e.g., backdoors, ransomware, etc.), information regarding attack techniques used, information regarding attack groups, and information regarding attack industries.

[0357] In one embodiment, the cyber threat information processing method can convert first feed data received from a cyber intelligence system into a first vector representation. More specifically, one feed data (metadata) has information stored in JSON data format.

[0358] In one embodiment, the cyber threat information processing method can convert one JSON data into one vector representation by utilizing an embedding model.

[0359] More specifically, the cyber threat information processing method can convert feed data into a vector representation using an embedding model generated by learning feed data provided by a cyber threat intelligence system. In one embodiment, the embedding model used is characterized by being generated by learning cyber threat information analysis data for a file. In this case, the cyber threat information analysis data for a file learned by the embedding model corresponds to data obtained by analyzing feed data collected by cyber threat intelligence. Additionally, the embedding model used may include a transformer-based model.

[0360] For example, a cyber threat information processing method can utilize an embedding model to convert the first to n JSON metadata into the first to n vectors. Here, the converted vector corresponds to a vector composed of n-dimensional real numbers.

[0361] In one embodiment, a cyber threat information processing method may apply a quantization technique to a transformed vector to quantize it and encode it into a string to extract a signature string. At this time, since the encoded string is generated based on an embedding vector, similar JSON metadata will have similar vectors. That is, vectors corresponding to similar JSON metadata are converted into the same signature string after undergoing the quantization and string encoding processes. For example, a hash value can be extracted from the signature string through vector quantization and string encoding.

[0362] By utilizing the transformed signature string, the cyber threat information processing method can determine that all sample groups having the same signature string are similar and consider them as duplicate data. Therefore, the cyber threat information processing method can select one representative sample data and remove the remaining sample data.

[0363] In one embodiment, the cyber threat information processing method can remove sample data determined to be duplicate data and organize all sample data finally remaining into a dataset.

[0364]

[0365] FIG. 13 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0366] This drawing describes a vector quantization technique used in a sampling method performed by a cyber threat information processing device to remove redundant data.

[0367] In one embodiment, the cyber threat information processing device may perform vector quantization on the transformed vector. In one embodiment, the cyber threat information processing device may perform vector quantization that transforms a vector containing continuous feature values ​​(e.g., real numbers) into a vector containing discrete feature values ​​(e.g., integers). Here, the vector containing discrete feature values ​​on which vector quantization has been performed may represent a representative vector.

[0368] In addition, the cyber threat information processing device can perform encoding that converts the feature values ​​of a vector containing discrete feature values ​​into hash values.

[0369] For example, similar function vectors (e.g., FV1 to FV4) included in a similar set of vectors are vector quantized to be converted into the same representative vector (e.g., RV1), and the same hash value 7ABBB9 can be produced. On the other hand, different function vectors (e.g., FV5) are vector quantized to be converted into a representative vector (e.g., RV2), and the hash value 0AAFF9 can be produced.

[0370] In other words, it can be confirmed that the representative vector of a different vector differs from the representative vector of a similar vector, and consequently, the hash values ​​are also different. That is to say, a cyber threat information processing device can convert similar vectors into the same representative vector through vector quantization. In this case, the same representative vector can have the same hash value.

[0371] In other words, a cyber threat information processing device can quantize a continuous vector into a finite set of representative vectors through vector quantization. That is, the representative vector can refer to the quantized vectors of the functions (i.e., the function to be analyzed and the function to be compared).

[0372] Quantized vectors, also known as codebook vectors or centroids, serve as representative vectors among vectors, enabling a compressed representation of data while expressing a certain amount of information.

[0373] Therefore, the model training / update process is not required, and a search query can be defined to retrieve only functions with the same representative vector. Finally, a hash value can be defined.

[0374] The existing classification model, which tags based on labels and finds candidate functions with the same tag, had high computational complexity due to the large number of candidate groups to compare; however, the method according to the present invention has low computational complexity because it only needs to compare candidate groups of functions that have similar vectors, thereby enabling efficient similarity search.

[0375]

[0376] FIG. 14 is a diagram disclosing a quantization technique used by a cyber threat information processing method according to an embodiment.

[0377] This drawing describes a vector quantization technique used in a sampling method performed by a cyber threat information processing device to remove redundant data.

[0378] In one embodiment, the cyber threat information processing device can perform vector quantization on the vector. Specifically, the cyber threat information processing device can obtain a vector containing continuous feature values ​​(S89110).

[0379] A cyber threat information processing device can perform dimensionality reduction by reducing the dimensionality of a multidimensional dataset composed of continuous feature values ​​included in a vector and reconstructing it into a vector having reduced dimensions (S89120). For example, a 401-dimensional vector can be reduced to 16 dimensions through PCA.

[0380] Additionally, the cyber threat information processing device can perform scaling to extend each feature value of the dimensionally reduced vector to a certain range (S89130). For example, each feature value can be extended to the range [0,15] (16 intervals).

[0381] Additionally, the cyber threat information processing device can perform rounding for each feature value of the vector that has been scaled (S89140). For example, the feature value 1.43697265 can be rounded and converted to 1.

[0382] Additionally, the cyber threat information processing device can convert the feature value of a vector that has been rounded from a continuous type to a discrete type (S89150). For example, the feature value of a real number type can be converted into a feature value of an integer type of 1 byte.

[0383] Additionally, the cyber threat information processing device can convert each feature value of the vector converted into an integer type into a hash value (S89160). In one embodiment, the hash value may be represented as a string. For example, the hash value may include b123f21e12f12123.

[0384] In the experimental data set, 274,572 hash values ​​can be derived from a total of 885,131 functions. Accordingly, a reduction rate of 68.97% can be confirmed.

[0385] In one embodiment, similarity between vectors can be calculated and statistical values ​​extracted in the cluster with the most identical hash values. For example, similarity may include the cosine distance between vectors.

[0386] In one embodiment, referring to , the minimum, maximum, and quartile values ​​are all the same, which may all represent the same function. Accordingly, for example, when a search is performed based on hash values ​​according to vector quantization according to the present invention, it can be confirmed that the number of functions to be compared is reduced from 6,567 to 17 (0.259%).

[0387] Stat.0%25%50%75%100%MeanStd.Value6.66e-166.66e-166.66e-166.66e-166.66e-166.66e-160.0

[0388]

[0389] FIG. 15 is a drawing disclosing an embodiment of a cyber threat information processing method according to an embodiment.

[0390] In one embodiment, the cyber threat information processing method can collect feed data provided by the cyber threat intelligence system (S410).

[0391] In one embodiment, the cyber threat information processing method can analyze the collected feed data to construct a training dataset for an artificial intelligence model (S420). In one embodiment, the cyber threat information processing method can convert the feed data into a vector through an embedding model in order to construct a training dataset. At this time, the feed data includes first to n JSON metadata, wherein n is an integer. In one embodiment, when converting the feed data into a vector through an embedding model, the first to n JSON metadata is converted into first to n vectors.

[0392] In one embodiment, the cyber threat information processing method can convert the converted vector into a signature string through a vector quantization process. At this time, the vector quantization process may include an encoding process that converts the converted vector into a hash value containing discrete feature values ​​obtained by vector quantization.

[0393] In one embodiment, the cyber threat information processing method can remove duplicate data based on the transformed signature string. At this time, all sample groups having the same signature string are determined to be similar, representative sample data is selected from the determined similar sample groups, and the remaining data excluding the representative sample data from the similar sample groups can be removed from the training dataset.

[0394] A cyber threat information processing method can construct a dataset that does not contain duplicate data by vectorizing collected feed data through an embedding model and quantizing it. In existing methods, data is judged to be different if only the hash value is different, so practically, a large amount of duplicate data is inevitably included in the dataset. According to the present invention, there is an advantage in being able to construct a dataset containing high-quality sample data by removing duplicate data using a more rigorous method.

[0395] Below, examples are disclosed that allow for easy acquisition of malware analysis data and the acquisition of specifically required or technically necessary datasets to respond to cyber threats as described above.

[0396] The following embodiments can generate and provide AI training datasets based on Advanced Persistent Threat (APT) intelligence data to respond to various malicious activities. Based on this, customers or companies can strengthen AI-based cyber threat response and provide high-quality data.

[0397] A detailed example is disclosed of generating and refining such datasets to provide reliable datasets to customers.

[0398]

[0399] FIG. 16 discloses another example of a cyber threat information processing device that generates artificial intelligence training data capable of responding to cyber threats.

[0400] The disclosed embodiment illustrates an example of a cyber threat information processing device that can be operated by a database (2700) and a computing server (2800). The computing server (2800) may be a virtualized cloud server or an on-premises server and may include one or more nodes or one or more processors.

[0401] The disclosed example is a platform based on an Application Programming Interface (API) (2100) that can transmit or receive requests related to cyber threat information from at least one client (101, 103, 105).

[0402] Clients (101, 103, 105) can obtain analyzed or predicted results from network intelligence (CTI) devices (102, 104, 106) capable of analyzing cyber threat information transmitted from a network.

[0403] In the following, for components identical to the example above, the description of the embodiment disclosed above may be applied as is.

[0404] Network intelligence (CTI) devices (102, 104, 106) may follow the example of the first CTI device (1000) disclosed above.

[0405] The intelligence platform (CTI) (2201) can process cyber threat information in a platform format and provide the results as another embodiment of the framework (2200) included in the second CTI device (2000) disclosed above. A detailed description of this has been disclosed above.

[0406] For example, an intelligence platform (CTI) (2201) can receive various cyber threat information, files, or queries from a client (101, 103, 105) or a network intelligence (CTI) device (102, 104, 106) through an application programming interface (API) (2100). The intelligence platform (CTI) (2201) can process the received cyber threat information, files, or queries and provide the results to the client (101, 103, 105) or the network intelligence (CTI) device (102, 104, 106).

[0407] In this example, the intelligence platform (CTI) (2201) may include an artificial intelligence (AI)-based information processing module (2202) and several modules capable of analyzing cyber threat information in various ways. Although the intelligence platform (CTI) (2201) has several analysis modules and each module can analyze or predict various cyber threat information, it is referred to here as the Nth module (2220). Since the description of the Nth module (2220) has been disclosed above, a description thereof is omitted.

[0408] The artificial intelligence (AI)-based information processing module (2202) may be the AI ​​processing unit (2300) disclosed above. The natural language model (2320) included in the AI ​​processing unit (2300) disclosed above may be a separate AI agent (2600) separated from the intelligence platform (CTI) (2201) in this drawing. That is, the natural language model (2320) disclosed above may be the AI ​​agent (2600) in this drawing.

[0409] The prompt hub (2350) includes a dataset of various queries or optimized questions and answers optimized for processing cyber threat information, and can use this to provide various explanations of the results of processing cyber threat information to the AI ​​agent (2600) to the clients (101, 103, 105).

[0410] For example, the prompt hub (2350) can generate a set of natural language questions and answers based on the results of cyber threat information analyzed, generated, or processed by the post-processing framework (2500) or intelligence platform (CTI) (2201) described below.

[0411] The prompt hub (2350) may include the functions of the query module (2230) disclosed above. In this drawing, the prompt hub (2350) within the second CTI device (2000) is depicted as a separate module from the intelligence platform (CTI) (2201), but the intelligence platform (CTI) (2201) may include the prompt hub (2350) as the query module (2230) disclosed above.

[0412] The post-processing framework (2500) can post-process cyber threat information analyzed or generated by the prompt hub (2350), AI agent (2600), and intelligence platform (CTI) (2201) and provide it to clients (101, 103, 105).

[0413] In this example, the post-processing framework (2500) may include several modules (2510, 2520) according to the post-processing method and processing function of cyber threat information, and are represented in this drawing as the first module (2510) to the Kth module (2520).

[0414] For example, the post-processing framework (2500) can generate various cyber threat information analyzed by the intelligence platform (CTI) (2201), such as visualization information on specific malicious behavior or advanced persistent threats (APT), and provide it through the intelligence platform (CTI) (2201).

[0415] Information related to the query of the post-processing framework (2500) is transmitted to the prompt hub (2350), and the prompt hub (2350) can generate a query for cyber threat information using the generated question-and-answer database and transmit it to the AI ​​agent (2600).

[0416] In this embodiment, the post-processing framework (2500) is described as a separate framework from the intelligence platform (CTI) (2201), but multiple modules within a single framework may be configured to perform the respective described modules.

[0417] That is, the post-processing framework (2500) may be composed of the intelligence platform (CTI) (2201) and the framework (2200) disclosed above as a single framework, or it may be divided into several separate frameworks depending on the function of the module.

[0418] In this example, for convenience, the intelligence platform (CTI) (2201) is represented as a platform for detecting malware or malicious behavior, and the post-processing framework (2500) is represented as a set of modules that generate a dataset by applying metadata to the results detected by the intelligence platform (CTI) (2201) or provide the detected results to a client after performing certain post-processing.

[0419] Accordingly, as in the example disclosed above, the post-processing framework (2500) may include a module that performs vector quantization used in a sampling method for removing duplicate data and making the data distribution uniform among the data processed by the intelligence platform (CTI) (2201). A detailed embodiment thereof has been disclosed above.

[0420] The AI ​​agent (2600) can provide a natural language description along with relevant information to the client (101, 103, 105) directly or through the intelligence platform (CTI) (2201) in response to a cyber threat information query received from the prompt hub (2350).

[0421] Alternatively, the AI ​​agent (2600) may provide this information or explanation back to the post-processing framework (2500) so that the post-processing framework (2500) can regenerate the relevant information.

[0422] Meanwhile, the intelligence platform (CTI) (2201) can perform malware or malicious behavior analysis based on at least one of various files, queries, hash values ​​of files, or metadata of cyber threat information received from clients (101, 103, 105) or network intelligence (CTI) devices (102, 104, 106) or collected by crawling itself.

[0423] The intelligence platform (CTI) (2201) can generate various metadata related to the analyzed malware or detected malicious activity.

[0424] The metadata of the analyzed cyber threat information may include the date the analysis was completed or updated, the date the malicious activity was collected, the date the malicious activity was first detected, the date the malicious activity was last detected, and the hash values ​​of various series of associated hash functions.

[0425] In addition, the metadata of the analyzed cyber threat information may include the size of the file associated with the malicious activity, the file type, and tag information associated with the malicious activity.

[0426] The metadata of the analyzed cyber threat information may include the detection name of the malicious activity, the file name, the type of threat, an attack identifier related to the attack technique, a name assigned to the attack activity, a tactic related to the attack activity, and site information that can explain it.

[0427] And, if cyber threat information is analyzed from a specific file, cyber threat information of files similar to the specific file and campaign-related Indicators of Compromise, that is, indicators of compromise that may be common to the cyber threat incident, may be included in the cyber threat information.

[0428] Regarding the cyber threat information analyzed by the intelligence platform (CTI) (2201) and the metadata generated, the post-processing framework (2500) may store the generated dataset in the database (2700) together with the metadata or by adding labeling to the metadata. Depending on the collected cyber threat information, normal data may be stored, or known malicious behavior data or new malicious data may be stored.

[0429] An example has been disclosed in which the framework (2200) or post-processing framework (2500) disclosed above receives data analyzed by the intelligence platform (CTI) (2201) as feeding data to build an artificial intelligence dataset.

[0430] The intelligence platform (CTI) (2201) transmits the client's request or the detection of the network intelligence (CTI) device (102, 104, 106) or its own detection results to the post-processing framework (2500).

[0431] The first module (2510) of the post-processing framework (2500) can label the transmitted detection result dataset, store it in the database (2700), and provide it when requested by a client.

[0432] For example, when the first module (2510) of the post-processing framework (2500) receives a condition including metadata of cyber threat information from the first client (101), it can extract a dataset matching the condition from the database and provide it to the first client (101).

[0433] As another example, the first module (2510) of the post-processing framework (2500) can automatically provide the dataset to an intelligence platform (CTI) (2201) or other websites according to certain conditions or labeling, regardless of the user.

[0434] A processor in the computing server (2800) can store a dataset labeled according to metadata related to cyber threat information of the input data in the database (2700).

[0435] A processor in the computing server (2800) can generate a dataset corresponding to metadata or query information requested by a client from a dataset stored in the database (2700).

[0436] A processor in the computing server (2800) can provide the stored or generated dataset to the client according to the purchase method selected by the client.

[0437]

[0438] FIG. 17 is a drawing disclosing an example of providing a malicious code dataset according to an embodiment.

[0439] The post-processing framework (2500) of the second CTI device (2000) can automatically generate and store a dataset related to cyber threat information for a specific period, for example, every month, and provide it to a client at a specific time. For example, the post-processing framework (2500) can generate and store a dataset on the 1st of every month, according to a set label, of new malicious or normal data collected during the previous month.

[0440] The post-processing framework (2500) can provide clients with a dataset that is automatically labeled and stored according to a specific period on an interface or website. For example, the post-processing framework (2500) can automatically store a dataset labeled with relevant metadata for data detected by an intelligence platform (CTI) on a monthly basis.

[0441] For example, the above metadata may include at least one of the following: the date the analysis of the detected data was completed or updated, the date the malicious activity of the input and detected data was collected, the date the malicious activity of the input and detected data was first detected, the date the malicious activity of the input data was last detected, and the hash value of a hash function associated with the input data.

[0442] Additionally, the metadata may include at least one of the following: the size of a file related to the malicious activity of the input data, the type of the file, tag information related to the malicious activity, the detection name of the malicious activity, the name of the file, the type of threat of the malicious activity, an attack identifier related to the attack technique of the malicious activity, a name assigned to the attack activity of the malicious activity, a tactic related to the attack activity of the malicious activity, and site information describing the attack technique.

[0443] It may include at least one of the date the analysis of the input data was completed or updated, the date the malicious activity of the input data was collected, the date the malicious activity of the input data was first detected, the date the malicious activity of the input data was last detected, and the hash value of the hash function associated with the input data.

[0444] Datasets based on labeling may include normal datasets or malicious datasets.

[0445] For example, the post-processing framework (2500) can automatically generate metadata of malicious files and normal files that were first collected in the previous month. And the post-processing framework (2500) can provide a file to a client, such as a specific filename (e.g., YYYYMM.tar.gz), through a website, etc.

[0446] The post-processing framework (2500) can provide list information of automatically generated datasets when the client (101) connects. The post-processing framework (2500) may allow the client (101) to download the relevant dataset when the client (101) purchases the selected dataset, or send an email containing a link to download the selected dataset to the client's (101) registered email address. If the dataset is large, it may be divided into multiple compressed files and provided.

[0447] Meanwhile, the client (101) can obtain a dataset labeled and stored in the database (2700) by the post-processing framework (2500) according to desired conditions.

[0448]

[0449] In this diagram, the dataset in the first column provides metadata related to the generated dataset. That is, for the dataset in the first column, metadata such as related labels (tag), collection period (Period: From / To), collection date or threat observation date (Date (or Seen)), hash value (Hash), related file information (file info), threat type, and attack tactics, techniques, and methods (Attack TTP) can be provided.

[0450] The client can view the metadata of the dataset built by the post-processing framework (2500) and purchase or subscribe to the relevant dataset to use it.

[0451] The dataset in the second column is an interface that allows the client to obtain the dataset by inputting metadata about the dataset required to the post-processing framework (2500).

[0452] The client can obtain related datasets, such as malicious datasets, by purchasing or subscribing to them by inputting metadata related to cyber threat information of the desired dataset. That is, the embodiment can generate a customized dataset corresponding to the combination of metadata selected or input by the client and provide it to the client.

[0453] The datasets in the third and fourth columns of this drawing disclose an example of providing datasets according to the conditions of specific client queries.

[0454] Clients can search for data types and construct queries. For example, keywords included in the query can contain core keywords of the metadata.

[0455] Metadata may include at least one of the date the analysis of the input data was completed or updated, the date the malicious activity of the input data was collected, the date the malicious activity of the input data was first detected, the date the malicious activity of the input data was last detected, and the hash value of a hash function associated with the input data.

[0456] As another example, metadata may include at least one of the following: the size of a file related to the malicious activity of the input data, the type of the file, tag information related to the malicious activity, the detection name of the detected activity, the name of the file, the type of threat of the malicious activity, an attack identifier related to the attack technique of the malicious activity, a name assigned to the attack activity of the malicious activity, a tactic related to the attack activity of the malicious activity, and site information describing the attack technique.

[0457] For example, the dataset in the fourth column indicates that it was generated through a query of keyword combinations of metadata such as "file_type : exe* OR file_type : pdf AND victim_country : KR".

[0458] In this way, when a client inputs a combination of metadata or a query of a certain format, the post-processing framework (2500) can provide a dataset corresponding to the input metadata combination or query as a search result using the labeling stored in the database. When the client selects a dataset provided according to the search result, the post-processing framework (2500) can provide a dataset corresponding to the metadata to the client.

[0459] In the example of this diagram, purchasing and subscription were illustrated regarding how a client obtains a dataset.

[0460] When a client purchases a dataset, the post-processing framework (2500) may allow the client to download the relevant dataset a set number of times.

[0461] When a client subscribes to a dataset, the post-processing framework (2500) can generate a dataset that matches the query requested by the client and provide it to the user via email or the like at a specific time.

[0462]

[0463] FIG. 18 discloses an example of processing cyber threat information that can provide a dataset.

[0464] The post-processing framework (2500) of the intelligence system stores a dataset labeled according to metadata related to cyber threat information of the input data in the database (2700) (S510).

[0465] Examples of generating a dataset are illustrated in FIGS. 11 to 17.

[0466] The post-processing framework (2500) can generate a dataset corresponding to metadata or query information requested by the client from a dataset stored in the database (2700) (S530).

[0467] If the dataset requested by the client does not match the stored dataset, the requested dataset can be generated from the stored dataset using the metadata or query information requested by the client. An example of this is disclosed in FIGS. 16 and 17. This step may be omitted if the dataset requested by the client is already stored in the database.

[0468] The post-processing framework (2500) provides the client with the stored or generated dataset according to the purchase method selected by the client (S550). A detailed example of this is disclosed in FIG. 17.

[0469] According to the embodiment, malware analysis data can be easily obtained to respond to cyber threats as described above, and specific or technically necessary datasets can be acquired.

[0470] According to the embodiment, an AI training dataset can be generated and provided based on Advanced Persistent Threat (APT) intelligence data to respond to various malicious activities. Based on this, customers or companies can strengthen AI-based cyber threat response and provide high-quality data.

[0471] According to the embodiment, by generating and refining such a dataset, a reliable dataset can be provided to customers.

[0472]

[0473] FIG. 19 is a conceptual diagram for conceptually explaining an embodiment disclosed.

[0474] If there is a false positive result in the detection results of the cyber threat information processing system, an inquiry email regarding this may be received from the user.

[0475] Generally, the current processing of cyber threat information processing systems involves an administrator analyzing the inquiry email, and if the request included in the email concerns the handling of a system false positive, taking measures to restore the related detection results to normal (indicated as As Is).

[0476] However, as with the problem described above, these measures are not efficient and leave room for errors due to manual work by managers.

[0477] Accordingly, the following discloses examples of how to process user inquiry emails by analyzing them through a natural language-based AI agent and automatically perform measures against false detection (indicated as To Be).

[0478]

[0479] FIG. 20 is a diagram illustrating the procedure for responding to over-detection in cyber threat information processing using a natural language model according to an embodiment disclosed.

[0480] The intelligence platform (CTI) (2201) can receive executable or non-executable files from a client and analyze and detect whether they contain cyber threat information. A detailed description of this was disclosed in the previous embodiment.

[0481] The intelligence platform (CTI) (2201) may detect cyber threat information based on files or queries received from the client, but as described above, it may also detect cyber threat information based on information received from various devices installed on the client-side device.

[0482] In the event of a false positive occurring during the detection process described above, the client may be unable to process specific emails, files, or information because they are filtered. In this case, the client may request confirmation regarding the false positive processing or request that the file or information be legitimate through the system's mail system.

[0483] Mail storage (2212) can store mail containing a request for correction regarding the over-detection of the intelligence platform (CTI) (2201).

[0484] The mail processing unit (2610) of the natural language model agent (LLM agent) or AI agent (2600) according to the embodiment loads a mail containing a request for correction regarding a false positive processing stored in the mail storage (2212). The mail processing unit (2610) can filter and parse the loaded mails to parse the request details within the mail.

[0485] The query analysis unit (2630) of the AI ​​agent (2600) receives the request parsed by the mail processing unit (2610), combines the system prompt received from the prompt hub (2350) with the request to query the natural language model and perform natural language analysis on the request.

[0486] The query analysis unit (2630) can extract necessary data from the analysis results and verify the content of the request. The query analysis unit (2630) can also generate the content of the request into a file of a specific format, such as a JSON file.

[0487] The query response unit (2650) can verify whether the data is over-detected based on the data analyzed by the query analysis unit (2630) using a natural language model. In this case, the query analysis unit (2630) can classify cases where it has incorrectly detected data and transmit them back to the intelligence platform (2201). Therefore, the intelligence platform (2201) can subsequently detect the incorrectly detected data as normal by reflecting the analysis results of the query analysis unit (2630).

[0488] And the query analysis unit (2630) can provide the result report to the user in various apps or tools.

[0489] Detailed examples of each component are disclosed below with reference to the drawings.

[0490]

[0491] FIG. 21 is a diagram illustrating the mail processing procedure of a natural language model agent (LLM agent) according to an embodiment.

[0492] The mail processing unit (2610) of the natural language model agent (LLM agent) (2600) includes a mail loading unit (2611), a mail filtering unit (2613), and a mail parsing unit (2615).

[0493] The mail loading unit (2611) loads mail stored in the mail storage (2212) of the intelligence platform.

[0494] The mail filtering unit (2613) can filter mail from a specific user among the stored mail. For example, the mail filtering unit (2613) can filter mail included in a whitelist based on mail sender information. The whitelist used by the mail filtering unit (2613) is adjustable, and can be used to filter a minimum number of mails to identify mail requests required for over-detection processing.

[0495] The mail parsing unit (2615) parses the mail filtering unit (2613) that has filtered the mail. For example, the mail parsing unit (2615) can parse the subject, body, etc. included in the mail and remove parts that are unnecessary for request processing.

[0496] The mail data refined as a result of parsing by the mail parsing unit (2615) can be used to generate a query for the AI ​​agent later.

[0497] In this way, the mail processing unit (2610) of the AI ​​agent (LLM agent) (2600) can selectively filter only the mail necessary for responding to the detection and extract only the relevant requests from the user.

[0498]

[0499] FIG. 22 is a diagram illustrating the query analysis procedure of a natural language model agent (LLM agent) included in an embodiment.

[0500] The query analysis unit (2630) of the natural language model agent (LLM agent) (2600) includes a system prompt (2631), a query parsing unit (2633), a model result generation unit (2635), and an analysis result generation unit (2637).

[0501] The query analysis unit (2630) generates a query in a natural language model regarding the content of the email processed by the email processing unit (2610), and inquires with the natural language model using the generated query to generate a natural language model result.

[0502] Specifically, the system prompt (2631) can generate an optimal query prompt for the natural language model by combining the system prompt (2631) that processes prompts generated by the prompt hub (2350) and the mail data provided by the mail processing unit (2610).

[0503] The system prompt (2631) can generate rules, guidelines, or context information related to a query that can respond to a super-detection. The prompts that form the basis of the query for responding to a super-detection generated by the system prompt (2631) and the requests for super-detection within the email data received by the query analysis unit (2633) can generate input data for responding to a super-detection.

[0504] That is, the query parsing unit (2633) combines mail data transmitted in a specific file format such as JSON and a prompt generated by the system prompt (2631) to generate input data suitable for processing by the natural language model.

[0505] The prompt of the system prompt (2631) provides a prompt guide so that the natural language model can understand the purpose of the query and accurately return the necessary information.

[0506] The model result generation unit (2635) generates natural language processing results for the query prompt generated by the system prompt (2631) and the query analysis unit (2633). The model result generation unit (2635) may be a natural language model or agents having functions connected to a natural language model.

[0507] Accordingly, the model result generation unit (2635) can identify the key keywords and the core of the request included in the mail data and provide the natural language result. The natural language result may include inquiries regarding cyber threat information related to the key keywords or request included in the mail data.

[0508] The analysis result generation unit (2637) can provide a result in which processing results for cyber threat information are added to the natural language result generated by the model result generation unit (2635). In this example, an example is disclosed in which the natural language processing result of mail data is returned in a JSON file format.

[0509] For example, the result data provided by the model result generation unit (2635) may include whether specific link information (URL) is included in the mail data and status information (no status, normal, or malicious) of cyber threat information related to the link information (URL).

[0510] If the email data contains link information (URL), it can indicate whether the link is legitimate or malicious. This can also be used to determine the priority of user requests.

[0511] In this way, the analysis result generation unit (2637) can generate an analysis result for the cyber threat information included in the email.

[0512]

[0513] FIG. 23 is a diagram illustrating the query response procedure of a natural language model agent (LLM agent) included in an embodiment.

[0514] The query response unit (2650) of the natural language model agent (LLM agent) (2600) includes a mail filtering unit (2651), a detection filtering unit (2652), a malicious threat filtering unit (2658), and a response reporting unit (2659).

[0515] The mail filtering unit (2651) of the query response unit (2650) can filter emails related to the over-detection response requests analyzed by the query analysis unit (2630) at regular intervals to filter out necessary request emails. For example, the mail filtering unit (2651) can select emails at regular intervals by considering the natural language results analyzed by the query analysis unit (2630), link information included in the email, or cyber threat information detected in relation to the link information.

[0516] The detection filtering unit (2652) can check whether there is an over-detection (over-detection of cyber threat information), a false detection (incorrect detection of cyber threat information), or a failure to detect (detection of cyber threat information) regarding the link information included in the selected emails or the cyber threat information detected in relation to the link information. Based on the result of the check, the detection filtering unit (2652) can classify and output 1) emails that do not have files and links within the email, 2) emails that have files or links within the email but do not have malicious cyber threat information (Benign), and 3) emails that have files or links within the email and do not have malicious information in the files or links (Malicious).

[0517] The malicious threat filtering unit (2658) can provide the intelligence platform (2201) with emails containing normal files or links (URLs) that do not contain threat information, or emails containing cyber threat information, among the emails classified by the detection filtering unit (2652). Then, the intelligence platform (2201) can take measures to correct the incorrectly detected results in the future. That is, the intelligence platform (2201) reflects the incorrectly detected results so that normal detection can be performed in the future.

[0518] The response reporting unit (2659) can transmit the processing results for emails classified by the malicious threat filtering unit (2658) to an administrator or user. This may include the number of emails processed, the number of incorrectly detected information, and the analysis content included in the information.

[0519]

[0520] FIG. 24 is a diagram illustrating the result of automatically processing a false positive response inquiry of a cyber threat information processing system according to the disclosed example.

[0521] This diagram shows, according to the example disclosed, the number of request emails related to detection results, the number of emails corresponding to the natural language model, the number of undetected links (URLs) among the emails corresponding to the natural language model, the number of overdetected links (URLs) among the emails corresponding to the natural language model, and link (URL) information of the request emails processed by the natural language model.

[0522] According to the example disclosed above, a client's request can be delivered via email to the detection results of an intelligence platform, which is a cyber threat information processing device.

[0523] Mail count indicates that there are 2 related mails processed by the mail processing unit (2610). That is, it indicates the number of mails transmitted to the mail parsing unit (2615) through the mail loading unit (2611) and mail filtering unit (2613) of the mail processing unit (2610).

[0524] The number of emails corresponding to the natural language model (success gpt response generate count) represents the number of emails automatically corresponding to these request emails using the natural language model, indicating that the number of emails parsed by the query parsing unit (2633) is 2. That is, it means the number of emails delivered to the model result generation unit (2635) among the emails parsed by the system prompt (2631) and the query parsing unit (2633).

[0525] False negative (malware->normal) response among emails responded to by a natural language model indicates the number of links (URLs) that failed to detect cyber threat information as a result of verification, among the cases where the natural language model responded to the above request email.

[0526] In other words, this refers to a case where the intelligence platform detected an email containing malware, but the verification result identified it as a link that does not contain threat information.

[0527] For example, if an email contains one or more URLs, the natural language model can identify the number of URLs within the email and determine whether they are malicious or legitimate based on their contextual structure. Here, the number of missed links refers to the number of URLs that were detected as malware but are actually legitimate.

[0528] The number of False Positive (normal->malware) responses among emails responded to by a natural language model represents the number of links (URLs) included in emails where cyber threat information was detected among cases where the above request emails were responded to by a natural language model.

[0529] In other words, this case indicates that there are 3 links that the intelligence platform detected as normal data in the email but were verified as being related to malware.

[0530] In the example, the number of emails is 2, but it can be indicated that there are 3 URLs inside the emails.

[0531] Also, the link (URL) information of the request email processed by the natural language model indicates detailed information about the links included in the email requested above.

[0532]

[0533] FIG. 25 is a diagram showing an example of a system overdetection response according to an embodiment of a method for processing cyber threat information.

[0534] A processor of a server included in a cyber threat information processing system receives a request for processing threat information of the cyber threat information processing system (S610).

[0535] The processor of a server included in a cyber threat information processing system can load requests stored in storage devices, such as mail storage, at regular intervals.

[0536] The received request may be received via email or similar means related to a false positive or false negative of the cyber threat information processing system. An example of receiving a request for the processing of threat information is disclosed in detail in FIGS. 20 and FIGS. 21.

[0537] The server's processor obtains a file or link related to the received request and uses a prompt to generate a natural language processing result included in the received request (S630).

[0538] Examples of natural language processing and analysis of related files or links in relation to a request are illustrated in FIGS. 20 and FIGS. 22.

[0539] The server processor verifies the detection result of the file or link corresponding to the above request and reflects the result corresponding to the above request in the cyber threat information system (S650).

[0540] Examples of modifying the detection results of cyber threat information regarding a request, reflecting them in the cyber threat information system, and reporting the results to users and administrators are illustrated in FIGS. 20, 23, and 24.

[0541] According to the disclosed example, the occurrence of errors can be reduced while efficiently and quickly processing false positive results of a cyber threat information processing system.

[0542]

[0543] FIG. 26 illustrates a cyber threat information processing device using a natural language model according to the disclosed example.

[0544] Here, examples of cyber threat information processing devices may include the intelligence platform (2201), post-processing framework (2500), and AI agent (not shown) exemplified above.

[0545] The intelligence platform (2201) can generate various types of cyber threat information using various executable or non-executable files, data on the internet or files entered by a user, hash values ​​or queries.

[0546] The post-processing framework (2500) includes various data processing modules, and here discloses an example including a statistical insight module (3000).

[0547] The statistical insight module (3000) includes a statistical data collection unit (3100) and a threat insight generation unit (3200), and the threat insight generation unit (3200) may again have insight generation units (3210, 3220) that generate various insight information.

[0548] The statistical data collection unit (3100) collects or extracts various types of cyber threat information provided by the intelligence platform (2201), for example, can generate or extract the number of campaigns (APT) by attack group and the frequency of indicators of compromise (IoC) used in attacks. Here, an attack campaign refers to a set of consecutive cyber attacks.

[0549] The statistical data collection unit (3100) can generate or extract statistical data such as the frequency of specific IPs used in attacks, the frequency of specific domains, the frequency of specific links (URLs), and the frequency of hash values.

[0550] The threat insight generation unit (3200) can generate insight information related to cyber threat information.

[0551] The first insight generation unit (3210) can, for example, generate insights related to attack groups among the statistical data collected by the statistical data collection unit (3100).

[0552] The second insight generation unit (3210) can generate insights related to statistical data of infringement indicators among the statistical data collected by the statistical data collection unit (3100).

[0553] Here, examples of generating insights related to attack groups or indicators of compromise are disclosed, but insights can also be provided by processing statistical data from other cyber threat information.

[0554]

[0555] FIG. 27 illustrates a procedure in which a first insight generation unit among the disclosed cyber threat information processing devices generates insight information.

[0556] The first insight generation unit (3210) can generate data that can provide insights into the characteristics of an attack group or attack behaviors from statistical data related to an attack group.

[0557] The first insight generation unit (3210) may include an anomaly detection module (3211) and a first statistics module (3213). For example, the anomaly detection module (3211) can find anomaly signs in the statistics data regarding the attack group's campaign among the statistics data collected by the statistics data collection unit (3100).

[0558] Various data related to the attack group's campaign, such as the attack group name, entry path information, target country information, target industry information, etc., may be included, and the anomaly detection module (3211) can set this as an anomaly if there is a change in any of the information related to the attack group's campaign or if there is an anomaly such as an amount in specific information.

[0559] The first statistical module (3213) generates statistical data that can provide insights into abnormal data related to an attack group's campaigns captured by the anomaly detection module (3211). For example, the first statistical module (3213) can generate statistical data regarding data on affected industries, data on affected countries, data on threat classification, data on attack techniques, etc., related to the attack group's campaigns.

[0560] Statistical data regarding these attack group campaigns can be generated based on the attack group and can be generated for various threat information included in the attack group campaign.

[0561] The prompt hub (2350) receives statistical data included in the campaigns of attack groups generated by the first statistics module (3213) and generates a prompt that can generate news for each attack group based on this.

[0562] For example, the prompt hub (2350) can generate rules, instructions, or context information related to queries for each campaign from statistical data included in the campaigns of attack groups. By combining the rules, instructions, or context information related to these queries with the content of the statistical data for each campaign, the AI ​​agent (2350), which is a natural language model, generates input data that is easy for the agent to process. A detailed example of this will be described later.

[0563] In this way, the prompt hub (2350) can generate a prompt that allows the AI ​​agent (2350) to generate news about attack groups or campaigns of attack groups.

[0564] The AI ​​agent (2350) can perform a natural language model using the prompts of attack groups or campaigns generated by the prompt hub (2350) and generate natural language news information that explains the relevant information in natural language.

[0565] Then, even if the user cannot grasp or interpret detailed information about the attack groups' campaigns or changes in data, they can obtain insights by attack group or campaign from the natural language information generated by the AI ​​agent (2350).

[0566]

[0567] FIG. 28 discloses an example in which the first insight generation unit exemplified above detects anomalies regarding an attack group's campaign.

[0568] The first insight generation unit can check for changes in each item in the statistical data of an attack group's campaign. If an item included in the statistical data exceeds a certain threshold or statistically falls outside a specific range regarding these changes, it can be set as an anomaly.

[0569] This diagram illustrates statistical data for the attack group Barium.

[0570] The intelligence platform (2201) can continuously generate information such as file types, IP addresses, related domains, and related URLs for the attack group's campaigns.

[0571] Here, examples are provided by visualizing the attack group Barium and its related attack groups, along with information such as various file types, IP addresses, associated domains, and associated URLs used in this attack group campaign.

[0572] The anomaly detection module (3211) of the first insight generation unit can detect anomalies by receiving campaign statistics data for each attack group as exemplified. Looking at the associated campaign statistics (last 90 days) data shown in the lower part of this diagram, it usually occurs 2 to 3 times or 5 to 6 times, but on October 10, an anomaly can be seen that the number of occurrences increased sharply to 24 times.

[0573] In this way, based on date data, data up to the previous day can be aggregated or statistics can be used to detect anomalies using a threshold value, and an alarm can be provided.

[0574] Then, the first statistical module of the first insight generation unit can produce statistical data by category, such as affected industries, affected countries, threat classifications, and attack techniques related to the campaign, based on such abnormal sign alarms.

[0575] Using the statistical data generated in this way, the AI ​​agent can generate news by attack group.

[0576] For example, an AI agent can generate a headline news with the following sentence.

[0577]

[0578] Recently, there has been a surge in F threat attacks targeting D industry in country C by attack group A using B attack techniques.

[0579]

[0580] In this way, by using the statistical data on campaigns by attack group from the first insight generation unit, natural language news about items related to that campaign can be generated.

[0581]

[0582] FIG. 29 discloses an example of prompt generation that can generate information using the natural language model exemplified above.

[0583] In this way, based on statistical data regarding campaigns by attack group from the first insight generation unit, PromptHub generates prompts for queries in a natural language model.

[0584] This diagram is an example of a prompt that generates natural language news related to an attack group campaign.

[0585] Prompts associated with attack group campaigns include a request part, a headline part, and a related data part extracted from statistical data.

[0586] For example, the request part of the prompt includes a description of the provided data and a specific request for news generation. Here, the provided data is campaign insight statistics collected in response to anomalies, and an example is provided of a specific request to generate relevant news using this data.

[0587] The headline part of the prompt provides the format of a news headline. The headline part may include items of statistical data related to an attack group campaign. In this example, the items of statistical data include the attack group, attack method, target country, target industry, and threat type.

[0588] The data part of the prompt can be configured to include values ​​for items of statistical data calculated by the actual first statistical module. In this example, data for the attack group Barium, the attack technique T1224, the target countries of the United States and Japan, the target industries of education and healthcare, and the threat type Ransomware were exemplified.

[0589]

[0590] FIG. 30 illustrates a procedure for generating insight information of the second insight generation unit among the disclosed cyber threat information processing device.

[0591] The second insight generation unit (3210) may include an Indicator of Compromise (IoC) filter module (3221) and a second statistics module (3223).

[0592] The Indicator of Infringement (IoC) filter module (3221) generates statistical data by filtering data related to infringement indicators among the statistical data collected by the statistical data collection unit (3100).

[0593] For example, the Indicator of Compromise (IoC) filter module (3221) can collect information on each indicator of compromise from statistical data.

[0594] The Indicator of Compromise (IoC) filter module (3221) can generate a list of the top N IPs, a list of the top N domains, a list of the top N URLs, a list of the top N hash data, etc.

[0595] The second statistical module (3223) generates statistical data for each indicator of infringement generated by the indicator of infringement (IoC) filter module (3221) so as to provide insight into each indicator of infringement. The second statistical module (3223) can generate statistical data for the top indicators of infringement, such as, for example, related IoC-related affected industries, affected countries, threat classifications, and attack technique count values.

[0596] In other words, natural language news associated with each indicator of compromise can be generated based on statistical data such as affected industries, affected countries, threat classifications, or attack techniques.

[0597] For example, the format of news provided based on statistical data related to infringement indicators is as follows.

[0598]

[0599] Recently, threat attacks targeting industries F, G, and H in countries C, D, and E using B1 and B2 attack techniques from the 10.10.10.10 IP have been surging.

[0600]

[0601] To generate news that can provide insights based on such infringement indicators, the prompt hub (2350) generates prompts based on statistical data for each infringement indicator generated by the second statistical module (3223). For example, the prompt hub (2350) generates rules, guidelines, or context information for queries related to statistical data for each infringement indicator. Detailed examples thereof will be described later.

[0602] Then, the AI ​​agent (2350) can perform a natural language model using statistical data by infringement indicator generated by the prompt hub (2350) and generate natural language news information that explains related information in natural language.

[0603]

[0604] FIG. 31 discloses another example of prompt generation that can generate information using the natural language model exemplified above.

[0605] This diagram illustrates a prompt for generating natural language news related to indicators of compromise. Similar to the prompt for generating news related to attack group campaigns exemplified above, the prompt related to indicators of compromise includes a request part, a headline part, and a related data part extracted from statistical data.

[0606] For example, the request part of the prompt includes a specific request to generate news using the provided data, which is statistical data on infringement indicators. This example illustrates a specific request to generate related news using the provided data, which is insight statistical data on collected infringement indicators.

[0607] The headline part of the prompt provides the format of a news headline. The headline part may include items of statistical data related to indicators of compromise. In this example, the items of statistical data include attack method, target country, target industry, and threat type.

[0608] The data part of the prompt may include values ​​for items of statistical data produced by the second statistical module. In this example, the data part is exemplified as data for Indicator of Compromise (IoC) 10.10.10.10, attack technique T1224 or T1222, target countries United States and Japan, target industries education and healthcare, and threat type ransomware.

[0609]

[0610] Figure 32 illustrates a headline news generated using the natural language model exemplified above.

[0611] According to the disclosed example, news that can provide insights to users can be generated in natural language based on statistical data related to cyber threat information.

[0612] The example disclosed above can generate natural language news from statistical data of various threat information detected by an intelligence platform.

[0613] As examples of statistical data, statistical data obtained from attack group campaigns and indicators of compromise were exemplified, but similar news can be generated using other statistical data.

[0614] The prompt generated based on statistical data includes a request part for news generation, a headline part for delivering headline news, and a data part derived from actual statistical data.

[0615] As an example of news produced in this manner, the news in this diagram illustrates news about an attack group attacking an industry within a specific country (a), specific attack techniques used by the attack group (b), relevant URLs used in the attack techniques (c), content related to the attack damage (d), and information about the attack group (e).

[0616] And, news about headlines (f) related to (a) to (e) above can be provided.

[0617]

[0618] FIG. 33 discloses an example of processing cyber threat information in which news can be automatically provided using statistical data insights.

[0619] A cyber threat information processing device including a storage device and a processor collects data on cyber threat information detected (S710). Examples of data that can provide insights to the user from the perspective of the collected data include attack group campaigns or threat compromise indicators, but the same examples can be applied to other data.

[0620] The method by which a cyber threat device detects and collects cyber threat information is exemplified in FIGS. 1 to 15.

[0621] Statistical data can be calculated from the collected data according to the type of the data (S730). Statistical data may be calculated differently depending on the data type. For example, in the case of an attack group campaign, abnormal signs exceeding a threshold can be determined, or in the case of an infringement indicator, statistics based on the indicator can be calculated. Examples of calculating statistical data from the collected data according to the type of the data are illustrated in FIGS. 26, 29 to 30.

[0622] In the case of a campaign based on an attack group, statistical data on various attack methods included in the campaign's attack can be calculated, and in the case of threat compromise indicators, statistical data on the elements included in the compromise indicators can be calculated respectively.

[0623] Based on the generated statistical data, a prompt is generated, and a natural language model is used to generate and provide news based on the generated prompt (S750).

[0624] Examples of prompt requests for news generation, headline formats of news information considering statistical data of cyber threat information, and collected statistical data were provided.

[0625] By utilizing such cyber threat-generating news requests, the headlines of those news, and collected statistical data, users can obtain real-time, user-friendly cyber attack news.

[0626] The steps disclosed above may also be performed by a program processed by a processor of a cyber threat information processing device.

[0627] According to the disclosed example, users can gain intuitive insights into cyber threat information processed data and easily obtain natural language-based insights through interpretation information inherent in a vast amount of cyber threat information.

[0628] The disclosed examples are repeatable and have industrial applicability.

Claims

1. Step of collecting data on cyber threat information; A step of calculating statistical data according to the type of the collected data above; and A cyber threat information processing method comprising the step of generating natural language news about the collected data according to a prompt generated based on the statistical data calculated above.

2. In Paragraph 1, If the above collected data is related to a cyber threat attack group's campaign (APT), A cyber threat information processing method that generates the natural language news based on abnormal data of attack means used by the above campaign.

3. In Paragraph 1, A cyber threat information processing method that generates natural language news based on data corresponding to the indicator of compromise when the collected data is included in the indicator of compromise of cyber threat information.

4. In Paragraph 1, The above prompt is, The request part requesting the generation of the above natural language news, A headline part providing a headline format among the above natural language news; A cyber threat information processing method comprising a data part containing the statistical data calculated above.

5. A database for storing data; and It includes a processor that processes the above data, The above processor is, Collecting data on cyber threat information; Calculate statistical data according to the type of the collected data above; and A cyber threat information processing device that generates natural language news about the collected data according to a prompt generated based on the statistical data calculated above; and executes a set of commands including commands.

6. In Paragraph 5, If the above collected data is related to a cyber threat attack group's campaign (APT), A cyber threat information processing device that generates the natural language news based on abnormal data of attack means used by the above campaign.

7. In Paragraph 5, The data collected above, A cyber threat information processing device that generates natural language news based on data corresponding to an indicator of compromise when the cyber threat information includes an indicator of compromise.

8. In Paragraph 5, The above prompt is, The request part requesting the generation of the above natural language news, A headline part providing a headline format among the above natural language news; and A cyber threat information processing device comprising a data part containing the statistical data calculated above.

9. Collect data on cyber threat information; Calculate statistical data according to the type of the collected data above; and A storage medium storing a computer-executable program for processing cyber threat information, which generates natural language news about the collected data according to a prompt generated based on the statistical data calculated above; and executes a set of instructions including instructions.

10. In Paragraph 8, The above prompt is, The request part requesting the generation of the above natural language news, A headline part providing a headline format among the above natural language news; and A storage medium for storing a computer-executable program that processes cyber threat information, comprising a data part containing the statistical data calculated above.