Communication analysis system, analysis method, and program
The communication analysis system addresses ICS challenges by creating an analytical network graph to verify and manage out-of-whitelist communications, improving analysts' efficiency and reducing false positives.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- PANASONIC INTELLECTUAL PROPERTY CORP OF AMERICA
- Filing Date
- 2022-09-06
- Publication Date
- 2026-06-22
AI Technical Summary
Industrial Control Systems (ICS) face challenges in efficiently analyzing abnormal communications not included in whitelists due to insufficient learning periods, leading to false positives and overburdening analysts, despite existing methods providing alert priorities without clear guidance on handling them.
A communication analysis system that creates an analytical network graph using past communication information and machine learning to identify similar terminals and communication links, aiding analysts in verifying the legitimacy of out-of-whitelist communications.
Facilitates easy analysis of anomalies outside the whitelist, reducing false positives and enhancing analysts' efficiency by providing graphical and message-based support for handling alerts.
Abstract
Description
Technical Field
[0001] The present disclosure relates to a communication analysis system, an analysis method, and a program for analyzing the abnormality of communication not included in a whitelist.
Background Art
[0002] Conventionally, in a communication network (NW) in an industrial control system (ICS: Industrial Control System) that manages and controls equipment such as factory facilities, a proprietary communication standard has been used. In recent years, in order to smarten up the ICS and improve convenience and versatility, an open communication standard has been used for the NW in the ICS.
[0003] However, it is not easy to change the configuration of the equipment constituting the ICS or to update the equipment, so security measures are often insufficient. For this reason, the ICS is vulnerable to cyberattacks, and the damage caused by such attacks is increasing year by year.
[0004] In addition, in the ICS, various terminals are connected in the same segment, and the terminals belonging to the same segment have little communication with the outside of the segment and often communicate with various protocols inside the segment. In other words, in the ICS, in order to execute a predetermined process, communication is often closed between specific terminals. Thus, the terminals used in the ICS have different communication characteristics from those used in general offices and the like.
[0005] Due to such communication characteristics, in the ICS, anomaly detection using a whitelist is considered effective and widely adopted (see, for example, Non-Patent Document 1). Thus, by using a whitelist, it is possible to detect communication that has never occurred before.
[0006] On the other hand, if the learning period for the whitelist—that is, the period for acquiring past communications used to create the whitelist—is insufficient, and the whitelist is not sufficiently learned, there is a risk of generating a large number of false positives for normal communications that the whitelist has not yet learned. As a result, the analysts at the Security Operation Center (SOC) who perform alert analysis may be overburdened and may not be able to properly respond to cyberattacks that should be addressed.
[0007] In response to this, a method has been disclosed that uses machine learning to learn the communication status of terminals and quantifies the degree of abnormality of communication links for communications not included in the whitelist, thereby indicating the priority of alerts that should be addressed (see, for example, Non-Patent Document 2). Here, a communication link refers to a combination of IP addresses (or MAC address information) that identify the source and destination terminals, the protocol exchanged between those terminals, and the classification of information.
[Prior art documents]
[Non-patent literature]
[0008]
[Non-Patent Document 1] Dwight Anderson (2014). “Protect Critical Infrastructure Systems With Whitelisting”
[Non-Patent Document 2] Tatsumi Oba, et al. (2020). “Graph Convolutional Network-based Suspicious Communication Pair Estimation for Industrial Control Systems”
[Summary of the invention]
[Problems that the invention aims to solve]
[0009] However, simply providing alert priorities, as disclosed in Non-Patent Document 2, does not tell analysts how to actually deal with the alerts. Even if alerts are ordered by priority, the efficiency of the analysts' work may not improve.
[0010] This disclosure has been made in light of the circumstances described above and provides a communication analysis system and the like that can easily analyze the abnormality of communications not included in the whitelist.
[Means for solving the problem]
[0011] To solve the above problems, a communication analysis system according to one aspect of the present disclosure analyzes network communications of multiple terminals in a predetermined environment that is a target of monitoring. The communication analysis system comprises: a whitelist created by learning communications performed in the target of monitoring; a communication information DB that holds past communication information, that is, information indicating past communications performed in the target of monitoring; and an analysis support graph creation system that creates graph information for analyzing out-of-whitelist communications, which are communications not included in the whitelist. The analysis support graph creation system comprises: an information receiving unit that receives information indicating the communications to be analyzed that were performed in the target of monitoring; an information acquisition unit that acquires the past communication information from the communication information DB; a whitelist determination unit that determines, using the whitelist and the information received by the information receiving unit, whether an out-of-whitelist communication has occurred among the communications to be analyzed; a similar terminal extraction unit that extracts a first similar terminal, which is one or more terminals similar to the destination terminal, and a second similar terminal, which is one or more terminals similar to the source terminal, of the out-of-whitelist communication link, which is the communication link of the out-of-whitelist communication determined by the whitelist determination unit; a primary similar communication link extraction unit that uses the first and second similar terminals extracted by the similar terminal extraction unit to extract, from the past communication information acquired by the information acquisition unit, past communication links that are similar to the out-of-whitelist communication link and have the first or second similar terminal as their destination or source terminal, as primary similar communication links; and an NW graph creation unit that uses the primary similar communication links extracted by the primary similar communication link extraction unit and the past communication information acquired by the information acquisition unit to create an NW graph for analysis as graph information for analyzing the out-of-whitelist communication.
[0012] These general or specific embodiments may be implemented as a system, method, integrated circuit, computer program, or recording medium such as a computer-readable CD-ROM, or as any combination of a system, method, integrated circuit, computer program, and recording medium.
[Effects of the Invention]
[0013] According to this disclosure, it is possible to realize a communication analysis system that can easily analyze the abnormality of communications not included in the whitelist.
[Brief explanation of the drawing]
[0014]
[Figure 1] Figure 1 is a diagram showing the overall configuration including the communication analysis system according to Embodiment 1.
[Figure 2A] Figure 2A is a block diagram showing an example of the configuration of the analysis support graph creation system according to Embodiment 1.
[Figure 2B] Figure 2B is a block diagram showing the minimum configuration of the analysis support graph creation system shown in Figure 2A.
[Figure 3] Figure 3 is a block diagram showing an example of the configuration of the communication link learning system according to Embodiment 1.
[Figure 4] Figure 4 is a flowchart showing the operation of the analysis method of the communication analysis system according to Embodiment 1.
[Figure 5] Figure 5 is a flowchart showing the analysis support graph creation process of the analysis support graph creation system according to Embodiment 1.
[Figure 6] Figure 6 is a flowchart showing an example of the similar terminal extraction process shown in Figure 5.
[Figure 7] Figure 7 is a flowchart showing an example of the primary similar communication link extraction process shown in Figure 5.
[Figure 8A] Figure 8A is a diagram showing an example of the extraction process for primary similar communication links according to Embodiment 1.
[Figure 8B] Figure 8B is a diagram showing an example of the extraction process for primary similar communication links according to Embodiment 1.
[Figure 8C] Figure 8C is a diagram showing an example of the extraction process for primary similar communication links according to Embodiment 1.
[Figure 9] Figure 9 is a flowchart showing an example of the secondary similar communication link extraction process shown in Figure 5.
[Figure 10A] Figure 10A is a diagram showing an example of the extraction process for secondary similar communication links according to Embodiment 1.
[Figure 10B] Figure 10B is a diagram showing an example of the extraction process for secondary similar communication links according to Embodiment 1.
[Figure 10C] Figure 10C is a diagram showing an example of the extraction process for secondary similar communication links according to Embodiment 1.
[Figure 11] Figure 11 is a flowchart showing an example of the analysis auxiliary graph creation process shown in Figure 5.
[Figure 12] Figure 12 is a flowchart showing an example of the detailed process of step S455 shown in Figure 11.
[Figure 13A] Figure 13A is a diagram showing an example of the screen information displayed when there is a primary similar communication link according to Embodiment 1.
[Figure 13B] Figure 13B is a diagram showing an example of the screen information displayed when there is a primary similar communication link according to Embodiment 1.
[Figure 13C] Figure 13C is a diagram showing an example of the screen information displayed when there is a primary similar communication link according to Embodiment 1.
[Figure 14A] Figure 14A is a diagram showing an example of the screen information displayed when there is no primary similar communication link but there is a secondary similar communication link according to Embodiment 1.
[Figure 14B] Figure 14B is a diagram showing an example of the screen information displayed when there is no primary similar communication link but there is a secondary similar communication link according to Embodiment 1.
[Figure 14C] Figure 14C is a diagram showing an example of the screen information displayed when there is no primary similar communication link but there is a secondary similar communication link according to Embodiment 1.
[Figure 15A] Figure 15A is a diagram showing an example of the screen information displayed when there is neither a primary nor a secondary similar communication link according to Embodiment 1.
[Figure 15B] Figure 15B is a diagram showing an example of switching screen information according to Embodiment 1.
[Figure 16] Figure 16 is a diagram showing an example of the screen information displayed when adding a WL-external communication to the WL according to Embodiment 1.
[Figure 17] Figure 17 is a diagram showing an example of part of the screen information displayed when adjusting the display range of the similar terminal / communication graph according to Embodiment 1.
[Figure 18] Figure 18 is a diagram showing the overall configuration including the communication analysis system according to Embodiment 2.
[Figure 19A] Figure 19A is a block diagram showing an example of the configuration of a communication link prediction graph generation system according to Embodiment 2.
[Figure 19B] Figure 19B is a block diagram showing the minimum configuration of the communication link prediction graph generation system shown in Figure 19A.
[Figure 20] Figure 20 is a flowchart showing the operation of the analysis method of the communication analysis system according to Embodiment 2.
[Figure 21] Figure 21 is a flowchart showing the overall processing of the communication link prediction graph creation system according to Embodiment 2.
[Figure 22] Figure 22 is a flowchart showing an example of the process for selecting an unoccurred communication link shown in Figure 21.
[Figure 23] Figure 23 is a diagram showing an example of a prediction target list that includes unoccurred communication link candidates according to Embodiment 2.
[Figure 24] Figure 24 is a flowchart showing an example of the confidence level calculation process shown in Figure 21.
[Figure 25] Figure 25 is a diagram showing an example in which the confidence level calculated according to Embodiment 2 is added to the prediction target list.
[Figure 26] Figure 26 is a flowchart showing an example of the process for creating the predicted network graph shown in Figure 21.
[Figure 27] Figure 27 is a diagram showing an example of an unoccurred communication link highlighted on the NW graph in the prediction target list shown in Figure 26.
[Figure 28] Figure 28 is a diagram showing an example of the screen information displayed when analyzing unoccurred communication links according to Embodiment 2.
[Figure 29] Figure 29 is a diagram showing another example of the screen information displayed when analyzing unoccurred communication links according to Embodiment 2.
[Figure 30] Figure 30 is a diagram showing an example of the screen information displayed when the prediction results for unoccurred communication links according to Embodiment 2 are shown.
[Figure 31] Figure 31 is a diagram showing an example of the screen information displayed when adjusting the display range of the graph of unoccurred communication links according to Embodiment 2.
[Figure 32] Figure 32 shows an example of how to fill out the completion documents for a building.
[Figure 33] Figure 33 is a flowchart showing the process for determining similar devices using predetermined rules.
[Modes for carrying out the invention]
[0015] (Knowledge that forms the basis of this disclosure) In ICS, each terminal connected to the network differs from terminals connected to a typical office network and operated by a human. Each terminal has a role suited to the ICS environment, and often operates automatically according to the settings configured for that role.
[0016] An example of an ICS environment is a factory field network. In a factory field network, there are actuators that operate according to each work process within the factory. The actuators receive instructions transmitted from the Supervisory Control and Data Acquisition (SCADA) system. Each Programmable Logic Controller (PLC) converts digital and analog signals to control the actuators.
[0017] Furthermore, SCADA collects and records equipment information from multiple sensors, transmits this information to the Human Machine Interface (HMI), and visualizes the data on the HMI so that on-site managers can understand the situation.
[0018] Thus, in ICS, each terminal often only communicates with other terminals that are necessary to execute a predetermined process, resulting in a limited amount of communication. For this reason, creating a whitelist that learns the specific communications of terminals is considered an effective countermeasure against cyberattacks in ICS.
[0019] Therefore, the inventors first attempted to implement anomaly detection using a whitelist in a real ICS environment. However, without realizing that there are normal communications such as rare events, i.e., normal communications that occur at long intervals, they performed whitelist learning during a learning period that did not include such long intervals. As a result, a large number of normal communications that occur at long intervals were mistakenly identified as communication links (also called WL-external communications) that were not included in the whitelist.
[0020] Furthermore, upon analyzing these false positives, the inventors discovered that similar communications had occurred in the past on other terminals with the same role as the terminal involved in the falsely detected communications within the ICS environment.
[0021] In response, the inventors conceived of presenting the analyst, when a WL-external communication link occurs, with information about communications similar to that link and about terminals similar to the terminals on that link. They further found that this information lets the analyst confirm whether the WL-external communication link is a communication that should not occur under normal operation, thereby streamlining the handling of these false positive alerts.
[0022] Furthermore, the learning period needed to train the whitelist adequately varies from site to site, so a whitelist deployed in the field may not have been adequately trained. If such a whitelist is used, normal communication links will be misidentified as WL-external communication links, possibly in large numbers. Moreover, each time a normal WL-external communication occurs, analysts must analyze it manually and add it to the whitelist, which burdens them and may prevent them from properly handling the cyberattacks they should be addressing.
[0023] In response, the inventors found that adding unoccurred communication links, that is, links that did not occur during the learning period but are highly likely to be normal, to the whitelist before putting it into operation makes it possible to operate the whitelist efficiently and suppress false positives after deployment. They further found that operating the whitelist efficiently in this way makes it easy to verify the legitimacy of WL-external communications occurring during operation and makes the response to alerts more efficient.
[0024] Based on the above considerations, we have come up with the communication analysis system and other related concepts described below.
[0025] In other words, a communication analysis system according to one aspect of the present disclosure analyzes network communications of multiple terminals in a predetermined environment that is a target of monitoring. The communication analysis system comprises: a whitelist created by learning communications performed in the target of monitoring; a communication information DB that holds past communication information, that is, information indicating past communications performed in the target of monitoring; and an analysis support graph creation system that creates graph information for analyzing out-of-whitelist communications, which are communications not included in the whitelist. The analysis support graph creation system comprises: an information receiving unit that receives information indicating the communications to be analyzed that were performed in the target of monitoring; an information acquisition unit that acquires the past communication information from the communication information DB; a whitelist determination unit that determines, using the whitelist and the information received by the information receiving unit, whether an out-of-whitelist communication has occurred among the communications to be analyzed; a similar terminal extraction unit that extracts a first similar terminal, which is one or more terminals similar to the destination terminal, and a second similar terminal, which is one or more terminals similar to the source terminal, of the out-of-whitelist communication link, which is the communication link of the out-of-whitelist communication determined by the whitelist determination unit; a primary similar communication link extraction unit that uses the first and second similar terminals extracted by the similar terminal extraction unit to extract, from the past communication information acquired by the information acquisition unit, past communication links that are similar to the out-of-whitelist communication link and have the first or second similar terminal as their destination or source terminal, as primary similar communication links; and an NW graph creation unit that uses the primary similar communication links extracted by the primary similar communication link extraction unit and the past communication information acquired by the information acquisition unit to create an NW graph for analysis as graph information for analyzing the out-of-whitelist communication.
[0026] According to this configuration, an NW graph for analysis can be created, as graph information for analyzing the WL-external communication, using the primary similar communication links of a newly occurring WL-external communication link and the past communication information. The user, acting as the analyst, can then easily verify the legitimacy of the WL-external communication using the NW graph for analysis.
[0027] In this way, it is possible to easily analyze the anomalies of communications that are not included in the whitelist.
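The extraction of primary similar communication links described above can be illustrated with a minimal Python sketch. The matching rule used here is an assumption made for illustration only: a past link counts as "similar" when it uses the same protocol as the WL-external link and each endpoint is either the original terminal or one of its similar terminals. The names `Link` and `primary_similar_links` are hypothetical and do not appear in the disclosure.

```python
from typing import NamedTuple


class Link(NamedTuple):
    src: str       # source terminal (e.g. an IP or MAC address)
    dst: str       # destination terminal
    protocol: str  # protocol / classification of the communication


def primary_similar_links(alert, past_links, similar_to_src, similar_to_dst):
    """Return past links that resemble `alert` (assumed rule, see lead-in)."""
    src_side = set(similar_to_src) | {alert.src}
    dst_side = set(similar_to_dst) | {alert.dst}
    return [
        link
        for link in past_links
        if link != alert                    # skip the alert itself
        and link.protocol == alert.protocol
        and link.src in src_side
        and link.dst in dst_side
    ]


alert = Link("plc3", "hmi1", "modbus")
past = [
    Link("plc1", "hmi1", "modbus"),   # similar source, same destination
    Link("plc2", "hmi1", "http"),     # different protocol
    Link("plc1", "scada1", "modbus"), # unrelated destination
]
print(primary_similar_links(alert, past, ["plc1", "plc2"], []))
```

Under these assumptions only the first past link is returned, which is the kind of evidence an analyst could use to judge that the new link is plausibly normal.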
[0028] Furthermore, for example, the analysis support graph creation system may further include a secondary similar communication link extraction unit that extracts secondary similar communication links, which are communication links different from the primary similar communication links and the out-of-whitelist communication link. In this case, the similar terminal extraction unit further extracts a third similar terminal, which is one or more terminals similar to a past communication partner of the destination terminal of the out-of-whitelist communication link other than the source terminal, and a fourth similar terminal, which is one or more terminals similar to a past communication partner of the source terminal other than the destination terminal. The secondary similar communication link extraction unit uses the first, second, third, and fourth similar terminals extracted by the similar terminal extraction unit and the past communication information acquired by the information acquisition unit to extract, as the secondary similar communication links, communication links that differ from the out-of-whitelist communication link and from the primary similar communication links extracted by the primary similar communication link extraction unit, and that are similar to past communications made by the source terminal or destination terminal of the out-of-whitelist communication link. The NW graph creation unit may then use the extracted primary similar communication links and secondary similar communication links, and the past communication information acquired by the information acquisition unit, to create the NW graph for analysis.
[0029] According to this configuration, an NW graph for analysis can be created, as graph information for analyzing the WL-external communication, using the primary and secondary similar communication links of a newly occurring WL-external communication link together with the past communication information. The user, acting as the analyst, can then easily verify the legitimacy of the WL-external communication using the NW graph for analysis.
[0030] In this way, it is possible to easily analyze the anomalies of communications that are not included in the whitelist.
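A rough Python sketch of secondary similar communication link extraction follows. It is only one possible reading of the description: past communications involving the source or destination terminal of the WL-external link are treated as "anchors", and a secondary similar link is any other past link with the same protocol whose endpoints are the same as, or similar to, an anchor's endpoints. Links are plain `(src, dst, protocol)` tuples, and the `similar` mapping and both function names are hypothetical.

```python
def same_or_similar(original, candidate, similar):
    """True if candidate is the original terminal or one of its similar terminals."""
    return candidate == original or candidate in similar.get(original, set())


def secondary_similar_links(alert, past_links, primary_links, similar):
    """Return past links resembling earlier communications of the alert's terminals."""
    endpoints = {alert[0], alert[1]}
    # Past communications made by the alert's source or destination terminal.
    anchors = [p for p in past_links
               if p != alert and (p[0] in endpoints or p[1] in endpoints)]
    excluded = set(primary_links) | {alert} | set(anchors)
    found = []
    for q in past_links:
        if q in excluded or q in found:
            continue
        for p in anchors:
            if (q[2] == p[2]
                    and same_or_similar(p[0], q[0], similar)
                    and same_or_similar(p[1], q[1], similar)):
                found.append(q)
                break
    return found


alert = ("plc3", "hmi1", "modbus")
primary = [("plc1", "hmi1", "modbus")]
past = [
    ("plc3", "scada1", "opcua"),  # earlier communication of the alert's source
    ("plc1", "scada2", "opcua"),  # similar terminals on both ends of that link
    ("plc1", "hmi1", "modbus"),   # already a primary similar link
    ("ws1", "ws2", "smb"),        # unrelated
]
similar = {"plc3": {"plc1"}, "scada1": {"scada2"}}
print(secondary_similar_links(alert, past, primary, similar))
```

Whether the anchor links themselves should be shown as secondary links is a design choice the text does not settle; this sketch excludes them.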
[0031] Furthermore, for example, the NW graph creation unit may create the NW graph and also create messages based on the created NW graph as auxiliary information for the user to perform the analysis.
[0032] This allows messages to be displayed as supplementary information for the user to analyze, making it easier for the user to perform the analysis by utilizing these messages.
[0033] Furthermore, for example, the analysis support graph creation system may also include an NW graph display unit that displays the NW graph created by the NW graph creation unit on a screen, and a WL change operation unit that adds the out-of-whitelist communications to the whitelist according to the user's instructions.
[0034] This allows users to easily add WL-external communications to the whitelist.
[0035] Furthermore, for example, the NW graph display unit may display on the screen a slider bar for adjusting the threshold applied to the similarity between the destination and source terminals of the out-of-whitelist communication link and their similar terminals, and may update the number of similar terminals shown in the NW graph according to the threshold changed by the user's operation of the slider bar.
[0036] In this way, by manipulating the slider bar, the number of similar terminals (display range) shown for the terminals of a WL-external communication link can be adjusted, which may make it easier to verify the legitimacy of the WL-external communication.
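The effect of the slider bar can be sketched as a simple threshold filter. This is an illustrative assumption about how the display range might be recomputed; the function name `visible_similar_terminals` and the score values are hypothetical.

```python
def visible_similar_terminals(similarities, threshold):
    """Return terminals whose similarity score meets the threshold, best first.

    similarities: dict mapping terminal name -> similarity score
                  (hypothetical values, assumed here to lie in [0, 1]).
    """
    kept = [name for name, score in similarities.items() if score >= threshold]
    return sorted(kept, key=lambda name: -similarities[name])


scores = {"plc1": 0.95, "plc2": 0.80, "hmi2": 0.40}
print(visible_similar_terminals(scores, 0.7))  # ['plc1', 'plc2']
print(visible_similar_terminals(scores, 0.9))  # ['plc1']
```

Raising the threshold narrows the graph to the closest matches, while lowering it widens the display range.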
[0037] Furthermore, for example, when the user selects, in the NW graph displayed on the screen, any two communication links from among the out-of-whitelist communication link and the primary similar communication links, or any two communication links from among the secondary similar communication links, the NW graph display unit may display detailed information on the two selected communication links as comparative information for analyzing the out-of-whitelist communication.
[0038] In this way, displaying comparative information may make it easier to verify the legitimacy of WL-external communication.
[0039] Furthermore, for example, the NW graph display unit may display a message regarding the procedure for selecting the two communication links, guiding the user so that the comparison information is displayed on the screen.
[0040] This allows users to view comparison information by following the displayed procedure, potentially making it easier to verify the legitimacy of WL-external communication.
[0041] Furthermore, for example, if the user selects a communication link outside the whitelist, the NW graph display unit may display a button to determine whether to add the selected communication link to the whitelist, and the WL change operation unit may add the communication to the whitelist according to the user's input to the button.
[0042] Furthermore, for example, the NW graph creation unit may highlight the out-of-whitelist communication links on the NW graph display unit so that the user can distinguish them from other communication links in the created NW graph.
[0043] This allows users to intuitively identify WL-external communication links.
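As a minimal sketch of what such graph information might look like, the following Python function assembles nodes and labeled edges and flags the WL-external link for highlighting. The dictionary layout, the `kind` labels, and the function name `build_analysis_graph` are all assumptions for illustration; the disclosure does not specify a data format.

```python
def build_analysis_graph(alert, primary_links, past_links):
    """Assemble nodes and labeled edges from (src, dst, protocol) tuples."""
    kinds = {}
    for link in past_links:
        kinds.setdefault(link, "past")
    for link in primary_links:
        kinds[link] = "primary_similar"
    kinds[alert] = "wl_external"

    nodes = sorted({t for (src, dst, _) in kinds for t in (src, dst)})
    edges = [
        {"src": s, "dst": d, "protocol": p, "kind": k,
         "highlight": k == "wl_external"}  # only the alert is highlighted
        for (s, d, p), k in kinds.items()
    ]
    return {"nodes": nodes, "edges": edges}


graph = build_analysis_graph(
    alert=("plc3", "hmi1", "modbus"),
    primary_links=[("plc1", "hmi1", "modbus")],
    past_links=[("plc1", "hmi1", "modbus"), ("plc1", "scada1", "opcua")],
)
print(graph["nodes"])
print([e for e in graph["edges"] if e["highlight"]])
```

A display layer could then draw highlighted edges in a distinct color or line style.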
[0044] Furthermore, for example, the NW graph creation unit may group the destination terminal and the source terminal in the non-whitelisted communication link with a plurality of similar terminals, including the first similar terminal, second similar terminal, third similar terminal, and fourth similar terminal, and display them on the NW graph display unit, so that the user can identify similar terminals belonging to the same group and different groups from each other in the created NW graph.
[0045] This allows users to understand the similarity relationships between one or more similar terminals on a WL-external communication link.
[0046] Alternatively, for example, the similar terminal extraction unit may extract similar terminals by calculating the similarity between the multiple terminals using a trained machine learning model.
[0047] In this way, by using a trained machine learning model, similar terminals can be extracted with high accuracy. Because the NW graph is created using these accurately extracted similar terminals, users can easily analyze the anomalies of communications that are not included in the whitelist.
[0048] Here, for example, the machine learning model may be generated by learning communications performed in the monitored target using a link prediction or node classification algorithm that can create a fixed-dimensional vector for each terminal that appeared in the communication and a fixed-size matrix for each type of communication that appeared in the communication.
[0049] Furthermore, for example, the machine learning model may be any of the following: LinkFeat, CompGCN (Composition-based Multi-Relational Graph Convolutional Networks), R-GCN (Relational Graph Convolutional Network), DistMult, TransE (Translating Embeddings for Modeling Multi-relational Data), HolE (Holographic Embeddings of Knowledge Graphs), or ComplEx (Complex Embeddings for Simple Link Prediction).
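To make the role of the per-terminal vectors and per-communication-type matrices concrete, here is a small Python sketch. It assumes, purely for illustration, that terminal similarity is measured as cosine similarity between learned vectors (the disclosure does not state the metric here) and shows the standard DistMult score, in which each relation matrix is diagonal and therefore stored as a vector. The embedding values are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two terminal embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distmult_score(src_vec, relation_diag, dst_vec):
    """DistMult plausibility score: sum_i src_i * r_i * dst_i."""
    return sum(s * r * d for s, r, d in zip(src_vec, relation_diag, dst_vec))


def rank_similar(target, embeddings):
    """Rank all other terminals by cosine similarity to `target` (assumed metric)."""
    others = [name for name in embeddings if name != target]
    return sorted(others, key=lambda n: -cosine(embeddings[target], embeddings[n]))


# Hypothetical 2-dimensional embeddings; real models use far more dimensions.
embeddings = {"plc1": [1.0, 0.0], "plc2": [0.9, 0.1], "hmi1": [0.0, 1.0]}
print(rank_similar("plc1", embeddings))                   # ['plc2', 'hmi1']
print(distmult_score([1.0, 2.0], [1.0, 0.5], [3.0, 4.0]))  # 7.0
```

Terminals whose vectors point in nearly the same direction tend to have played the same role in the learned communications, which is why they are natural candidates for similar terminals.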
[0050] The embodiments described below are all specific examples of this disclosure. The numerical values, shapes, components, steps, and order of steps shown in the following embodiments are examples only and are not intended to limit this disclosure. Furthermore, any components in the following embodiments that are not described in an independent claim are described as optional components. In addition, the contents of each embodiment can be combined.
[0051] (Embodiment 1) The communication analysis system 20 according to Embodiment 1 will be described below with reference to the drawings.
[0052] [1 Overall Structure] Figure 1 is a diagram showing the overall configuration including the communication analysis system 20 according to Embodiment 1.
[0053] As shown in Figure 1, the communication analysis system 20 is connected to the monitored target 10 located in a remote location via a network 30. The communication analysis system 20 analyzes the network communications of multiple terminals in a predetermined environment, which is the monitored target 10. In this embodiment, the communication analysis system 20 sequentially collects network communication information of the monitored target 10 via the network 30, performs various analyses such as learning the communication content and detecting anomalies, and stores the information. Further details will be described later.
[0054] The monitored target 10 comprises multiple terminals in a predetermined environment, such as an ICS environment in a factory or building, and transmits information on the communications performed within it to the communication analysis system 20. In this embodiment, a terminal connected to the network in the ICS collects the communication packets exchanged on that network, and the collected communication data is transmitted to the communication analysis system 20 as communication information of the monitored target 10.
[0055] Network 30 is, for example, a general internet connection or a dedicated line. Communication information can be sent securely from the monitored target 10 to the communication analysis system 20 over network 30 by using, for example, VPN (Virtual Private Network) communication.
[0056] [1.2 Communication Analysis System 20] The communication analysis system 20 creates and presents graph information to enable analysts to efficiently address alerts for WL-external communication links detected using a learned whitelist 22. The communication analysis system 20 extracts past communications that may be useful for analyzing WL-external communication links, and creates graph information from the extracted past communications to present the analyst with an analysis graph and information about each of those communications.
[0057] As shown in Figure 1, the communication analysis system 20 according to this embodiment includes an analysis support graph creation system 21, a whitelist 22, a communication link learning system 23, and a communication information DB 24. The communication analysis system 20 resides in a cloud or on-premise analysis environment and is connected to the monitored target 10 via a network 30.
[0058] [1.2.1 Whitelist 22] Whitelist 22 is a list of communications that are not abnormal. Whitelist 22 is created by learning the communications that take place at the monitored target 10. In this embodiment, whitelist 22 stores the communication content (content of each communication link) observed while the monitored target 10 is operating normally, based on the communication information of the monitored target 10. Whitelist 22 is used to determine whether the communications that occur at the monitored target 10 are normal.
[0059] [1.2.2 Communication Information DB 24] The communication information DB 24 is implemented using, for example, an HDD (Hard Disk Drive) or semiconductor memory such as an SSD (Solid State Drive). The communication information DB 24 holds past communication information, that is, information indicating past communications that have taken place at the monitored target 10. Past communication information includes, for each terminal at the monitored target 10, which terminals (destination terminal and source terminal) it has communicated with in the past and which protocols it has used.
[0060] [1.2.3 Analysis Support Graph Creation System 21] The analysis support graph creation system 21 creates graph information for analyzing out-of-whitelist communications (WL-external communications), which are communications not included in the whitelist 22. In this embodiment, the analysis support graph creation system 21 uses the whitelist 22 to capture WL-external communication links of the monitored target 10. The analysis support graph creation system 21 also uses the machine learning model trained by the communication link learning system 23 and the past communication information held in the communication information DB 24 to create graph information that assists in analyzing abnormalities in WL-external communications.
[0061] Figure 2A is a block diagram showing an example of the configuration of the analysis support graph creation system 21 according to Embodiment 1. Figure 2B is a block diagram showing the minimum configuration of the analysis support graph creation system 21 shown in Figure 2A.
[0062] As shown in Figure 2A, the analysis support graph creation system 21 includes an information receiving unit 2101, a WL determination unit 2102, a similar terminal extraction unit 2103, a model acquisition unit 2104, an information acquisition unit 2105, a secondary similar communication link extraction unit 2106, a primary similar communication link extraction unit 2107, a WL change operation unit 2108, an NW graph creation unit 2109, an NW graph display unit 2110, an extraction condition storage unit 2111, and an extraction condition setting unit 2112. As shown in Figure 2B, the analysis support graph creation system 21a only needs to include, as its minimum configuration, an information receiving unit 2101, a WL determination unit 2102, a similar terminal extraction unit 2103, an information acquisition unit 2105, a primary similar communication link extraction unit 2107, and an NW graph creation unit 2109.
[0063] The analysis support graph creation system 21 includes, for example, a computer including memory and a processor (microprocessor), and the processor executes a predetermined program stored in memory to realize the functions of each component.
[0064] The information receiving unit 2101 receives communication information, which is information indicating the communication taking place at the monitored target 10 and which is the communication to be analyzed. In this embodiment, the information receiving unit 2101 is, for example, a communication interface; it receives communication data from the monitored target 10 and extracts the communication information of the monitored target 10 from the received communication data.
[0065] The WL determination unit 2102 uses communication information indicating the communication to be analyzed and the whitelist 22 to determine whether a WL-external communication has occurred in the communication to be analyzed. The WL determination unit 2102 may, for example, be equipped with a computer including memory and a processor (microprocessor), and the determination function may be realized by the processor executing a predetermined program stored in memory. In this embodiment, the WL determination unit 2102 uses the communication information of the monitored target 10 received by the information receiving unit 2101 and the whitelist 22 to determine whether there is a WL-external communication among the communications to be analyzed.
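The determination described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a communication link is identified by a (source, destination, protocol) triple and that the whitelist 22 is held as a set of such triples; the names `Link` and `find_wl_external` and the terminal names are hypothetical and do not appear in the disclosure.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One communication link (assumed shape: source, destination, protocol)."""
    src: str
    dst: str
    protocol: str


def find_wl_external(observed: list[Link], whitelist: set[Link]) -> list[Link]:
    """Return the observed links that are absent from the whitelist, in arrival order."""
    return [link for link in observed if link not in whitelist]


# Hypothetical whitelist learned while the monitored target operated normally.
whitelist = {
    Link("hmi-1", "plc-1", "modbus"),
    Link("hmi-1", "plc-2", "modbus"),
}

# Hypothetical communications to be analyzed.
observed = [
    Link("hmi-1", "plc-1", "modbus"),  # present in the whitelist
    Link("hmi-2", "plc-1", "modbus"),  # absent, so treated as WL-external
]

wl_external = find_wl_external(observed, whitelist)
print(wl_external)
```

A set lookup keeps the check cheap per observed link; an actual whitelist may of course carry more fields than the three assumed here.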
[0066] The model acquisition unit 2104 acquires a machine learning model from the communication link learning system 23. In this embodiment, the model acquisition unit 2104 acquires a machine learning model generated by learning the communication that took place in the monitored object 10.
[0067] The similar terminal extraction unit 2103 may include, for example, a computer including memory and a processor (microprocessor), and the following extraction functions may be realized by the processor executing a predetermined program stored in memory.
[0068] For example, the similar terminal extraction unit 2103 extracts a first similar terminal, which is one or more similar terminals of the destination terminal in the WL-external communication link, which is a communication link for WL-external communication determined by the WL determination unit 2102, and a second similar terminal, which is one or more similar terminals of the source terminal.
[0069] Furthermore, the similar terminal extraction unit 2103 may extract a third similar terminal, which is one or more terminals similar to a counterpart terminal, other than the source terminal, with which the destination terminal of the WL-external communication link communicated in the past. The similar terminal extraction unit 2103 may also extract a fourth similar terminal, which is one or more terminals similar to a counterpart terminal, other than the destination terminal, with which the source terminal of the WL-external communication link communicated in the past.
[0070] Here, the similar terminal extraction unit 2103 may extract similar terminals by, for example, using a trained machine learning model to calculate the similarity between multiple terminals in a predetermined environment. By using a trained machine learning model, similar terminals can be extracted with high accuracy. Therefore, a network graph created using the accurately extracted similar terminals can be used. Alternatively, the similar terminal extraction unit 2103 may extract (determine) similar terminals in a predetermined environment using predetermined rules.
[0071] In other words, in this embodiment, the similar terminal extraction unit 2103 extracts similar terminals that are similar to each of the terminals (source terminal and destination terminal) of the WL-external communication link. Furthermore, the similar terminal extraction unit 2103 may also extract similar terminals of terminals related to each of the terminals (source terminal and destination terminal) of the WL-external communication link. Similar terminals are terminals with similar communication tendencies or terminals with similar roles.
[0072] In the example shown in Figure 2A, the similar terminal extraction unit 2103 extracts similar terminals for both the source and destination terminals of the WL-external communication link, for example, using the communication information of the WL-external communication link determined by the WL determination unit 2102 and the machine learning model acquired by the model acquisition unit 2104.
[0073] The information acquisition unit 2105 acquires past communication information from the communication information DB 24. The information acquisition unit 2105 is, for example, a communication interface. As mentioned above, the past communication information includes communication information such as what terminals (destination terminal and source terminal) each terminal in the monitored device 10 has communicated with in the past, and what protocols it has used.
[0074] The extraction condition setting unit 2112 sets extraction conditions for extracting information necessary for creating graph information. In this embodiment, the extraction condition setting unit 2112 lets the user configure which of the elements that make up a past communication link are used when extracting primary similar communication links and secondary similar communication links.
[0075] The extraction condition storage unit 2111 holds extraction conditions for extracting information necessary for creating graph information. In this embodiment, the extraction condition storage unit 2111 stores the extraction condition settings set by the extraction condition setting unit 2112.
[0076] The primary similar communication link extraction unit 2107 may, for example, include a memory and a processor (microprocessor), and the primary similar communication link extraction function may be realized by the processor executing a predetermined program stored in the memory.
[0077] The primary similar communication link extraction unit 2107 extracts primary similar communication links from past communication information acquired by the information acquisition unit 2105, using the first similar terminal and second similar terminal extracted by the similar terminal extraction unit 2103. Here, a primary similar communication link is a past communication link similar to the WL-external communication link, and is a past communication link with the first similar terminal and second similar terminal as the destination terminal or source terminal.
[0078] In other words, in this embodiment, the primary similar communication link extraction unit 2107 uses the similar terminals extracted by the similar terminal extraction unit 2103 to extract primary similar communication links, which are past communication links similar to the WL-external communication links, from the past communication information acquired by the information acquisition unit 2105.
[0079] In the example shown in Figure 2A, the primary similar communication link extraction unit 2107 uses the similar terminals extracted by the similar terminal extraction unit 2103 and the extraction conditions acquired from the extraction condition storage unit 2111 to extract primary similar communication links from past communication information acquired by the information acquisition unit 2105.
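As a rough sketch of the primary similar communication link extraction described above, the following assumes one possible reading: a past link qualifies when its source is the source terminal of the WL-external communication link or one of the second similar terminals, and its destination is the destination terminal or one of the first similar terminals. The function name, the `require_same_protocol` flag (standing in for a user-set extraction condition), and all terminal names are hypothetical.

```python
def extract_primary_links(past_links, wl_link, similar_to_dst, similar_to_src,
                          require_same_protocol=False):
    """Return past links whose endpoints resemble those of the WL-external link.

    past_links      : list of (source, destination, protocol) tuples
    wl_link         : the WL-external link, same tuple shape
    similar_to_dst  : first similar terminals (similar to the destination)
    similar_to_src  : second similar terminals (similar to the source)
    """
    src, dst, protocol = wl_link
    src_side = set(similar_to_src) | {src}
    dst_side = set(similar_to_dst) | {dst}
    result = []
    for link in past_links:
        if link == wl_link:
            continue  # the WL-external link itself is not a past similar link
        if link[0] not in src_side or link[1] not in dst_side:
            continue
        if require_same_protocol and link[2] != protocol:
            continue  # optional extraction condition (assumed)
        result.append(link)
    return result


# Hypothetical past communication information.
past_links = [
    ("hmi-1", "plc-1", "modbus"),
    ("hmi-1", "plc-2", "modbus"),
    ("eng-1", "plc-1", "s7"),
    ("hmi-1", "historian", "http"),
]
wl_link = ("hmi-2", "plc-1", "modbus")

primary = extract_primary_links(
    past_links, wl_link, similar_to_dst={"plc-2"}, similar_to_src={"hmi-1"}
)
print(primary)
```

Whether the terminals of the WL-external link itself count alongside their similar terminals is a design choice made here for illustration; the extraction conditions in the embodiment are configurable.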
[0080] The secondary similar communication link extraction unit 2106 may, for example, include a memory and a processor (microprocessor), and the secondary similar communication link extraction function may be realized by the processor executing a predetermined program stored in the memory.
[0081] The secondary similar communication link extraction unit 2106 extracts secondary similar communication links that are different from the primary similar communication links and WL-external communication links. More specifically, the secondary similar communication link extraction unit 2106 extracts secondary similar communication links using the first similar terminal, second similar terminal, third similar terminal, and fourth similar terminal extracted by the similar terminal extraction unit 2103, and the past communication information acquired by the information acquisition unit 2105. Here, a secondary similar communication link is a communication link that is different from the primary similar communication links and WL-external communication links acquired by the primary similar communication link extraction unit 2107, and is a past communication link that is similar to a past communication made by the source terminal or destination terminal in the WL-external communication link.
[0082] In other words, in this embodiment, the secondary similar communication link extraction unit 2106 extracts secondary similar communication links from past communication information acquired by the information acquisition unit 2105, using similar terminals extracted by the similar terminal extraction unit 2103. In the example shown in Figure 2A, the secondary similar communication link extraction unit 2106 uses similar terminals extracted by the similar terminal extraction unit 2103 and extraction conditions acquired from the extraction condition storage unit 2111 to extract secondary similar communication links from past communication information acquired by the information acquisition unit 2105. A secondary similar communication link is a past communication link that may be related to a WL-external communication link, and is a communication link other than a primary similar communication link. Specifically, secondary similar communication links are similar communication links that occurred between the source terminal of the WL-external communication link and its similar terminals, and similar communication links that occurred between the destination terminal of the WL-external communication link and its similar terminals.
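The relationship between primary and secondary similar communication links can be sketched as a set difference. This is a simplified interpretation, not the embodiment's exact rule: it treats as secondary any past link in which the source terminal, the destination terminal, or one of their similar terminals takes part, after excluding the primary similar communication links and the WL-external communication link itself. It does not model the third and fourth similar terminals, and all names are hypothetical.

```python
def extract_secondary_links(past_links, wl_link, primary_links,
                            similar_to_dst, similar_to_src):
    """Return related past links that are neither primary nor the WL-external link."""
    src, dst, _ = wl_link
    involved = set(similar_to_src) | set(similar_to_dst) | {src, dst}
    excluded = set(primary_links) | {wl_link}
    return [
        link for link in past_links
        if link not in excluded and (link[0] in involved or link[1] in involved)
    ]


# Hypothetical data, using (source, destination, protocol) tuples.
past_links = [
    ("hmi-1", "plc-1", "modbus"),
    ("hmi-1", "plc-2", "modbus"),
    ("eng-1", "plc-1", "s7"),
    ("hmi-1", "historian", "http"),
    ("cam-1", "nvr-1", "rtsp"),
]
wl_link = ("hmi-2", "plc-1", "modbus")
primary_links = [("hmi-1", "plc-1", "modbus"), ("hmi-1", "plc-2", "modbus")]

secondary = extract_secondary_links(
    past_links, wl_link, primary_links,
    similar_to_dst={"plc-2"}, similar_to_src={"hmi-1"},
)
print(secondary)
```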
[0083] The NW graph creation unit 2109 includes, for example, memory and a processor (microprocessor), and the processor executes a predetermined program stored in memory to realize the following creation functions. The NW graph creation unit 2109 uses the primary similar communication links extracted by the primary similar communication link extraction unit 2107 and the past communication information acquired by the information acquisition unit 2105 to create an analytical NW graph as graph information for analyzing communications outside the WL.
[0084] Furthermore, the NW graph creation unit 2109 may create an NW graph for analysis using the extracted primary and secondary similar communication links and past communication information acquired by the information acquisition unit 2105.
[0085] In the example shown in Figure 2A, the NW graph creation unit 2109 creates an NW graph using similar communications extracted by the primary similar communication link extraction unit 2107 and the secondary similar communication link extraction unit 2106, and past communication information acquired by the information acquisition unit 2105.
[0086] The NW graph creation unit 2109 may instruct the NW graph display unit 2110 to display the created NW graph in a way that distinguishes between WL-external communication links and other links. Specifically, the NW graph creation unit 2109 may create graph information on the NW graph display unit 2110 that highlights WL-external communication links so that the user, who is also an analyst, can identify WL-external communication links from other communication links. This allows the user to intuitively understand WL-external communication links.
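One way to picture the graph information handed to the display is a plain node and edge list in which each edge carries its category and a highlight flag. The sketch below is an assumption about a possible data layout, not the format used by the NW graph creation unit 2109; the field names `kind` and `highlight` are hypothetical.

```python
def build_analysis_graph(wl_link, primary_links, secondary_links):
    """Assemble nodes and labeled edges from (source, destination, protocol) tuples."""
    nodes = set()
    edges = []
    categories = (
        ("wl_external", [wl_link]),
        ("primary", primary_links),
        ("secondary", secondary_links),
    )
    for kind, links in categories:
        for src, dst, protocol in links:
            nodes.update((src, dst))
            edges.append({
                "src": src,
                "dst": dst,
                "protocol": protocol,
                "kind": kind,
                # Only the WL-external link is flagged for highlighting.
                "highlight": kind == "wl_external",
            })
    return {"nodes": sorted(nodes), "edges": edges}


# Hypothetical inputs.
graph = build_analysis_graph(
    wl_link=("hmi-2", "plc-1", "modbus"),
    primary_links=[("hmi-1", "plc-1", "modbus"), ("hmi-1", "plc-2", "modbus")],
    secondary_links=[("eng-1", "plc-1", "s7")],
)
for edge in graph["edges"]:
    print(edge)
```

A renderer can then draw highlighted edges in a distinct color or line style so that the analyst can tell the WL-external communication link apart from the past links around it.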
[0087] Furthermore, the NW graph creation unit 2109 may display grouped graph information on the NW graph display unit 2110 so that the user can identify similar terminals belonging to the same group and different groups in the created NW graph. For example, the NW graph creation unit 2109 may group the destination terminal and source terminal in the WL-external communication link with multiple similar terminals, including a first similar terminal, a second similar terminal, a third similar terminal, and a fourth similar terminal, and display them on the NW graph display unit 2110. This allows the user to understand the similarity relationships between the terminals in the WL-external communication link and their one or more similar terminals.
[0088] Furthermore, the NW graph creation unit 2109 may create a message for the user, who is also an analyst, based on the content of the created NW graph. Specifically, the NW graph creation unit 2109 may create a NW graph and, at the same time, create a message based on the created NW graph as auxiliary information for the user, who is also an analyst, to use for analysis. This allows the user to see a message as auxiliary information for analysis, making it easier for the user to perform analysis by utilizing the message.
[0089] The NW graph display unit 2110 displays the NW graph created by the NW graph creation unit 2109 on the screen. The NW graph display unit 2110 is, for example, a display or a touch panel, and presents the NW graph created by the NW graph creation unit 2109 and related information about the NW graph to the user, who is an analyst. The NW graph display unit 2110 may perform various processes by operations on a predetermined position on the touch panel or by operations on a predetermined position on the display via an input device such as a mouse.
[0090] For example, when a user selects a WL-external communication link, the NW graph display unit 2110 may display a button for deciding whether to add the WL-external communication of the selected WL-external communication link to the whitelist 22. This allows the user to add a WL-external communication to the whitelist 22 with a simple operation.
[0091] Furthermore, for example, the NW graph display unit 2110 may display a slider bar on the screen for adjusting the similarity threshold between the destination terminal and the source terminal in the WL-external communication link. In this case, the NW graph display unit 2110 can reconstruct and display the NW graph on the screen according to the threshold changed by the user's operation on the slider bar. By operating on such a slider bar, the number of similar terminals (display range) of terminals in the WL-external communication link can be adjusted, which may make it easier to verify the legitimacy of the WL-external communication.
[0092] Furthermore, the NW graph display unit 2110 may display comparison information on the screen. Specifically, in the NW graph displayed on the screen by the NW graph display unit 2110, the user may select two communication links from among the WL-external communication links and primary similar communication links, or two communication links from among the secondary similar communication links. In this case, the NW graph display unit 2110 may display detailed information on the two selected communication links on the screen as comparison information for analyzing the WL-external communication. Displaying comparison information may make it easier to verify the legitimacy of the WL-external communication. The NW graph display unit 2110 may also display a message describing the procedure for selecting two communication links, guiding the user so that the comparison information is displayed on the screen. This allows the user to display the comparison information by referring to the procedure.
[0093] The WL change operation unit 2108 adds WL-external communications to the whitelist 22 in response to user instructions. For example, the WL change operation unit 2108 may receive instructions from the NW graph display unit 2110 and add a WL-external communication link to the whitelist 22. Specifically, the WL change operation unit 2108 may add WL-external communications to the whitelist 22 in response to user input on buttons displayed on the screen by the NW graph display unit 2110.
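The whitelist update triggered by the button can be sketched as follows, assuming the whitelist 22 is held as a set of (source, destination, protocol) tuples. The function name `approve_link` is hypothetical.

```python
def approve_link(whitelist: set, link: tuple) -> bool:
    """Add an analyst-approved link to the whitelist.

    Returns True if the link was newly added, False if it was already present.
    """
    if link in whitelist:
        return False
    whitelist.add(link)
    return True


# Hypothetical whitelist and a WL-external link the analyst judged legitimate.
whitelist = {("hmi-1", "plc-1", "modbus")}
wl_link = ("hmi-2", "plc-1", "modbus")

newly_added = approve_link(whitelist, wl_link)
print(newly_added, wl_link in whitelist)
```

Once the link is in the set, the same communication no longer counts as WL-external in later determinations.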
[0094] [1.2.4 Communication Link Learning System 23] Next, we will explain the communication link learning system 23 shown in Figure 1.
[0095] Figure 3 is a block diagram showing an example of the configuration of the communication link learning system 23 according to Embodiment 1.
[0096] As shown in Figure 3, the communication link learning system 23 includes an information acquisition unit 2301, a model learning unit 2302, a model setting storage unit 2303, a model storage unit 2304, and a model setting unit 2305.
[0097] The communication link learning system 23 includes, for example, a computer including memory and a processor (microprocessor), and the processor executes a predetermined program stored in memory to realize the functions of each component.
[0098] The information acquisition unit 2301 acquires past communication information from the communication information DB 24. The information acquisition unit 2301 is, for example, a communication interface. The past communication information acquired by the information acquisition unit 2301 may be the same as the past communication information acquired by the information acquisition unit 2105, and is used for training to generate a machine learning model.
[0099] The model setting unit 2305 sets various setting information for the machine learning model and stores the set setting information in the model setting storage unit 2303.
[0100] The model learning unit 2302 acquires various information from the model setting storage unit 2303 and uses the past communication information acquired by the information acquisition unit 2301 to train the machine learning model.
[0101] In this embodiment, the machine learning model is generated using the graph analysis technique disclosed in Non-Patent Literature 2. More specifically, the machine learning model is composed of, for example, LinkFeat, COMPGCN (COMPosition-based multi-relational Graph Convolutional Networks), R-GCN (Relational Graph Convolutional Network), DistMult, TransE (Translating Embeddings for Modeling Multi-relational Data), HolE (Holographic Embeddings of Knowledge Graphs), or ComplEx (Complex Embeddings for Simple Link Prediction). This machine learning model is generated by learning past communication information acquired by the information acquisition unit 2301, i.e., communications conducted at the monitored target 10, using a link prediction or node classification algorithm. The link prediction or node classification algorithm is a technique that can create a fixed-dimensional vector for each terminal that appeared in the communication, and create a fixed-size matrix for each communication type that appeared in the communication. For example, from past communication information, the vector representation of each of the multiple terminals at the monitored target 10 and the coefficient matrix for each protocol are initialized. Next, the model is trained so that the quadratic form of pairs that have previously had a communication link is large, and the quadratic form of pairs that have not previously had a communication link is small. In this way, a machine learning model can be trained and generated by acquiring vector representations of each terminal from past communication information acquired by the information acquisition unit 2301.
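The training idea in the paragraph above (one vector per terminal, one coefficient matrix per protocol, a large quadratic form for pairs that have communicated and a small one for pairs that have not) can be sketched with NumPy. This is a toy illustration under stated assumptions, not the method of Non-Patent Literature 2: the logistic loss, plain stochastic gradient descent, the embedding dimension, the learning rate, and the hand-picked negative pairs are all choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical terminals and protocols.
terminals = ["hmi-1", "hmi-2", "plc-1", "plc-2"]
protocols = ["modbus"]
t_idx = {name: i for i, name in enumerate(terminals)}
p_idx = {name: i for i, name in enumerate(protocols)}

# Pairs that communicated in the past, and pairs that never did (assumed).
positives = [("hmi-1", "plc-1", "modbus"), ("hmi-1", "plc-2", "modbus"),
             ("hmi-2", "plc-2", "modbus")]
negatives = [("plc-1", "hmi-1", "modbus"), ("plc-1", "plc-2", "modbus"),
             ("hmi-2", "hmi-1", "modbus")]

dim = 4
E = rng.normal(scale=0.5, size=(len(terminals), dim))       # terminal vectors
M = rng.normal(scale=0.5, size=(len(protocols), dim, dim))  # per-protocol matrices


def link_score(link):
    """Quadratic form e_src^T M_protocol e_dst for one link."""
    src, dst, proto = link
    return float(E[t_idx[src]] @ M[p_idx[proto]] @ E[t_idx[dst]])


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


learning_rate = 0.1
for _ in range(300):
    for links, label in ((positives, 1.0), (negatives, 0.0)):
        for link in links:
            s, d, p = t_idx[link[0]], t_idx[link[1]], p_idx[link[2]]
            # Derivative of the logistic loss with respect to the score.
            err = sigmoid(link_score(link)) - label
            # Gradients are computed before any parameter is updated.
            grad_s = err * (M[p] @ E[d])
            grad_d = err * (M[p].T @ E[s])
            grad_m = err * np.outer(E[s], E[d])
            E[s] -= learning_rate * grad_s
            E[d] -= learning_rate * grad_d
            M[p] -= learning_rate * grad_m

mean_pos = np.mean([link_score(link) for link in positives])
mean_neg = np.mean([link_score(link) for link in negatives])
print(mean_pos, mean_neg)
```

After training, the rows of `E` serve as the fixed-dimensional vector representations of the terminals, which is what the similarity calculation later relies on.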
[0102] The model storage unit 2304 is implemented using an HDD (Hard Disk Drive) or SSD (Solid State Drive), etc. The model storage unit 2304 stores various setting information. In addition, the model storage unit 2304 stores the trained machine learning model that has been trained by the model learning unit 2302.
[0103] [1.3 Operation of the Communication Analysis System 20] Next, we will explain the operation of the communication analysis system 20 configured as described above.
[0104] [1.3.1 Operation of the Analysis Method] Figure 4 is a flowchart showing the operation of the analysis method of the communication analysis system 20 according to Embodiment 1. Figure 4 shows the analysis support graph creation process using the minimum configuration of the analysis support graph creation system 21 shown in Figure 2B as an example of the operation of the analysis method.
[0105] First, the analysis support graph creation system 21 of the communication analysis system 20 receives information indicating the communication to be analyzed that takes place at the monitored target 10 (S1). The communication that takes place at the monitored target 10 is the network communication of multiple terminals in a predetermined environment, which is the monitored target 10.
[0106] Next, the analysis support graph creation system 21 retrieves past communication information from the communication information DB 24 (S2). More specifically, the analysis support graph creation system 21 retrieves past communication information from the communication information DB 24, which holds past communication information consisting of information indicating past communications that took place at the monitored target 10.
[0107] Next, the analysis support graph creation system 21 uses the information indicating the communication to be analyzed, received in step S1, and the whitelist 22 to determine whether a WL-external communication, which is a communication not included in the whitelist 22, has occurred in the communication to be analyzed (S3).
[0108] Next, the analysis support graph creation system 21 extracts first similar terminals, which are one or more similar terminals to the destination terminal in the WL-external communication link, and second similar terminals, which are one or more similar terminals to the source terminal (S4). In this way, the analysis support graph creation system 21 extracts similar terminals to each of the terminals in the WL-external communication link (source terminal and destination terminal).
[0109] Next, the analysis support graph creation system 21 uses the first similar terminal and the second similar terminal extracted in step S4 to extract the primary similar communication link from the past communication information acquired in step S2 (S5). Here, the primary similar communication link is a past communication link similar to the WL-external communication link, and is a past communication link where the first similar terminal and the second similar terminal are the destination terminal or the source terminal.
[0110] Next, the analysis support graph creation system 21 uses the primary similar communication links extracted in step S5 and the past communication information obtained in step S2 to create an analysis network graph (S6) as information for analyzing communications outside the WL.
[0111] By performing this analysis, it is possible to create graph information for analyzing WL-external communications, which are communications not included in the whitelist 22.
[0112] [1.3.2 Detailed Operation of the Analysis Method] Next, we will explain the detailed operation of the analysis method, specifically the process of creating an analysis support graph using the analysis support graph creation system 21 shown in Figure 2A.
[0113] Figure 5 is a flowchart showing the analysis support graph creation process of the analysis support graph creation system 21 according to Embodiment 1.
[0114] First, the analysis support graph creation system 21 receives information (communication information) that indicates the communication taking place at the monitored target 10 and which is the communication to be analyzed.
[0115] Next, as shown in Figure 5, the analysis support graph creation system 21 uses the learned whitelist 22 to determine whether a WL-external communication link has occurred in the received communication information of the monitored target 10 (S41).
[0116] If it is determined in step S41 that no WL-external communication link has occurred (no in step S41), the analysis support graph creation system 21 waits until it obtains the next communication information for the monitored target 10, and then repeats step S41.
[0117] On the other hand, if it is determined in step S41 that a WL-external communication link has occurred (yes in step S41), the analysis support graph creation system 21 performs a similar terminal extraction process to extract similar terminals for each of the terminals (source terminal and destination terminal) of the WL-external communication link (S42).
[0118] Next, the analysis support graph creation system 21 performs a process to extract past communication information that can be used for analyzing WL-external communications (S43, S44). Specifically, the analysis support graph creation system 21 performs a primary similar communication link extraction process to extract primary similar communication links, which are past communication links similar to WL-external communication links, using the similar terminals extracted in the similar terminal extraction process of step S42 (S43). Subsequently, the analysis support graph creation system 21 performs a secondary similar communication link extraction process to extract secondary similar communication links, which are past communication links that may be related to WL-external communication links, using the similar terminals extracted in the similar terminal extraction process of step S42 (S44). Note that primary similar communication links are not included in secondary similar communication links.
[0119] Next, the analysis support graph creation system 21 performs an analysis support graph creation process to create an analysis-oriented network graph using the past communication information extracted in steps S43 and S44 (S45). More specifically, the analysis support graph creation system 21 creates an analysis-oriented network graph using the primary similar communication links and secondary similar communication links extracted in steps S43 and S44.
[0120] In the following section, we will explain the details of steps S42 to S45 shown in Figure 5 using Figures 6 to 12.
[0121] Figure 6 is a flowchart showing an example of the similar terminal extraction process shown in Figure 5. Figure 6 illustrates the similar terminal extraction process performed by the similar terminal extraction unit 2103 of the analysis support graph creation system 21.
[0122] As shown in Figure 6, first, the analysis support graph creation system 21 acquires a machine learning model (S421). More specifically, the model acquisition unit 2104 shown in Figure 2A acquires the machine learning model stored in the model storage unit 2304.
[0123] Next, the analysis support graph creation system 21 acquires one WL-external communication link (S422). More specifically, the similar terminal extraction unit 2103 acquires one WL-external communication link that has occurred. If multiple WL-external communication links have occurred, the similar terminal extraction unit 2103 acquires them one by one in chronological order. In this case, the analysis support graph creation system 21 performs steps S422 to S424 for each of the WL-external communication links that have occurred.
[0124] Next, the analysis support graph creation system 21 calculates the similarity between each terminal of the WL-external communication link (source terminal and destination terminal) and each of the multiple terminals of the monitored target 10 from past communication information consisting of information showing past communications that have taken place at the monitored target 10 (S423). The analysis support graph creation system 21 can calculate the similarity between the terminals of the WL-external communication link and terminals other than those terminals by obtaining vector representations of the multiple terminals of the monitored target 10 using past communication information. In this embodiment, the similar terminal extraction unit 2103 calculates the similarity (source terminal similarity and destination terminal similarity) for each terminal of the WL-external communication link (source terminal and destination terminal) with other terminals observed so far, using the machine learning model acquired in step S421. This machine learning model obtains the vector representations of each of the multiple terminals of the monitored target 10 as described above.
[0125] Furthermore, the graph analysis technique disclosed in Non-Patent Literature 2 can be used as a method for calculating terminal similarity. Specifically, a machine learning model generated by learning past communication information first calculates fixed-length embedding vectors as vector representations for each of the multiple terminals of the monitored target 10. Next, the similarity is obtained by calculating the cosine similarity between the vector representations of the terminals.
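The cosine similarity step can be sketched as follows. The embedding values below are made up for illustration; in the embodiment they would come from the trained machine learning model.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical fixed-length embedding vectors for three terminals.
embeddings = {
    "hmi-1": np.array([1.0, 0.0, 1.0]),
    "hmi-2": np.array([0.9, 0.1, 1.0]),
    "plc-1": np.array([0.0, 1.0, -1.0]),
}

sim_hmi = cosine_similarity(embeddings["hmi-1"], embeddings["hmi-2"])
sim_plc = cosine_similarity(embeddings["hmi-1"], embeddings["plc-1"])
print(sim_hmi, sim_plc)
```

Cosine similarity ranges from -1 to 1, so terminals whose vectors point in nearly the same direction score close to 1, while the dissimilar pair in this example scores -0.5.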
[0126] Next, the analysis support graph creation system 21 extracts similar terminals (sender-like terminals and destination-like terminals) for each terminal (sender terminal and destination terminal) of the WL external communication link (S424). In this embodiment, the similar terminal extraction unit 2103 performs a threshold determination on the calculated similarity of each terminal and extracts terminals with a similarity above the threshold as either a sender-like terminal or a destination-like terminal. Here, if the similarity is calculated between 0 and 1, and a value closer to 1 indicates greater similarity, then a threshold of, for example, 0.8 can be set as a value close to 1. Note that a sender-like terminal is one or more similar terminals of the sender terminal and corresponds to the first similar terminal described above. Similarly, a destination-like terminal is one or more similar terminals of the destination terminal described above and corresponds to the second similar terminal described above.
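The threshold determination of step S424 might look like the following minimal sketch. The function name `extract_similar_terminals` and the sample similarity values are hypothetical, and the comparison is assumed to include terminals whose similarity equals the threshold.

```python
def extract_similar_terminals(similarity_by_terminal, threshold=0.8):
    """Return terminals whose similarity reaches the threshold, most similar first.

    Assumption: a similarity equal to the threshold also qualifies.
    """
    selected = [t for t, s in similarity_by_terminal.items() if s >= threshold]
    return sorted(selected, key=lambda t: -similarity_by_terminal[t])


# Hypothetical similarities of other terminals to the sender terminal Y.
sender_similarities = {"Y'": 0.93, "B": 0.40, "Y''": 0.85}
sender_like = extract_similar_terminals(sender_similarities)
```

The same function would be applied separately to the similarities of the destination terminal to obtain the destination-like terminals.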
[0127] Figure 7 is a flowchart showing an example of the primary similar communication link extraction process shown in Figure 5. Figure 7 shows the primary similar communication link extraction process performed by the primary similar communication link extraction unit 2107 of the analysis support graph creation system 21. Figures 8A to 8C show an example of the primary similar communication link extraction process according to Embodiment 1.
[0128] As shown in Figure 7, first, the primary similar communication link extraction unit 2107 obtains the primary similar communication link extraction conditions from the extraction condition storage unit 2111 (S431).
[0129] Next, the primary similar communication link extraction unit 2107 creates extraction conditions (primary similar communication link extraction conditions) using information about the WL-external communication link that is the target of the primary similar communication link extraction process (S432). For example, the primary similar communication link extraction unit 2107 creates extraction conditions (primary similar communication link extraction conditions) in accordance with the settings obtained in step S431, using information about the target WL-external communication link and similar terminals extracted in the similar terminal extraction process of step S42.
[0130] Here, assume that the source terminal of the target WL-external communication link is Y, the destination terminal is X, the destination port number is Z, and the protocol used is W. In this case, the conditions for extracting a communication link consisting of source terminal A, destination terminal B, destination port number C, and protocol used D as a primary similar communication link are as follows: (A=Y or A is a similar terminal to Y), (B=X or B is a similar terminal to X), (C=Z or C is a port of a protocol similar to that of Z), and (D=W or D is a protocol similar to W). Note that the WL-external communication link itself is excluded from the primary similar communication links.
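The four conditions and the exclusion of the WL-external communication link itself can be expressed as a single predicate. The following is a sketch under stated assumptions: the `Link` type, the function `is_primary_similar`, and the lookup tables of similar terminals, ports, and protocols are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    src: str
    dst: str
    dst_port: int
    protocol: str


def is_primary_similar(cand, target, similar_terminals, similar_ports, similar_protocols):
    """True if cand satisfies all four conditions relative to the target link."""
    if cand == target:
        return False  # the WL-external communication link itself is excluded
    src_ok = cand.src == target.src or cand.src in similar_terminals.get(target.src, set())
    dst_ok = cand.dst == target.dst or cand.dst in similar_terminals.get(target.dst, set())
    port_ok = (cand.dst_port == target.dst_port
               or cand.dst_port in similar_ports.get(target.dst_port, set()))
    proto_ok = (cand.protocol == target.protocol
                or cand.protocol in similar_protocols.get(target.protocol, set()))
    return src_ok and dst_ok and port_ok and proto_ok


# Hypothetical inputs: target link Y -> X on port 80 over http.
target = Link("Y", "X", 80, "http")
similar_terminals = {"Y": {"Y'"}, "X": {"X'", "X''"}}
similar_ports = {80: {8080, 443}}
similar_protocols = {"http": {"https"}}
```

With these inputs, a past link from Y' to X'' on port 443 over https would qualify, while a link whose source is an unrelated terminal would not.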
[0131] Here, Figure 8A specifically shows an example of information regarding an external WL communication link that is subject to the primary similar communication link extraction process. Specifically, Figure 8A shows an external WL communication link where the source terminal (Src IP) is Y, the source port number (Src port) is 60000, the destination terminal (Dst IP) is X, the destination port number (Dst port) is 80, and the protocol used is http, etc.
[0132] Next, Figure 8B shows an example of extraction conditions (primary similar communication link extraction conditions) created using the information on WL external communication links shown in Figure 8A. More specifically, Figure 8B shows an example of extraction conditions (primary similar communication link extraction conditions) created according to the settings from the target WL external communication and similar terminals of the WL external communication link terminal extracted in the similar terminal extraction process in step S42. That is, Figure 8B shows extraction conditions where the source terminal is terminal Y or a similar terminal of terminal Y, the destination terminal is terminal X or a similar terminal of terminal X, the destination port number is 80, 8080 or 443, and the protocol used is http or https. Note that in Figure 8B, * (asterisk) indicates a wildcard, and it is indicated that there are no extraction conditions for source port number, method, target, and response.
[0133] Next, the primary similar communication link extraction unit 2107 extracts past communication information that matches the extraction conditions created in step S432 (S433). More specifically, the primary similar communication link extraction unit 2107 extracts past communications that match the extraction conditions created in step S432 from the past communication information held in the communication information DB 24. The communication links of the extracted past communications correspond to primary similar communication links.
[0134] Here, Figure 8C shows past communication information extracted using the extraction criteria in Figure 8B. More specifically, Figure 8C shows multiple past communications that match the extraction criteria in Figure 8B, i.e., primary similar communication links.
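The matching of past communications against wildcard-style extraction conditions (steps S432 and S433) can be sketched roughly as below. The field names, the `matches` and `extract_matching` helpers, and the sample records are assumptions for this example and do not reproduce the actual contents of Figures 8A to 8C.

```python
WILDCARD = "*"


def matches(record, condition):
    """True if every non-wildcard field of the condition allows the record's value."""
    for field, allowed in condition.items():
        if allowed == WILDCARD:
            continue  # no extraction condition for this field
        if record.get(field) not in allowed:
            return False
    return True


def extract_matching(records, condition, excluded=()):
    """Return past communications that match, skipping explicitly excluded ones."""
    return [r for r in records if r not in excluded and matches(r, condition)]


# Hypothetical condition in the style of Figure 8B.
condition = {
    "src_ip": {"Y", "Y'"},
    "src_port": WILDCARD,
    "dst_ip": {"X", "X'", "X''"},
    "dst_port": {80, 8080, 443},
    "protocol": {"http", "https"},
    "method": WILDCARD,
}

# Hypothetical past communication information.
past = [
    {"src_ip": "Y'", "src_port": 51000, "dst_ip": "X", "dst_port": 80,
     "protocol": "http", "method": "GET"},
    {"src_ip": "Y", "src_port": 52000, "dst_ip": "X''", "dst_port": 443,
     "protocol": "https", "method": "POST"},
    {"src_ip": "B", "src_port": 53000, "dst_ip": "X", "dst_port": 80,
     "protocol": "http", "method": "GET"},
]
primary_records = extract_matching(past, condition)
```

Representing each field as either a set of allowed values or the wildcard keeps the condition close to the tabular form described for Figure 8B.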
[0135] Note that the extraction conditions shown in Figure 8B (primary similar communication link extraction conditions) are merely examples, and the conditions are not limited to them. For example, in Figure 8B, the extraction condition for the destination port number is 80, 8080, or 443, but it could instead be 80 or 8080, or only 80. Alternatively, no extraction condition may be set for the destination port number, in which case it is represented by an asterisk. In addition, an extraction condition for the protocol used may be included, since the multigraph format of the communication link is represented by the destination terminal, source terminal, and protocol, but it is not required.
[0136] In this way, the primary similar communication link extraction process extracts primary similar communication links that are similar to the WL-external communication links from among past communications included in the past communication information.
[0137] Figure 9 is a flowchart showing an example of the secondary similar communication link extraction process shown in Figure 5. Figure 9 shows the secondary similar communication link extraction process performed by the secondary similar communication link extraction unit 2106 of the analysis support graph creation system 21. Figures 10A to 10C show an example of the secondary similar communication link extraction process according to Embodiment 1.
[0138] As shown in Figure 9, first, the secondary similar communication link extraction unit 2106 obtains the settings for the secondary similar communication link extraction conditions from the extraction condition storage unit 2111 (S441).
[0139] Next, the secondary similar communication link extraction unit 2106 extracts past communication information of the source terminal or destination terminal, which is the terminal of the WL external communication link, from the communication information DB 24 (S442).
[0140] Here, Figure 10A specifically shows an example of past communication information for the source terminal and destination terminal, which are terminals of an external WL communication link that are subject to the secondary similar communication link extraction process. Specifically, Figure 10A shows information about past communications conducted by terminals of an external WL communication link based on past communication information. More specifically, Figure 10A shows past communication information when source terminal Y, which is a terminal of an external WL communication link, communicated with its counterpart terminal, terminal B or terminal C, as either the source or destination in the past. Similarly, it shows past communication information when destination terminal X, which is a terminal of an external WL communication link, communicated with its counterpart terminal, terminal A or terminal D, as either the source or destination in the past.
[0141] Next, the secondary similar communication link extraction unit 2106 extracts the counterpart terminals (source and destination terminals) of the WL-external communication link from the past communication information extracted in step S442, and extracts similar terminals to the extracted counterpart terminals (S443). In this embodiment, the secondary similar communication link extraction unit 2106 extracts similar terminals to the extracted counterpart terminals using a machine learning model.
[0142] Next, the secondary similar communication link extraction unit 2106 creates extraction conditions (secondary similar communication link extraction conditions) from the past communication information extracted in step S442, the similar terminals of the partner terminal extracted in step S443, and the similar terminals of the terminals of the WL-external communication link extracted in the similar terminal extraction process (S444). Here, the secondary similar communication link extraction unit 2106 creates these extraction conditions (secondary similar communication link extraction conditions) according to the settings acquired in step S441.
[0143] Here, assume that the source terminal of the target WL-external communication link is Y and the destination terminal is X. In this case, an extraction condition is set so that past unique communication links where X was the source or destination, or past unique communication links where Y was the source or destination, are extracted as secondary similar communication links. Assume also that a communication link consists of a combination of elements such as the source terminal's IP address, the destination terminal's IP address, the destination port number, and the protocol used. In this case, the extraction condition is set for each element in the combination of the terminal of the target WL-external communication link, its counterpart terminal, and the protocol used.
[0144] For example, in the case of a combination where the source terminal is Y, the destination terminal is Z1, the destination port number is W1, and the protocol used is V1, the conditions for extracting a communication link consisting of source terminal A, destination terminal B, destination port number C, and protocol used D as a secondary similar communication link are as follows: (A is a similar terminal to Y), (B=Z1 or B is a similar terminal to Z1), (C=W1 or C is a port with a protocol similar to W1), and (D=V1 or D is a protocol similar to V1).
[0145] Furthermore, for example, in the case of a combination where the source terminal is Z2, the destination terminal is X, the destination port number is W2, and the protocol used is V2, the conditions for extracting a communication link consisting of source terminal A, destination terminal B, destination port number C, and protocol used D as a secondary similar communication link are as follows: (A=Z2 or A is a similar terminal to Z2), (B is a similar terminal to X), (C=W2 or C uses a protocol similar to W2), and (D=V2 or D uses a protocol similar to V2).
[0146] In other words, the extraction conditions for secondary similar communication links are set from the required elements among the elements that constitute each past unique communication link in which terminal X or terminal Y, the terminals of the target WL-external communication link, was the source or destination. Note that the WL-external communication link and the primary similar communication links are excluded from the secondary similar communication links.
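One possible way to derive the secondary extraction conditions from past unique communication links is sketched below. It follows the literal wording of the two example combinations: the side that is a terminal of the WL-external communication link is matched only by its similar terminals, while the counterpart side is matched by the counterpart itself or its similar terminals. The function names, the tuple layout, and the sample data are hypothetical, and port and protocol are matched exactly for simplicity.

```python
def build_secondary_conditions(past_links, wl_terminals, similar):
    """Create one condition per past unique link involving a WL-external terminal.

    past_links: iterable of (src, dst, dst_port, protocol) tuples.
    similar: mapping from a terminal to the set of its similar terminals.
    """
    conditions = []
    for src, dst, port, proto in sorted(set(past_links)):
        if src in wl_terminals:
            src_ok = set(similar.get(src, ()))            # similar terminals only
            dst_ok = {dst} | set(similar.get(dst, ()))    # counterpart or similar
        elif dst in wl_terminals:
            src_ok = {src} | set(similar.get(src, ()))
            dst_ok = set(similar.get(dst, ()))
        else:
            continue  # link does not involve a terminal of the WL-external link
        conditions.append({"src": src_ok, "dst": dst_ok,
                           "dst_port": {port}, "protocol": {proto}})
    return conditions


def extract_secondary(records, conditions, excluded):
    """Return records matching any condition, skipping WL-external and primary links."""
    result = []
    for rec in records:
        if rec in excluded:
            continue
        src, dst, port, proto = rec
        if any(src in c["src"] and dst in c["dst"]
               and port in c["dst_port"] and proto in c["protocol"]
               for c in conditions):
            result.append(rec)
    return result


# Hypothetical data: WL-external link Y -> X, with past links of Y and X.
similar = {"X": {"X'"}, "Y": {"Y'"}, "B": {"B'"}}
past_of_terminals = [("Y", "B", 21, "ftp"), ("D", "X", 502, "modbus")]
conditions = build_secondary_conditions(past_of_terminals, {"X", "Y"}, similar)
```

If a stricter or looser rule is configured in the extraction condition storage unit, the two branches above are where the change would be made.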
[0147] Figure 10B shows an example of extraction conditions (secondary similar communication link extraction conditions) created using information about past communications by the terminal of the WL-external communication link shown in Figure 10A. More specifically, Figure 10B shows an example of extraction conditions (secondary similar communication link extraction conditions) created according to the settings, using past communication information, similar terminals of the WL-external communication link terminal, and similar terminals of the counterpart terminal of the WL-external communication link terminal.
[0148] The first row (no.1) of the extraction criteria shown in Figure 10B represents the extraction criteria for the time stamp of ts1 shown in Figure 10A. In (no.1), the source terminal indicated by SrcIP is a terminal similar to terminal X, the destination terminal indicated by DstIP is terminal A or a terminal similar to terminal A, the destination port number indicated by DstPort is 445, and the protocol used indicated by Protocol is smb. Note that (no.1) does not indicate any extraction criteria for method, target, or response.
[0149] Furthermore, the second row (no. 2) of the extraction conditions shown in Figure 10B represents the extraction conditions for the time stamp of ts2 shown in Figure 10A. In (no. 2), the source terminal indicated by SrcIP is a terminal similar to terminal Y, the destination terminal indicated by DstIP is terminal B or a terminal similar to terminal B, the destination port number indicated by DstPort is 21, and the protocol used indicated by Protocol is ftp. Note that (no. 2) also shows that there are no extraction conditions for method, target, or response.
[0150] Furthermore, the third row (no. 3) of the extraction conditions shown in Figure 10B represents the extraction conditions for the time stamp of ts3 shown in Figure 10A. In (no. 3), the source terminal indicated by SrcIP is terminal C or a similar terminal of terminal C, the destination terminal indicated by DstIP is a similar terminal of terminal Y, the destination port number indicated by DstPort is 135, and the protocol used indicated by Protocol is dce-rpc. Note that (no. 3) also shows that there are no extraction conditions for method, target, or response.
[0151] Furthermore, the fourth row (no. 4) of the extraction conditions shown in Figure 10B shows the extraction conditions for the time stamp of ts4 shown in Figure 10A. In (no. 4), the source terminal indicated by SrcIP is terminal D or a similar terminal of terminal D, the destination terminal indicated by DstIP is a similar terminal of terminal X, the destination port number indicated by DstPort is 502, and the protocol used indicated by Protocol is modbus. Note that (no. 4) also shows that there are no extraction conditions for method, target, or response.
[0152] Thus, one extraction condition shown in Figure 10B is generated for each time-stamped entry of the past communication information shown in Figure 10A.
[0153] The description now returns to Figure 9.
[0154] Next, the secondary similar communication link extraction unit 2106 extracts past communication information that matches the extraction conditions created in step S444 (S445). More specifically, the secondary similar communication link extraction unit 2106 extracts past communications that match the extraction conditions created in step S444 from the past communication information held in the communication information DB 24. The communication links of the extracted past communications correspond to secondary similar communication links.
[0155] Here, Figure 10C shows past communication information extracted using the extraction criteria in Figure 10B. More specifically, Figure 10C shows multiple past communications that match the extraction criteria in Figure 10B, i.e., secondary similar communication links.
[0156] Note that the extraction conditions (secondary similar communication link extraction conditions) shown in Figure 10B are just examples and are not limited to them. For example, in Figure 10B, there may be no extraction condition for the destination port number, and it may be represented by an asterisk. Also, there may or may not be an extraction condition for the protocol used.
[0157] In this way, the secondary similar communication link extraction process extracts past communications from terminals (destination terminal and source terminal) of the WL-external communication link, and extracts communications similar to the extracted communications from among the past communications included in the past communication information. The extracted similar communications are past communications that may be related to the WL-external communication link, and the communication link of such similar communications corresponds to a secondary similar communication link.
[0158] Figure 11 is a flowchart showing an example of the analysis support graph creation process shown in Figure 5. Figure 11 shows the analysis support graph creation process performed by the NW graph creation unit 2109 of the analysis support graph creation system 21.
[0159] As shown in Figure 11, first, the NW graph creation unit 2109 obtains information on WL-external communication links, similar terminals, primary similar communication links, and secondary similar communication links from the primary similar communication link extraction unit 2107 and the secondary similar communication link extraction unit 2106 (S451).
[0160] Next, the NW graph creation unit 2109 creates graph information to display the terminals (sender terminal and destination terminal) of WL-external communication and the WL-external communication link on the graph (screen) in a way that highlights them so that they can be distinguished from other terminals and communication links other than the WL-external communication link (S452).
[0161] Next, the NW graph creation unit 2109 updates the graph information to display the primary similar communication link, secondary similar communication link, and the terminals (sender terminal, destination terminal) of the primary similar communication link and secondary similar communication link on the graph (S453).
[0162] Next, the NW graph creation unit 2109 updates the graph information to group and display each terminal and its similar terminals so that they can be identified on the graph (S454). More specifically, the NW graph creation unit 2109 updates the graph information so that the user can tell which similar terminals belong to the same group and which belong to different groups. Here, the NW graph creation unit 2109 groups each terminal (destination terminal and source terminal) of the WL-external communication link with the multiple similar terminals of that terminal.
[0163] Next, the NW graph creation unit 2109 creates a message based on the graph information updated in step S454 as auxiliary information for the analyst to use for analysis (S455). In this embodiment, the message is created in accordance with the content of the NW graph shown by the graph information updated in step S454.
[0164] Next, the NW graph creation unit 2109 displays the NW graph based on the graph information updated in step S454 and the message created in step S455 on the screen (S456). In this embodiment, the NW graph creation unit 2109 displays the NW graph based on the graph information and the created message on the screen of the NW graph display unit 2110.
[0165] In this way, the NW graph creation unit 2109 can create graph information from the information acquired in step S451 for the analyst to analyze communications outside the WL, and display it on the screen.
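Steps S452 to S454 can be pictured as successive updates to one graph information structure. The sketch below assumes a plain dictionary of nodes and a list of edges; the function name `build_graph_info` and the keys `highlight`, `group`, and `kind` are hypothetical and do not describe the actual data format of the NW graph creation unit 2109.

```python
def build_graph_info(wl_link, similar, primary_links, secondary_links):
    """Assemble graph information for a WL-external link given as (source, destination)."""
    src, dst = wl_link
    nodes = {}
    edges = []

    def add_node(name, highlight=False):
        node = nodes.setdefault(name, {"highlight": False, "group": None})
        node["highlight"] = node["highlight"] or highlight

    # S452: highlight the terminals of the WL-external communication and its link.
    add_node(src, highlight=True)
    add_node(dst, highlight=True)
    edges.append({"src": src, "dst": dst, "kind": "wl_external", "highlight": True})

    # S453: add primary and secondary similar communication links and their terminals.
    for kind, links in (("primary", primary_links), ("secondary", secondary_links)):
        for s, d in links:
            add_node(s)
            add_node(d)
            edges.append({"src": s, "dst": d, "kind": kind, "highlight": False})

    # S454: group each terminal with its similar terminals.
    for base, members in similar.items():
        for name in (base, *members):
            add_node(name)
            if nodes[name]["group"] is None:
                nodes[name]["group"] = base
    return {"nodes": nodes, "edges": edges}


# Hypothetical example: WL-external link Y -> X with one primary similar link.
graph = build_graph_info(("Y", "X"), {"X": ["X'"], "Y": ["Y'"]}, [("Y'", "X")], [])
```

A display component could then render highlighted nodes and edges differently and draw one enclosure per group label.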
[0166] Figure 12 is a flowchart showing an example of the detailed processing of step S455 shown in Figure 11. Figure 12 shows the message creation process performed by the NW graph creation unit 2109 of the analysis support graph creation system 21.
[0167] As shown in Figure 12, first the NW graph creation unit 2109 determines whether a primary similar communication link exists (S4551).
[0168] If it is determined in step S4551 that a primary similar communication link exists (yes in S4551), the NW graph creation unit 2109 creates a message prompting the analyst to check for past communications (primary similar communication links) that are similar to the WL-external communication (S4552).
[0169] On the other hand, if it is determined in step S4551 that there is no primary similar communication link (no in S4551), the NW graph creation unit 2109 determines whether there is a secondary similar communication link (S4553).
[0170] If it is determined in step S4553 that a secondary similar communication link exists (yes in S4553), the NW graph creation unit 2109 creates a message indicating that there is no primary similar communication link but a secondary similar communication link exists (S4554). The message created here does not have to state directly that no primary similar communication link exists and only a secondary similar communication link exists. It is sufficient for the message to warn that the WL-external communication link may be abnormal and to prompt the user to be sure to check it.
[0171] On the other hand, if no secondary similar communication links exist in step S4553 (no in S4553), the NW graph creation unit 2109 generates a message indicating that neither primary nor secondary similar communication links exist (S4555).
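The branching of Figure 12 reduces to a small decision function. In the sketch below the function name `create_message` and the message wording are hypothetical; only the order of the checks (S4551, then S4553) follows the description.

```python
def create_message(has_primary, has_secondary):
    """Return (step, message) following the branching of steps S4551 to S4555."""
    if has_primary:  # yes in S4551
        return ("S4552",
                "Similar communications occurred in the past. "
                "Please check them against the new link.")
    if has_secondary:  # no in S4551, yes in S4553
        return ("S4554",
                "No primary similar link was found, but secondary similar links exist. "
                "The new link may be abnormal, so be sure to check it.")
    return ("S4555",  # no in S4553
            "Neither primary nor secondary similar links were found.")


step, text = create_message(has_primary=False, has_secondary=True)
```

Note that when a primary similar communication link exists, the existence of secondary similar communication links is not checked at all, matching the flow described above.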
[0172] [1.3.3 Example screen for analyzing WL external communications] Next, we will explain an example of a screen displaying graph information that allows analysts to check for abnormalities or legitimacy (normality) of the WL external communication link.
[0173] Figures 13A to 13C show examples of screen information displayed on the screen when a primary similar communication link exists according to Embodiment 1. As shown in Figures 13A to 13C, the screen information is an example of graph information created by the analysis support graph creation system 21 when a primary similar communication link exists, and consists of a message, a similar terminal / communication graph, and new link information. A new link means an external WL communication link, and new link information means information about the external WL communication link.
[0174] The screen information created by the NW graph creation unit 2109 is presented to the analyst by the NW graph display unit 2110. The NW graph display unit 2110 also receives screen operation input from the analyst and performs various processing.
[0175] In the similar terminal / communication graph shown in Figure 13A, terminals X and Y, i.e., the terminals of the target WL-external communication (sender terminal and destination terminal), and the communication link connecting terminals X and Y, i.e., the WL-external communication link, are highlighted. Furthermore, in the similar terminal / communication graph, groups of similar terminals are displayed in a way that allows for identification of terminal X and its similar terminals, terminal Y and its similar terminals, and terminal A and its similar terminals.
[0176] Furthermore, the message shown in Figure 13A indicates that a primary similar communication link, that is, a link similar to the new link (the WL-external communication link), has occurred in the past, and prompts the user to confirm whether the new link is problem-free. This message is an example of prompting the user to confirm a primary similar communication link that is similar to the WL-external communication link.
[0177] The message shown in Figure 13A further indicates that selecting any link will highlight similar links. In this way, messages containing instructions for guiding the user may also be included.
[0178] Furthermore, the new link information shown in Figure 13A displays information about the new link (WL external communication link) connecting terminals X and Y, which are highlighted on the similar terminal / communication graph. In the example shown in Figure 13A, the new link information displays the time the new link was created, as well as information about the source terminal and destination terminal.
[0179] Next, let's assume that, following the guidance of the message shown in Figure 13A, the analyst selected the new link (WL-external communication link) highlighted in Figure 13A.
[0180] In this case, as shown in Figure 13B, the communication link between terminal X' and terminal Y, and the communication link between terminal X'' and terminal Y' are highlighted and displayed as primary similar communication links. Also, as shown in Figure 13B, the new link information shown in Figure 13A is displayed as the selected new link information.
[0181] In this case, the message shown in Figure 13B may also include instructions on how to guide the user, such as a message indicating that selecting a similar link will display comparison information.
[0182] Next, following the guidance of the message shown in Figure 13B, suppose the analyst selects the communication link between terminals X'' and Y', which is one of the primary similar communication links highlighted in Figure 13B. In this case, as shown in Figure 13C, comparison information is displayed, namely the new link (WL-external communication link) and the comparison target information, which is the information of the selected primary similar communication link.
[0183] Here, the comparison information, which is the information of the selected primary similar communication link, is displayed using the past communication information extracted in the primary similar communication link extraction process described above. If there is a large amount of extracted past communication information, it is best to prioritize displaying the items (elements) that the analyst should look at from the past communication information.
[0184] Items that should be prioritized for display include, for example, items that allow confirmation of the degree of agreement between past communication information and the communication content of the WL external communication link. For example, if the protocol of both the past communication information and the WL external communication link is HTTP, the degree of agreement can be confirmed by items such as the method used, the file path being operated on, and the response code to the request. Then, the past communication information should be displayed in order from the items that match the WL external communication link the most among the items that allow confirmation of the degree of agreement in communication content.
[0185] Another item that can be prioritized for display is the timestamp. In this case, the numerous past communication records may be displayed in order from those with the most recent timestamp.
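A simple way to order a large number of past communication records for display is sketched below. The description presents the degree of agreement and the timestamp as separate prioritization options; combining them, with the timestamp used only to break ties, is an assumption of this example, as are the function name `prioritize`, the field names, and the sample records.

```python
def prioritize(records, wl_record, fields=("method", "target", "response")):
    """Order records by number of fields agreeing with the WL-external communication.

    Assumption: ties are broken by showing the most recent timestamp first.
    """
    def agreement(rec):
        return sum(
            1 for f in fields
            if rec.get(f) is not None and rec.get(f) == wl_record.get(f)
        )
    return sorted(records, key=lambda r: (-agreement(r), -r["ts"]))


# Hypothetical HTTP communication content of the WL-external communication link.
wl_record = {"method": "GET", "target": "/index.html", "response": 200}

# Hypothetical past communications of a selected primary similar communication link.
past_records = [
    {"ts": 100, "method": "POST", "target": "/upload", "response": 200},
    {"ts": 200, "method": "GET", "target": "/index.html", "response": 200},
    {"ts": 300, "method": "GET", "target": "/index.html", "response": 404},
    {"ts": 400, "method": "POST", "target": "/login", "response": 200},
]
ordered = prioritize(past_records, wl_record)
```

For protocols other than HTTP, the `fields` argument would be replaced by whichever items allow the communication content to be compared.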
[0186] Therefore, by comparing and checking the information displayed on the screen as shown in Figure 13C, the analyst can check the communication content of the new link and confirm whether similar communication content has occurred in the past on similar terminals. If the analyst can confirm that communication content similar to that of the new link has occurred in the past on similar terminals, they can determine that the new link is unlikely to be abnormal.
[0187] Figures 14A to 14C show examples of screen information displayed on the screen when a primary similar communication link does not exist but a secondary similar communication link exists, according to Embodiment 1. Elements similar to those in Figures 13A to 13C are given the same names, and detailed explanations are omitted.
[0188] The similar terminal and communication graphs shown in Figures 14A to 14C differ from those shown in Figures 13A to 13C in that there is no communication link (primary similar communication link) similar to the new link (WL external communication link) connecting terminals X and Y.
[0189] Therefore, in the similar terminal / communication graphs shown in Figures 14A to 14C, only new links and secondary similar communication links exist. For this reason, the message content shown in Figure 14A differs from the message content shown in Figure 13A.
[0190] The message shown in Figure 14A indicates that, for the new link, which is a WL-external communication link, no primary similar communication link has occurred in the past but secondary similar communication links have. The message content shown in Figure 14A is just an example; any message that warns the user that the WL-external communication link may be abnormal and should be checked carefully is acceptable. The message in Figure 14A also indicates that similar links will be highlighted when any link is selected. In this way, messages regarding operating procedures to guide the user may also be included.
[0191] Next, suppose the analyst, following the guidance of the message shown in Figure 14A, selects the communication link between terminal Y' and terminal A', which is one of the secondary similar communication links in Figure 14A. In this case, as shown in Figure 14B, the communication link between terminal Y' and terminal A' and the communication link between terminal Y and terminal A are highlighted and displayed as secondary similar communication links.
[0192] Furthermore, as shown in Figure 14B, information about the secondary similar communication link, which is the communication link between terminals Y' and A' selected in Figure 14A, is displayed as the selected link information.
[0193] Next, let's assume that, following the guidance of the message shown in Figure 14B, the analyst selects the communication link between terminal Y and terminal A, which is one of the secondary similar communication links highlighted in Figure 14B. In this case, as shown in Figure 14C, comparison information is displayed, namely the information of the communication link between terminal Y' and terminal A' (selected link information) and the information of the communication link between terminal Y and terminal A (comparison information).
[0194] Therefore, by comparing or verifying the contents of the secondary similar communication link displayed on the screen information as shown in Figure 14C with the contents of the new link (WL-external communication link), the analyst can determine whether it is a normal communication based on the role of the terminal in the new link (WL-external communication link). However, if it is not possible to determine whether the WL-external communication link is normal or abnormal based solely on the comparison and verification of the contents displayed on the screen information as shown in Figure 14C, the analyst should conduct a separate, more detailed analysis.
[0195] Figure 15A shows an example of screen information displayed on the screen when there are no primary or secondary similar communication links according to Embodiment 1. Elements similar to those in Figures 13A to 13C are given the same names, and detailed explanations are omitted.
[0196] The similar terminal and communication graph shown in Figure 15A differs from the similar terminal and communication graphs shown in Figures 13A to 13C in that neither the primary nor the secondary similar communication link exists for the new link (WL-external communication link) connecting terminals X and Y. Therefore, the content of the message shown in Figure 15A differs from the content of the messages shown in Figures 13A and 14A.
[0197] The message shown in Figure 15A indicates that no primary or secondary similar communication links have occurred in the past for the new link, which is an external WL communication link. The message also prompts the user to switch graphs to check surrounding information. Thus, messages containing instructions for guiding the user may also be included.
[0198] Figure 15B is a diagram illustrating an example of switching screen information according to Embodiment 1. Figure 15B is an example of graph information created by the analysis support graph creation system 21. Figure 15B(a) shows the similar terminal / communication graph among the screen information, and Figure 15B(b) shows the communication status graph among the screen information. Switching between the similar terminal / communication graph and the communication status graph is performed by pressing the graph switching button. Here, the communication status graph is a graph used in conventional IDS products, and it displays all terminals that the source terminal and destination terminal of the communication that caused the alert have communicated with in the past.
[0199] Furthermore, the similar terminal / communication graph and the communication status graph have different purposes. By switching between the similar terminal / communication graph and the communication status graph, analysts can use both graphs to understand the alerts that have occurred.
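As a rough illustration of what the communication status graph contains, the following minimal sketch collects every terminal that the source terminal and the destination terminal of the alerting communication have communicated with in the past. The function name `communication_status_peers` and the representation of past communication information as (source, destination, protocol) tuples are assumptions made for illustration and do not appear in this disclosure.

```python
def communication_status_peers(past_links, src_ip, dst_ip):
    """Return, for each endpoint of the alerting communication, the set of
    terminals it has exchanged traffic with in the past.

    past_links: iterable of (source, destination, protocol) tuples
    (a hypothetical stand-in for the past communication information).
    """
    peers = {src_ip: set(), dst_ip: set()}
    for source, destination, _protocol in past_links:
        for endpoint in (src_ip, dst_ip):
            if source == endpoint:
                peers[endpoint].add(destination)
            elif destination == endpoint:
                peers[endpoint].add(source)
    return peers


if __name__ == "__main__":
    past = [("X", "A", "modbus"), ("B", "X", "http"), ("Y", "C", "modbus")]
    print(communication_status_peers(past, "X", "Y"))
```

Unlike the similar terminal / communication graph, this view applies no similarity filter, so it can grow large for terminals with many peers.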
[0200] [1.3.4 Example of a screen after analyzing WL-external communication] Next, we will explain examples of screens that appear after analyzing WL-external communication using the screen information shown in Figures 13A to 15B.
[0201] Figure 16 is a diagram showing an example of screen information displayed on the screen when adding a WL-external communication to the whitelist according to Embodiment 1. The screen information shown in Figure 16 is an example of graph information created by the analysis support graph creation system 21. In the example shown in Figure 16, the NW graph display unit 2110 displays a message such as "Add the incoming WL-external communication to the WL?" and an operation button for deciding whether or not to add the WL-external communication to the whitelist.
[0202] In this embodiment, the analyst uses the screen information shown in Figures 13A to 15B to analyze the WL-external communication. If the analyst determines that the WL-external communication was normal, they select the WL-external communication link, and an operation window is displayed. The analyst then simply presses, for example, the "Yes" button in the displayed operation window. As a result, the WL change operation unit 2108 adds the WL-external communication to the whitelist 22.
[0203] In this way, analysts can easily add WL-external communications to the whitelist 22. Therefore, even if such WL-external communication links occur, they will not be falsely detected, and the occurrence of false detections can be suppressed.
[0204] [1.3.5 Example screen for adjusting the display range of the similar terminal / communication graph] Figure 17 shows an example of some of the screen information displayed on the screen when adjusting the display range of the similar terminal / communication graph according to Embodiment 1. The similar terminal / communication graph shown in Figure 17 is part of the screen information and is an example of graph information created by the analysis support graph creation system 21. The similar terminal / communication graph shown in Figure 17 also shows a slider bar used by the NW graph display unit 2110 to adjust the threshold for the similarity between the terminals (destination terminal and source terminal) of the WL-external communication link and their similar terminals. The similar terminals shown in the NW graph are reconfigured and displayed on the screen according to the threshold changed by the analyst's operation on the slider bar. By default, the position of the slider bar is adjusted so that the number of similar terminals is, for example, about 10.
[0205] For example, in the example shown in Figure 17, the slider bar corresponds to the display rate of similar terminals, and the default position of the slider bar is in the middle (display rate 50). As shown in Figure 17(a), if the slider bar is moved to the right and the display rate is increased to 80, the similarity threshold is lowered, and similar terminals with low similarity to terminals on the WL-external communication link will also be displayed in the similar terminal / communication graph. On the other hand, as shown in Figure 17(b), if the slider bar is moved to the left and the display rate is decreased to 10, the similarity threshold is raised, and only similar terminals with high similarity to terminals on the WL-external communication link will be displayed in the similar terminal / communication graph.
[0206] As shown in Figure 17, the display range of similar terminals to terminals in the WL-external communication link can be adjusted by manipulating the slider bar, which may make it easier to verify the legitimacy of WL-external communication.
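The relationship between the display rate and the similarity threshold can be sketched as follows. This disclosure does not state how the display rate is converted into a threshold, so the linear mapping used here (threshold = 1 - display rate / 100) is purely an assumption, as are the function names `similarity_threshold` and `visible_similar_terminals`.

```python
from __future__ import annotations


def similarity_threshold(display_rate: int) -> float:
    """Map a slider display rate (0 to 100) to a similarity threshold.

    Assumed linear mapping: a higher display rate lowers the threshold.
    """
    if not 0 <= display_rate <= 100:
        raise ValueError("display_rate must be between 0 and 100")
    return 1.0 - display_rate / 100.0


def visible_similar_terminals(similarities: dict[str, float], display_rate: int) -> list[str]:
    """Return the similar terminals to draw, most similar first."""
    threshold = similarity_threshold(display_rate)
    shown = [terminal for terminal, score in similarities.items() if score >= threshold]
    return sorted(shown, key=lambda terminal: similarities[terminal], reverse=True)


if __name__ == "__main__":
    scores = {"A": 0.95, "B": 0.60, "C": 0.30}
    for rate in (10, 50, 80):
        print(rate, visible_similar_terminals(scores, rate))
```

With this mapping, moving the slider to the right only ever adds terminals to the graph, which matches the behavior described for Figure 17(a) and Figure 17(b).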
[0207] [1.4 Effects, etc.] Whitelists, commonly used for detecting cyberattacks, often produce false positives due to insufficient learning periods. If a whitelist is put into operation before sufficient learning is achieved, and normal out-of-whitelist (WL-external) communications then occur, analysts may have to manually analyze these communications and add them to the whitelist. This can burden analysts and potentially prevent them from properly addressing genuine cyberattacks. While methods using machine learning to prioritize WL-external communications have been proposed, the abnormality of these prioritized communications may not be intuitively clear to analysts, potentially leading to inadequate responses.
[0208] In contrast, the communication analysis system according to Embodiment 1 described above makes it possible to easily analyze abnormalities in WL-external communication.
[0209] More specifically, the communication analysis system according to Embodiment 1 can create an analytical network graph as graph information for analyzing a newly generated WL-external communication link, using the primary similar communication link and past communication information. It is also possible to create an analytical network graph as graph information for analyzing a newly generated WL-external communication link, using the primary and secondary similar communication links and past communication information. Therefore, the analytical network graph extracts and displays the information that needs to be checked regarding the newly generated WL-external communication. This allows the user, acting as the analyst, to intuitively grasp the abnormality of the WL-external communication and easily verify its legitimacy.
[0210] Here, the NW graph for analysis shows information on whether similar communication is occurring at terminals similar to the terminals on the WL-external communication link. Similar terminals, as mentioned above, are terminals whose communication trends are similar to those of a terminal on the WL-external communication link, or terminals whose roles are similar to those of a terminal on the WL-external communication link. A legitimate WL-external communication is the type of communication that should occur under normal operation. On the other hand, an abnormal WL-external communication is the type of communication that should not occur under normal operation.
[0211] Thus, the communication analysis system according to Embodiment 1 can provide analysts with information to assist in the alert analysis of newly occurring WL-external communications, thereby enabling efficient handling of alert analysis for WL-external communication links.
[0212] Furthermore, the communication analysis system according to Embodiment 1 may create and display messages as auxiliary information for analysis by the analyst. By using the messages, the user can understand the operating procedures, display the comparison information, determine whether there are primary similar communications similar to the WL-external communication, and see which information should be compared and confirmed. This allows the user to perform the analysis more easily.
[0213] (Embodiment 2) Embodiment 1 described a communication analysis system 20 used during the operation of the whitelist 22 after the learning process is complete. Embodiment 2 describes a communication analysis system 20A used after the learning process is complete but before the whitelist 22 is put into operation.
[0214] When implementing the whitelist 22 in the field, the learning period required for the whitelist 22 to learn adequately varies from site to site and is unpredictable. Furthermore, the start of operation may be rushed, making it impossible to secure sufficient learning time. If the whitelist 22 is used without sufficient learning, normal communication links will be mistakenly identified as WL-external communication links, and a large number of such misidentifications may occur. Additionally, manually adding information on normal communication links from equipment data, etc., to the whitelist 22 before operation is difficult because there are countless communication links that did not occur during the learning period.
[0215] In contrast, we found that by adding unoccurred communication links, that is, links that did not occur during the learning period but are highly likely to be normal, to the whitelist before putting the whitelist into operation, we could operate the whitelist efficiently and suppress false positives that occur after it is put into operation. Furthermore, we found that by efficiently operating the whitelist in this way, the legitimacy of WL-external communications that occur during operation can be easily verified, and the response to alerts can be made more efficient.
[0216] Therefore, we conceived of a communication analysis system 20A that can provide supplementary graph information for analysts to analyze whether any of the countless communication links that did not occur during the learning period have the potential to become normal communication links. This system is described below.
[0217] The following description will focus on the differences between the communication analysis system 20A according to Embodiment 2 and Embodiment 1, with reference to the drawings.
[0218] [2 Overall Structure] Figure 18 shows the overall configuration including the communication analysis system 20A according to Embodiment 2. The same reference numerals are used for elements similar to those in Figure 1, and detailed explanations are omitted.
[0219] The overall configuration shown in Figure 18 differs from that shown in Figure 1 in the configuration of the communication analysis system 20A. The other parts are as described in Embodiment 1, so their explanation will be omitted.
[0220] [2.1 Communication Analysis System 20A] The communication analysis system 20A analyzes the network communications of multiple terminals in a predetermined environment, which is the monitored target 10.
[0221] Before putting the learned whitelist 22 into operation, the communication analysis system 20A creates and presents graph information to assist in the process of adding communication links that are predicted to occur normally in the future to the whitelist 22 from among the countless communication links that have not occurred during the learning period.
[0222] The communication analysis system 20A according to this embodiment, as shown in Figure 18, comprises a communication link prediction graph creation system 25, a whitelist 22, a communication link learning system 23, and a communication information DB 24. The communication analysis system 20A shown in Figure 18 differs from the communication analysis system 20 shown in Figure 1 in that it lacks the analysis support graph creation system 21 and includes the communication link prediction graph creation system 25. The other parts are as described in Embodiment 1, so their description will be omitted.
[0223] [2.1.1 Communication Link Prediction Graph Creation System 25] The communication link prediction graph creation system 25 creates graph information to determine whether or not to add communication links that are not included in the whitelist 22 and have not occurred in past communication information (unoccurred communication links) to the whitelist 22. In this embodiment, the communication link prediction graph creation system 25 uses a machine learning model trained by the communication link learning system 23 and past communication information held by the communication information DB 24 to present communication links that are not included in the whitelist 22 but may occur in the future.
[0224] Figure 19A is a block diagram showing an example of the configuration of the communication link prediction graph creation system 25 according to Embodiment 2. Figure 19B is a block diagram showing the minimum configuration of the communication link prediction graph creation system 25 shown in Figure 19A.
[0225] As shown in Figure 19A, the communication link prediction graph creation system 25 includes an information acquisition unit 2501, a prediction target link extraction unit 2502, a model acquisition unit 2503, a link confidence calculation unit 2504, a WL automatic addition determination unit 2505, an NW graph creation unit 2506, an NW graph display unit 2507, and a WL change operation unit 2508. As shown in Figure 19B, the communication link prediction graph creation system 25a only needs to include an information acquisition unit 2501, a prediction target link extraction unit 2502, a model acquisition unit 2503, a link confidence calculation unit 2504, and an NW graph creation unit 2506 as its minimum configuration.
[0226] The communication link prediction graph creation system 25 includes, for example, a computer including memory and a processor (microprocessor), and the processor executes a predetermined program stored in memory to realize the functions of each component.
[0227] The information acquisition unit 2501 acquires past communication information from the communication information DB 24. The information acquisition unit 2501 is, for example, a communication interface. As mentioned above, the past communication information includes communication information such as what terminals (destination terminal and source terminal) each terminal in the monitored target 10 has communicated with in the past, and what protocols it has used.
[0228] The model acquisition unit 2503 acquires a machine learning model from the model storage unit 2304 of the communication link learning system 23. In this embodiment, the model acquisition unit 2503 acquires a machine learning model generated by learning the communications performed in the monitored target 10. As mentioned above, the machine learning model is learned in the communication link learning system 23.
[0229] The prediction target link extraction unit 2502 may include, for example, a computer including memory and a processor (microprocessor), and the following extraction functions may be realized by the processor executing a predetermined program stored in memory.
[0230] The prediction target link extraction unit 2502 extracts unoccurred communication links, which are communication links that have not occurred in the past communication information and are subject to prediction, based on the past communication information acquired by the information acquisition unit 2501. Here, the prediction target link extraction unit 2502 may, for example, use a trained machine learning model to calculate the similarity between multiple terminals and use the calculated similarity and past communication information to extract unoccurred communication links.
[0231] In this way, the prediction target link extraction unit 2502 extracts unoccurred communication links that are subject to prediction based on past communication information acquired by the information acquisition unit 2501.
[0232] The prediction target link extraction unit 2502 may extract similar terminals using predetermined rules. Alternatively, the prediction target link extraction unit 2502 may extract unoccurred communication links to be predicted using predetermined rules.
[0233] The link confidence calculation unit 2504 may include, for example, a computer including memory and a processor (microprocessor), and the following calculation functions may be realized by the processor executing a predetermined program stored in memory.
[0234] The link confidence calculation unit 2504 calculates a confidence level indicating the likelihood that the unoccurred communication links extracted by the prediction target link extraction unit will occur as normal communication links in the future. Here, confidence can also be rephrased as the probability that the unoccurred communication links will occur in the future, i.e., the occurrence probability. The link confidence calculation unit 2504 may also calculate the confidence level of the unoccurred communication links extracted by the prediction target link extraction unit 2502 using, for example, a trained machine learning model. Furthermore, the link confidence calculation unit 2504 may also calculate the confidence level of the unoccurred communication links extracted by the prediction target link extraction unit 2502 using predetermined rules.
[0235] The WL automatic addition determination unit 2505 achieves the following determination functions by having the processor execute a predetermined program stored in memory.
[0236] The WL automatic addition determination unit 2505 determines whether to add communications of unoccurred communication links to the whitelist 22 by determining whether the confidence level calculated by the link confidence calculation unit 2504 is equal to or greater than a threshold. For example, if the WL automatic addition determination unit 2505 determines that the calculated confidence level is equal to or greater than the threshold, it instructs the WL change operation unit 2508 to add communications of unoccurred communication links with a confidence level equal to or greater than the threshold to the whitelist 22. In this way, the WL automatic addition determination unit 2505 can cause the WL change operation unit 2508 to add communications of unoccurred communication links with a confidence level equal to or greater than the threshold to the whitelist 22.
[0237] Thus, in this embodiment, the WL automatic addition determination unit 2505 can automatically add an unoccurred communication link to the whitelist 22 if the confidence level of the unoccurred communication link calculated by the link confidence calculation unit 2504 is equal to or greater than the threshold. The WL automatic addition determination unit 2505 may operate only when the automatic addition setting for the whitelist 22 is ON.
[0238] The NW graph creation unit 2506 includes, for example, memory and a processor (microprocessor), and the following creation functions are realized by the processor executing a predetermined program stored in memory.
[0239] The NW graph creation unit 2506 creates graph information to assist in the process of adding communication links that are predicted to occur normally in the future from among the communication links that have not occurred during the learning period to the whitelist 22. More specifically, the NW graph creation unit 2506 uses the unoccurred communication links extracted by the prediction target link extraction unit 2502, the confidence level calculated by the link confidence calculation unit 2504, and the past communication information acquired by the information acquisition unit 2501 to create graph information for determining whether or not to add a link to the whitelist 22. The graph information created here includes, for example, an NW graph onto which unoccurred communication links and information related to unoccurred communication links are mapped.
[0240] In this embodiment, the NW graph creation unit 2506 creates an NW graph in which the unoccurred communication links are mapped onto the communication links that occurred in the past, and further maps confidence level information and a list of unoccurred communication links to the created NW graph to create graph information. In addition, the NW graph creation unit 2506 may create information related to the NW graph in response to user operations and display it on the NW graph display unit 2507.
[0241] The NW graph display unit 2507 displays the NW graph created by the NW graph creation unit 2506 on the screen. The NW graph display unit 2507 may be, for example, a display or a touch panel, and various processes may be performed by operations on a predetermined position on the touch panel or by operations on a predetermined position on the display via an input device such as a mouse.
[0242] In this embodiment, the NW graph display unit 2507 presents the user, who is the analyst, with information related to the NW graph, such as a list of unoccurred communication links included in the NW graph, in addition to the NW graph created by the NW graph creation unit 2506. Furthermore, the NW graph display unit 2507 may also present the user with information related to the NW graph created in response to user operations.
[0243] For example, when the NW graph display unit 2507 displays a list of unoccurred communication links included in the NW graph on the screen, it may display a checkbox on the screen for each unoccurred communication link in the list, allowing the user to select whether or not to add it to the whitelist 22. Furthermore, the NW graph display unit 2507 may display a button on the screen to add the unoccurred communication links selected by the checkboxes from the list to the whitelist 22. This allows the user to add unoccurred communication links to the whitelist 22 with a simple operation.
[0244] Furthermore, for example, the NW graph display unit 2507 may display a slider bar on the screen for adjusting the confidence threshold of unoccurred communication links. In this case, the NW graph display unit 2507 can reconfigure and display the unoccurred communication links and related information on the NW graph on the screen according to the threshold changed by the user's operation on the slider bar. By operating such a slider bar, the display range of the unoccurred communication links and related information can be adjusted. This makes it easy for the user to determine whether an unoccurred communication link is likely to become a normal communication link in the future, and allows them to add unoccurred communication links that are likely to become normal communication links in the future to the whitelist 22.
[0245] Furthermore, when the user selects one of the unoccurred communication links in the NW graph displayed on the screen, the NW graph display unit 2507 may display comparison information on the screen for determining whether or not to add the selected unoccurred communication link to the whitelist 22. Here, the comparison information consists of detailed information of the selected unoccurred communication link and detailed information of past communications included in the past communication information that are similar to the selected unoccurred communication link, and these are displayed side by side on the screen.
[0246] By displaying such comparison information, it becomes easy to determine whether an unoccurred communication link is likely to become a normal communication link in the future. The NW graph display unit 2507 may also display a message regarding the procedure for displaying the comparison information, guiding the user to display the comparison information on the screen. This allows the user to display the comparison information by referring to the procedure.
[0247] The WL change operation unit 2508 adds communications of unoccurred communication links to the whitelist 22 in response to user instructions. In this embodiment, the WL change operation unit 2508 adds communications of unoccurred communication links with a confidence level equal to or greater than the threshold to the whitelist 22 in response to instructions from the NW graph display unit 2507 or the WL automatic addition determination unit 2505.
[0248] [2.2 Operation of Communication Analysis System 20A] Next, we will explain the operation of the communication analysis system 20A, which is configured as described above.
[0249] [2.2.1 Operation of the Analysis Method] Figure 20 is a flowchart showing the operation of the analysis method of the communication analysis system 20A according to Embodiment 2. Figure 20 shows the communication link prediction graph creation process using the minimum configuration of the communication link prediction graph creation system 25 shown in Figure 19B as an example of the operation of the analysis method.
[0250] First, the communication link prediction graph creation system 25 of the communication analysis system 20A acquires past communication information from the communication information DB 24 (S21). More specifically, the communication link prediction graph creation system 25 acquires past communication information from the communication information DB 24, which holds past communication information consisting of information indicating past communications that have taken place in the monitored target 10.
[0251] Next, the communication link prediction graph creation system 25 extracts unoccurred communication links based on the past communication information obtained in step S21 (S22). Here, unoccurred communication links are communication links that are not included in the whitelist 22 and have not occurred in the past communication information, and are one or more communication links that are subject to prediction.
[0252] Next, the communication link prediction graph creation system 25 calculates the confidence level indicating the likelihood that the unoccurred communication links extracted in step S22 will occur as normal communication links in the future (S23).
[0253] Next, the communication link prediction graph creation system 25 uses the unoccurred communication links extracted in step S22, the confidence levels calculated in step S23, and the acquired past communication information to create an NW graph onto which the unoccurred communication links and information related to the unoccurred communication links are mapped (S24).
[0254] By performing such an analysis method, before applying the learned whitelist 22, it is possible to create graph information for assisting the process of adding communication links predicted to occur normally in the future to the whitelist 22 from among the communication links that did not occur during the learning period.
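Steps S21 to S24 can be sketched as a small pipeline. This is only an illustrative sketch: the `Link` triplet type, the function `create_prediction_graph`, the two callables `propose_links` and `confidence_of` (stand-ins for the extraction in S22 and the confidence calculation in S23), and the dictionary layout of the resulting graph are all assumptions, since the disclosure leaves those details to the machine learning model or to predetermined rules.

```python
from __future__ import annotations

from typing import Callable, Iterable, NamedTuple


class Link(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str


def create_prediction_graph(
    past_links: Iterable[Link],
    whitelist: set[Link],
    propose_links: Callable[[set[Link]], Iterable[Link]],
    confidence_of: Callable[[Link, set[Link]], float],
) -> dict:
    past = set(past_links)  # S21: past communication information
    # S22: keep only proposals that are neither whitelisted nor already observed
    unoccurred = sorted(
        {link for link in propose_links(past) if link not in past and link not in whitelist}
    )
    # S23: confidence that each unoccurred link will occur as a normal link
    confidence = {link: confidence_of(link, past) for link in unoccurred}
    # S24: map past links and unoccurred links (with confidence) onto one graph
    nodes = sorted({ip for link in past | set(unoccurred) for ip in (link.src_ip, link.dst_ip)})
    edges = [{"link": link, "occurred": True, "confidence": None} for link in sorted(past)]
    edges += [{"link": link, "occurred": False, "confidence": confidence[link]} for link in unoccurred]
    return {"nodes": nodes, "edges": edges}
```

Keeping the extraction and confidence steps as injected callables reflects that either a trained model or predetermined rules may be used for them.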
[0255] [2.2.2 Detailed Operations of the Analysis Method] Next, as the detailed operations of the analysis method, the communication link prediction graph creation process performed by the communication link prediction graph creation system 25 of the communication analysis system 20A, shown in Figure 19A, will be described.
[0256] Figure 21 is a flowchart showing the overall process of the communication link prediction graph creation system 25 according to Embodiment 2.
[0257] First, the communication link prediction graph creation system 25 determines whether the learning period of the WL, that is, the learning period of the whitelist 22, has ended (S51). Here, the learning period is, for example, the period during which the past communication information used to create the whitelist 22 was collected, and is the period during which the past communications performed in the monitored target 10 included in the past communication information were collected.
[0258] If the learning period of the WL has not ended in step S51 (no in S51), the communication link prediction graph creation system 25 ends the process.
[0259] On the other hand, if the learning period of the WL has ended in step S51 (yes in S51), the communication link prediction graph creation system 25 extracts all the communication links observed during the learning period (S52). In the present embodiment, the communication link prediction graph creation system 25 extracts all the communication links from the past communication information.
[0260] Next, the communication link prediction graph creation system 25 performs an unoccurred communication link selection process to select unoccurred communication links to be predicted using all communication links extracted in step S52 (S53).
[0261] Next, the communication link prediction graph creation system 25 performs a confidence calculation process to calculate the confidence level of the unoccurred communication links selected in step S53 (S54).
[0262] Next, the communication link prediction graph creation system 25 extracts, from the unoccurred communication links selected in step S53, those whose confidence level calculated in step S54 is equal to or greater than the threshold (S55).
[0263] Next, the communication link prediction graph creation system 25 checks whether the WL addition setting is set to automatic addition setting (S56).
[0264] In step S56, if the WL addition setting is set to automatic addition (yes in S56), the communication link prediction graph creation system 25 adds the unoccurred communication links extracted in step S55 to the whitelist 22 (S57).
[0265] On the other hand, if the WL addition setting is not set to automatic addition in step S56 (no in S56), the communication link prediction graph creation system 25 performs an NW graph creation process to create an NW graph for the unoccurred communication links extracted in step S55 (S58). The NW graph created here is graph information to assist in the process of adding communication links that are predicted to occur normally in the future to the whitelist 22. The graph information may include, for example, an NW graph showing communication links that have occurred in the past, and information for displaying the unoccurred communication links extracted in step S55 on the NW graph.
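The branching in Figure 21 (S51, S55, S56 to S58) can be summarized in the following minimal sketch. It assumes that the selection in S53 and the confidence calculation in S54 have already produced a mapping from each unoccurred communication link to its confidence level; the function name `run_prediction_flow`, the returned dictionary, and the action labels are hypothetical and serve only to make the control flow explicit.

```python
def run_prediction_flow(learning_finished, confidences, threshold, auto_add, whitelist):
    """Illustrative control flow for steps S51 and S55 to S58.

    confidences: dict mapping (src_ip, dst_ip, protocol) to a confidence level,
                 i.e. the assumed output of steps S53 and S54.
    whitelist:   set of (src_ip, dst_ip, protocol) tuples, updated in place.
    """
    if not learning_finished:  # S51: no -> end the process
        return {"action": "none", "links": []}
    # S55: keep links whose confidence is equal to or greater than the threshold
    selected = sorted(link for link, level in confidences.items() if level >= threshold)
    if auto_add:  # S56: yes -> S57
        whitelist.update(selected)
        return {"action": "added_to_whitelist", "links": selected}
    # S56: no -> S58 (the NW graph would be created for these links)
    return {"action": "show_graph", "links": selected}
```

In the manual branch the whitelist is left untouched, so the decision stays with the analyst who reviews the NW graph.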
[0266] In the following section, we will explain the details of steps S53, S54, and S58 shown in Figure 21 using Figures 22 to 27.
[0267] Figure 22 is a flowchart showing an example of the process for selecting unoccurred communication links shown in Figure 21. Figure 22 shows the process for selecting unoccurred communication links performed by the prediction target link extraction unit 2502 of the communication link prediction graph creation system 25.
[0268] As shown in Figure 22, first, the prediction target link extraction unit 2502 extracts the IP addresses observed during the WL's learning period (S5301). In this embodiment, the prediction target link extraction unit 2502 extracts all IP addresses from the past communication information in the communication information DB 24 as information indicating all observed terminals. Alternatively, all observed IP addresses and observed protocols may be extracted as information indicating all observed terminals.
[0269] Next, the prediction target link extraction unit 2502 acquires the machine learning model learned by the communication link learning system 23 (S5302).
[0270] Next, the prediction target link extraction unit 2502 obtains (selects) one IP address (target terminal) from among all the IP addresses extracted in step S5301 (S5303).
[0271] Next, the prediction target link extraction unit 2502 uses the machine learning model acquired in step S5302 to calculate the similarity between the target terminal with the acquired IP address and each of the other terminals (S5304). Here, the other terminals are all terminals with IP addresses that have been observed in the past and are included in the past communication information.
[0272] Next, the prediction target link extraction unit 2502 extracts terminals whose similarity calculated in step S5304 is equal to or greater than a threshold as similar terminals (S5305). In this embodiment, the prediction target link extraction unit 2502 calculates the similarity of the target terminal with all terminals observed so far using the machine learning model acquired in step S5302. This machine learning model obtains vector representations of each of the multiple terminals of the monitored target 10 described above.
[0273] Furthermore, the graph analysis technique disclosed in Non-Patent Document 2 can be used as a method for calculating terminal similarity. Specifically, first, a fixed-length embedding vector is calculated as the vector representation of each terminal observed so far using a machine learning model generated by learning past communication information, i.e., the machine learning model obtained in step S5302. Next, the similarity is obtained by calculating the cosine similarity between the vector representation of the target terminal and the vector representation of each of the other terminals.
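Steps S5301 and S5303 to S5305 can be illustrated with a short sketch that uses cosine similarity over per-terminal embedding vectors. The embeddings are passed in as a plain dictionary that stands in for the output of the trained machine learning model; how those vectors are actually produced is outside this sketch, and the function names and example values are assumptions.

```python
import math


def observed_ips(past_links):
    """S5301: collect every IP address seen in (src, dst, protocol) tuples."""
    return sorted({ip for src, dst, _protocol in past_links for ip in (src, dst)})


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_terminals(target_ip, embeddings, threshold):
    """S5304 and S5305: terminals whose similarity to the target is at or above the threshold."""
    target_vector = embeddings[target_ip]
    return sorted(
        ip
        for ip, vector in embeddings.items()
        if ip != target_ip and cosine_similarity(target_vector, vector) >= threshold
    )


if __name__ == "__main__":
    embeddings = {
        "10.0.0.1": [1.0, 0.0],
        "10.0.0.2": [0.9, 0.1],
        "10.0.0.3": [0.0, 1.0],
    }
    print(similar_terminals("10.0.0.1", embeddings, 0.8))
```

Because cosine similarity ignores vector length, two terminals with very different traffic volumes can still be judged similar if their embedding directions agree.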
[0274] Next, the prediction target link extraction unit 2502 extracts all communication links of similar terminals that have occurred in the past from the acquired past communication information (S5306). More specifically, the prediction target link extraction unit 2502 extracts communication protocols in which a similar terminal is the source terminal or destination terminal from communication links that have a unique combination of source terminal, destination terminal, and communication protocol (communication triplet) included in the past communication information. For example, the prediction target link extraction unit 2502 uses the machine learning model acquired in step S5302 to extract communication protocols in which a similar terminal is the source terminal or destination terminal from the above unique combination of communication links.
[0275] Next, the prediction target link extraction unit 2502 creates one communication link (combination of source terminal, destination terminal, communication protocol, etc.) that has not yet occurred in the similar terminals acquired in step S5306 (S5307). Specifically, the prediction target link extraction unit 2502 creates one communication link (combination of source terminal, destination terminal, communication protocol, etc.) that is different from the communication links of the similar terminals acquired in step S5306 and has not yet occurred.
[0276] Next, the prediction target link extraction unit 2502 determines whether the past communication information contains a communication similar to the unoccurred communication link created in step S5307 (S5308). Here, a similar communication is, for example, a communication on a communication link that has the same destination terminal (or source terminal) and the same communication protocol as the unoccurred communication link.
[0277] In step S5308, if there is no similar communication in the past communication information (no in step S5308), the prediction target link extraction unit 2502 records the communication link created in step S5307 in the prediction target list as a candidate for an unoccurred communication link (S5309).
[0278] Here, Figure 23 is a diagram showing an example of a prediction target list including candidates for unoccurred communication links according to Embodiment 2. In the example shown in Figure 23, the prediction target link extraction unit 2502 records the communication link created in step S5307 in the form of a communication triplet consisting of a source terminal (Src IP), a destination terminal (Dst IP), and a communication protocol (Protocol).
[0279] On the other hand, in step S5308, if there is a similar communication in the past communication information (yes in step S5308), the prediction target link extraction unit 2502 determines whether a further unoccurred communication link can be created for the similar terminals acquired in step S5306 (S5310). That is, the prediction target link extraction unit 2502 determines whether an unoccurred communication link other than the one created in step S5307 can be created for those similar terminals.
[0280] In step S5310, if there are other past communication links (yes in step S5310), the prediction target link extraction unit 2502 returns to step S5307 and selects (acquires) the next communication link from among the communication links of the similar terminals acquired in step S5306.
[0281] On the other hand, if no other past communication links exist in step S5310 (no in step S5310), the prediction target link extraction unit 2502 determines whether or not other IP addresses exist (S5311). In other words, the prediction target link extraction unit 2502 determines whether or not other IP addresses other than the IP address selected (obtained) in step S5303 exist.
[0282] In step S5311, if other IP addresses exist (yes in step S5311), the prediction target link extraction unit 2502 returns to step S5303 and obtains the next IP address.
[0283] On the other hand, if no other IP addresses exist in step S5311 (no in step S5311), the prediction target link extraction unit 2502 terminates processing.
[0284] Note that the process for selecting unoccurred communication links is not limited to the process described in Figure 22. For example, the prediction target link extraction unit 2502 may first extract a list of all observed IP addresses and protocols from the past communication information in the communication information DB 24. Next, the prediction target link extraction unit 2502 may create all combinations of all observed IP addresses and protocols from the extracted list, as well as extract combinations of IP addresses and protocols that have occurred so far (previously occurring combinations). Then, the prediction target link extraction unit 2502 may create a prediction target list, which is a list of unoccurred communication link candidates, by removing previously occurring combinations from all combinations.
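The alternative selection process described above (create all combinations, then remove the combinations that have already occurred) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name and example data are hypothetical, past communication information is assumed to be available as (source IP, destination IP, protocol) triplets, and links from a terminal to itself are assumed to be excluded.

```python
from itertools import product


def build_prediction_target_list(past_links):
    """Return unoccurred (src_ip, dst_ip, protocol) triplets.

    past_links is an iterable of triplets observed in the past.
    """
    occurred = set(past_links)
    # List of all observed IP addresses and protocols.
    ips = sorted({ip for src, dst, _ in occurred for ip in (src, dst)})
    protocols = sorted({proto for _, _, proto in occurred})
    # All combinations of observed IP addresses and protocols.
    all_combinations = {
        (src, dst, proto)
        for src, dst, proto in product(ips, ips, protocols)
        if src != dst  # assumption: self-links are not candidates
    }
    # Remove previously occurring combinations.
    return sorted(all_combinations - occurred)


# Hypothetical past communication information.
past = [
    ("10.0.0.1", "10.0.0.2", "modbus"),
    ("10.0.0.2", "10.0.0.1", "http"),
]
prediction_target_list = build_prediction_target_list(past)
```

Because the number of combinations grows with the square of the number of terminals, this exhaustive variant may produce a much longer prediction target list than the similar-terminal-based process of Figure 22.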
[0285] Figure 24 is a flowchart showing an example of the confidence calculation process shown in Figure 21. Figure 24 shows the confidence calculation process performed by the link confidence calculation unit 2504 of the communication link prediction graph creation system 25.
[0286] As shown in Figure 24, first the link confidence calculation unit 2504 acquires a machine learning model (communication link learning model) (S5401).
[0287] Next, the link confidence calculation unit 2504 uses the machine learning model acquired in step S5401 to calculate the confidence level for each communication link included in the prediction target list (S5402). Here, Figure 25 shows an example in which the confidence levels calculated by the link confidence calculation unit 2504 according to Embodiment 2 are added to the prediction target list shown in Figure 23. In Figure 25, the confidence level is expressed as a score from 0 to 100, with a score closer to 100 indicating a higher probability that the communication link exists.
[0288] Furthermore, in this embodiment, the link confidence calculation unit 2504 calculates the confidence level for each communication link included in the prediction target list using the machine learning model acquired in step S5401. This machine learning model learns past communication information to acquire vector representations for each of the multiple terminals of the monitored target 10, as well as matrix representations related to the relationships between protocols and the like. Therefore, the confidence level for each communication link can be calculated using this machine learning model.
[0289] Furthermore, the graph analysis technique disclosed in Non-Patent Document 2 can be used as the method for calculating the confidence score described above. Specifically, first, a machine learning model that can also be used as a communication link learning model is generated by learning past communication information. In the machine learning model generated in this way, fixed-length embedding vectors (vector representations) calculated for each terminal and matrices (matrix representations) calculated for each protocol are obtained. Then, the confidence score of the target communication link is calculated using the fixed-length embedding vectors calculated for each terminal and the matrices calculated for each protocol. For example, if the vector of the source terminal of the target communication link is x, the vector of the destination terminal is y, the matrix of the target protocol is A, and the confidence score of the communication link to be determined is P, then the confidence score of the communication link can be calculated by performing a calculation such as P = x × A × y using the communication link learning model.
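The calculation P = x × A × y can be read as the bilinear form of the source terminal vector x, the protocol matrix A, and the destination terminal vector y. The following is a minimal sketch of that reading only; the function name and the numeric values are assumptions for illustration, and the way the raw value is mapped onto the 0 to 100 score of Figure 25 is not specified in the text and is therefore not shown.

```python
def bilinear_score(x, A, y):
    """Compute the bilinear form of x, A, and y for plain Python lists.

    x: source terminal vector (length n)
    A: protocol matrix (n rows, m columns)
    y: destination terminal vector (length m)
    """
    if len(A) != len(x) or any(len(row) != len(y) for row in A):
        raise ValueError("dimensions of x, A, and y do not match")
    return sum(
        x[i] * A[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


# Hypothetical two-dimensional embeddings and a diagonal protocol matrix.
x = [1.0, 2.0]
A = [[1.0, 0.0], [0.0, 3.0]]
y = [4.0, 5.0]
P = bilinear_score(x, A, y)
```

A diagonal matrix A, as in this example, corresponds to the restriction used by models such as DistMult, whereas a full matrix allows interactions between different vector components.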
[0290] Figure 26 is a flowchart showing an example of the prediction NW graph creation process shown in Figure 21. Figure 26 shows the NW graph creation process performed by the NW graph creation unit 2506 of the communication link prediction graph creation system 25.
[0291] As shown in Figure 26, first, the NW graph creation unit 2506 obtains past communication information from the communication information DB 24 (S5801).
[0292] Next, the NW graph creation unit 2506 creates graph information from the past communication information acquired in step S5801 (S5802). More specifically, from among the past communication links and terminals included in the past communication information acquired in step S5801, the NW graph creation unit 2506 creates graph information for displaying on the NW graph those past communication links and terminals that are to be displayed (that are within the display range).
[0293] Next, the NW graph creation unit 2506 obtains the prediction target list recorded by the prediction target link extraction unit 2502 (S5803). The prediction target list includes one or more communication links that are candidates for unoccurred communication links created by the prediction target link extraction unit 2502.
[0294] Next, the NW graph creation unit 2506 updates the graph information so that, among the unoccurred communication link candidates included in the acquired prediction target list, those with a confidence level equal to or higher than a threshold are highlighted as unoccurred communication links with a high probability of occurring as normal communication in the future (S5804).
[0295] Figure 27 shows an example in which unoccurred communication links in the prediction target list shown in Figure 25 are highlighted on the NW graph. In the example of Figure 27, the threshold for the confidence score is set to 60, and the unoccurred communication links marked with a circle (〇) are highlighted on the NW graph.
[0296] Next, the NW graph creation unit 2506 updates the graph information to display information about unoccurred communication links on the NW graph (S5805).
[0297] [2.3 Screen Example] Next, an example of a screen displaying graph information will be described.
[0298] [2.3.1 Example screen for analyzing unoccurred communication links] First, an example of a screen displaying graph information will be described. The screen is used, before the whitelist 22 is put into operation, to analyze unoccurred communication links that could occur as normal communication links.
[0299] Figure 28 shows an example of screen information displayed on the screen when analyzing unoccurred communication links according to Embodiment 2. The screen information shown in Figure 28 is an example of graph information created by the communication link prediction graph creation system 25 and is presented to the analyst by the NW graph display unit 2507.
[0300] In the example screen shown in Figure 28, a prediction graph of unoccurred communication links and a list (summary) containing the unoccurred communication links and their confidence levels are displayed. The unoccurred communication links and their confidence levels correspond to the information about unoccurred communication links mentioned above.
[0301] The prediction graph of unoccurred communication links shown in Figure 28 displays an NW graph showing the past communication links within the display range, created from past communication information, together with unoccurred communication links that may occur after operation begins. In addition, in the example screen shown in Figure 28, information about the unoccurred communication links on the NW graph is displayed in pop-ups, and unoccurred communication links with a high confidence level are highlighted. In the example shown in Figure 28, the information about each unoccurred communication link labels its confidence score as "Low Confidence" if the score is between 0 and 60, "Medium Confidence" if it is between 60 and 80, and "High Confidence" if it is 80 or higher.
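The labeling of confidence scores in the pop-up information can be sketched as follows. The text does not state which label applies to a score of exactly 60, so this sketch assumes that each lower bound is inclusive (60 is "Medium Confidence"); the function name is also an assumption.

```python
def confidence_label(score):
    """Map a 0-100 confidence score to the label shown in the pop-up.

    Assumption: lower bounds are inclusive, so 60 is "Medium Confidence"
    and 80 is "High Confidence".
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "High Confidence"
    if score >= 60:
        return "Medium Confidence"
    return "Low Confidence"


labels = [confidence_label(s) for s in (35, 60, 92)]
```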
[0302] Figure 29 shows another example of screen information displayed on the screen when analyzing unoccurred communication links according to Embodiment 2. Figure 29 shows an example of the screen displayed when one unoccurred communication link is selected in the screen shown in Figure 28.
[0303] In the example screen shown in Figure 29, the selected unoccurred communication link information and similar communication information are displayed as comparison information. The selected unoccurred communication link information is information about the selected unoccurred communication link. The similar communication information is past communication information for terminals similar to the terminals of the selected unoccurred communication link. More precisely, it is past communication information about the similar communications, that is, the one or more past communication links of the similar terminals used in the unoccurred communication link selection process described above.
[0304] By displaying comparison information in this way, analysts may be able to easily analyze unoccurred communication links that are predicted to occur normally in the future.
[0305] In this embodiment, the analyst can use the comparison information displayed on the screen, as shown in Figure 29, along with other asset information they possess, to analyze the selected unoccurred communication link and consider whether to add it to the whitelist 22.
[0306] [2.3.2 Example of the screen after analysis of unoccurred communication links] This section describes an example of a screen that appears after unoccurred communication links have been analyzed using the screen information shown in Figures 28 and 29.
[0307] Figure 30 shows an example of screen information displayed on the screen when showing the prediction results for unoccurred communication links according to Embodiment 2. The screen information shown in Figure 30 is an example of graph information created by the communication link prediction graph creation system 25.
[0308] Figure 30 shows an example in which a checkbox is displayed for each unoccurred communication link in the list of unoccurred communication links, allowing the user to select whether to add that link to the whitelist 22. Figure 30 also shows an example in which an "Add Checked Link WL" button is displayed for adding the unoccurred communication links selected with the checkboxes to the whitelist 22.
[0309] After the analysis, the analyst simply selects the checkboxes for the unoccurred communication links that should be added to the whitelist 22 from the list of unoccurred communication links shown in Figure 30, and presses the "Add Checked Link WL" button. The WL change operation unit 2508 then adds the selected unoccurred communication links to the whitelist 22.
[0310] In this way, analysts can easily add unoccurred communication links that are expected to occur normally in the future to the whitelist 22.
[0311] [2.3.3 Example screen for adjusting the display range of the prediction graph] Figure 31 shows an example of screen information displayed on the screen when adjusting the display range of the prediction graph of unoccurred communication links according to Embodiment 2. The screen information shown in Figure 31 is an example of graph information created by the communication link prediction graph creation system 25. The screen information shown in Figure 31 also includes a slider bar, displayed by the NW graph display unit 2507, for adjusting the confidence threshold for unoccurred communication links. The unoccurred communication links and related information shown on the NW graph are reconstructed and displayed on the screen according to the threshold changed by the analyst's operation on the slider bar. By default, the slider bar is positioned so that only a few unoccurred communication links are displayed.
[0312] As shown in the example in Figure 31, the display range of unoccurred communication links and of the information related to them can be adjusted by manipulating the slider bar, which may facilitate the analysis of unoccurred communication links that are expected to occur normally in the future. Therefore, it may be easier to decide whether to add such unoccurred communication links to the whitelist 22.
[0313] [2.4 Effects, etc.] If operation begins with a whitelist 22 that has not been sufficiently trained, there is a possibility that a large number of normal communication links will be mistakenly identified as non-WL communication links. Furthermore, manually adding information on normal communication links from equipment information, etc., to a whitelist 22 that has not been sufficiently trained before operation begins is difficult because there are countless communication links that did not occur during the training period.
[0314] In contrast, according to the communication analysis system of Embodiment 2 described above, before the whitelist 22 is put into operation, it is possible to analyze unoccurred communication links that did not occur during the learning period but are highly likely to be normal communication links, and to add them to the whitelist 22.
[0315] Specifically, the communication analysis system 20A extracts unoccurred communication links that are highly likely to be normal communication links that did not occur during the learning period and were not learned (i.e., are not included in the whitelist 22) before operation. It then creates information that can be used to determine whether or not to add them to the whitelist 22, and presents this information to the analyst as graph information. This makes it possible to add unoccurred communication links that did not occur during the learning period but are highly likely to be normal communication links to the whitelist 22 before operation. As a result, false positives that occur after operation can be suppressed, and the whitelist 22 can be operated efficiently, so that the analyst can easily verify the legitimacy of WL-external communications that occur during operation.
[0316] (Possibility of other embodiments) The above describes a communication analysis system, analysis method, and program according to one aspect of the present disclosure based on embodiments, but the present disclosure is not limited to these embodiments. Within the scope of the present disclosure, various modifications to these embodiments that a person skilled in the art could conceive, or configurations constructed by combining components from different embodiments, are also included, as long as they do not depart from the spirit of the present disclosure. For example, the following cases are also included in the present disclosure.
[0317] (1) In the above embodiments, the case in which a machine learning model is used to extract similar terminals was described as an example, but similar terminals may also be extracted (determined) according to predetermined rules.
[0318] In this case, based on the idea that similar terminals are terminals with similar roles, rules that categorize terminals with the same role into the same category can be used as the predetermined rules to extract similar terminals. Here, the similarity of roles can be determined using elements such as the protocol used, whether the terminal is a server or a client, and segment information.
[0319] Furthermore, if completion documents, i.e., materials delivered by the vendor when the equipment was constructed, exist separately, the above elements need not be used, and similar terminals may be determined based on the roles of the terminals identified in the completion documents.
[0320] Figure 32 shows an example of the content of completion documents for a building. As shown in Figure 32, an equipment name is listed for each terminal, which is identified by its IP address, and the role of the terminal can be determined from the information given in the equipment name.
[0321] Figure 33 is a flowchart showing the process for determining similar terminals using predetermined rules. Steps S61, S64, S66, and S67 are performed to categorize the role of the terminals based on the protocol used. Steps S62 and S63 are performed to categorize whether the terminal is a server or a client based on how the control protocol is used. Step S65 is performed to categorize whether the terminal is a security device or a network device based on the access method, etc.
[0322] Note that the predetermined rules shown in Figure 33 are merely an example, and the predetermined rules are not limited to these.
[0323] For example, if completion documents exist, the content described in the completion documents may be supplemented with the predetermined rules. Also, even terminals placed in the same category may have different roles on different manufacturing lines. For this reason, terminals belonging to the same category may be regarded as having different roles if they belong to different segments.
[0324] (2) In Embodiment 2 above, the case in which unoccurred communication links are extracted using a machine learning model was described as an example, but unoccurred communication links may also be extracted (determined) using predetermined rules.
[0325] In this case, the results of categorizing the terminals according to the predetermined rules in (1) above can be used.
[0326] Furthermore, for example, if a specific communication is being made from X or more terminals categorized as Role A to a terminal categorized as Role B (which is different from Role A), then there is a high probability that the other terminals of Role A will also make that specific communication to the terminals of Role B. In other words, there is a high probability that communication links carrying that specific communication will also occur between the terminals of Role A and the terminals of Role B. Therefore, by using the value of X, the confidence level of an unoccurred communication link can be determined. Here, the confidence level can also be described as the degree (strength) of the possibility that the unoccurred communication link will occur in the future.
[0327] Alternatively, the level of confidence may be determined by empirically setting a series of thresholds.
[0328] Alternatively, the confidence score can be adjusted by assigning weights to each group, i.e., to each of the multiple terminals categorized under the same role. For example, if the role is "Other IT, etc.," the number of terminals categorized under the same group will be large, so the weight should be reduced. By assigning such weights, a predetermined rule can be created to extract only the unoccurred communication links with high confidence scores.
[0329] Then, unoccurred communication links whose final confidence level is equal to or higher than a threshold may be presented to the user as communication links that can be added to the whitelist 22.
[0330] For example, let X be the number of configuration change communications that occurred between terminal pairs from group A to group B, with group A having a weight of 0.8 and group B having a weight of 0.9. In this case, the Score indicating the confidence that configuration change communications will occur on communication links where they have not yet occurred can be calculated as follows.
[0331] Score = X × 0.8 × 0.9
[0332] Furthermore, when the threshold is set to θ and θ ≤ Score is satisfied, the communication links between terminal pairs from group A to group B on which no configuration change communication has occurred can be extracted as unoccurred communication links.
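The rule-based score and the threshold check can be sketched as follows. This is an illustration under assumptions, not the method of the disclosure: X is interpreted here as the number of terminal pairs from group A to group B on which the communication has already occurred, and the function name, terminal names, and threshold value are hypothetical.

```python
def unoccurred_links_above_threshold(
    group_a, group_b, occurred_pairs, weight_a, weight_b, theta
):
    """Return (score, candidate pairs) for communications from group A to B.

    occurred_pairs is a set of (source, destination) pairs on which the
    communication of interest has already occurred.
    """
    pairs = [(src, dst) for src in group_a for dst in group_b]
    # Assumption: X counts the pairs on which the communication occurred.
    x = sum(1 for pair in pairs if pair in occurred_pairs)
    score = x * weight_a * weight_b
    if score < theta:
        return score, []
    # Pairs with no occurrence so far become unoccurred link candidates.
    return score, [pair for pair in pairs if pair not in occurred_pairs]


# Hypothetical groups: three terminals in group A, one in group B.
score, candidates = unoccurred_links_above_threshold(
    group_a=["a1", "a2", "a3"],
    group_b=["b1"],
    occurred_pairs={("a1", "b1"), ("a2", "b1")},
    weight_a=0.8,
    weight_b=0.9,
    theta=1.0,
)
```

Here two of the three possible pairs have already communicated, so the remaining pair is returned as a candidate only because the weighted score reaches the assumed threshold.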
[0333] (3) Some or all of the components constituting the above communication analysis system may be a computer system consisting of a microprocessor, ROM, RAM, hard disk unit, display unit, keyboard, mouse, etc. A computer program is stored in the RAM or hard disk unit. The microprocessor operates in accordance with the computer program, thereby enabling each device to perform its function. Here, the computer program is composed of a combination of multiple instruction codes that indicate commands to the computer in order to achieve a predetermined function.
[0334] (4) Some or all of the components constituting the above communication analysis system may be made up of a single system LSI (Large Scale Integration). The system LSI is a multi-functional LSI manufactured by integrating multiple components onto a single chip, and specifically, it is a computer system that includes a microprocessor, ROM, RAM, etc. A computer program is stored in the RAM. The system LSI achieves its function by operating the microprocessor in accordance with the computer program.
[0335] (5) Some or all of the components constituting the above communication analysis system may consist of detachable IC cards or standalone modules attached to each device. The IC card or module is a computer system consisting of a microprocessor, ROM, RAM, etc. The IC card or module may include the above-mentioned multi-functional LSI. The microprocessor operates according to a computer program, thereby enabling the IC card or module to perform its function. The IC card or module may be tamper-resistant.
[Industrial applicability]
[0336] This disclosure can be used in communication analysis systems, analysis methods, and programs, and in particular in communication analysis systems, analysis methods, and programs for easily analyzing the abnormality of communication links outside the whitelist.
[Explanation of symbols]
[0337]
10 Monitored target
20, 20A Communication analysis system
21 Analysis support graph creation system
22 Whitelist
23 Communication link learning system
24 Communication information DB
25 Communication link prediction graph creation system
30 Network
2101 Information receiving unit
2102 WL determination unit
2103 Similar terminal extraction unit
2104, 2503 Model acquisition unit
2105, 2301, 2501 Information acquisition unit
2106 Secondary similar communication link extraction unit
2107 Primary similar communication link extraction unit
2108, 2508 WL change operation unit
2109, 2506 NW graph creation unit
2110, 2507 NW graph display unit
2111 Extraction condition storage unit
2112 Extraction condition setting unit
2302 Model learning unit
2303 Model setting storage unit
2304 Model storage unit
2305 Model setting unit
2502 Prediction target link extraction unit
2504 Link confidence calculation unit
2505 WL automatic addition determination unit
Claims
1. A communication analysis system that analyzes network communications of multiple terminals in a predetermined environment that is a monitored target, the communication analysis system comprising:
a whitelist created by learning the communications that took place at the monitored target;
a communication information DB that holds past communication information consisting of information indicating past communications that took place at the monitored target; and
an analysis support graph creation system that creates graph information for analyzing non-whitelisted communications, which are communications not included in the whitelist,
wherein the analysis support graph creation system includes:
an information receiving unit that receives information indicating communications that take place at the monitored target and are to be analyzed;
an information acquisition unit that acquires the past communication information from the communication information DB;
a whitelist determination unit that determines whether a non-whitelisted communication has occurred in the communications to be analyzed, using the information indicating the communications to be analyzed received by the information receiving unit and the whitelist;
a similar terminal extraction unit that extracts one or more similar terminals of the destination terminal in a non-whitelisted communication link, which is the communication link of the non-whitelisted communication determined by the whitelist determination unit, as first similar terminals, and one or more similar terminals of the source terminal in the non-whitelisted communication link as second similar terminals;
a primary similar communication link extraction unit that, using the first similar terminals and the second similar terminals extracted by the similar terminal extraction unit, extracts, from the past communication information acquired by the information acquisition unit, past communication links that are similar to the non-whitelisted communication link and whose destination terminals are the first similar terminals and the second similar terminals, or whose source terminals are the first similar terminals and the second similar terminals, as primary similar communication links; and
an NW graph creation unit that uses the primary similar communication links extracted by the primary similar communication link extraction unit and the past communication information acquired by the information acquisition unit to create an analytical NW graph as graph information for analyzing the non-whitelisted communication.
2. The analysis support graph creation system further includes a secondary similar communication link extraction unit that extracts secondary similar communication links that are different from the primary similar communication links and the non-whitelisted communication link,
the similar terminal extraction unit further extracts, as third similar terminals, one or more similar terminals of a different recipient terminal from the source terminal in the non-whitelisted communication link, and, as fourth similar terminals, one or more similar terminals of a different recipient terminal from the source terminal in a communication link,
the secondary similar communication link extraction unit, using the first, second, third, and fourth similar terminals extracted by the similar terminal extraction unit and the past communication information acquired by the information acquisition unit, extracts, as the secondary similar communication links, communication links that are different from the non-whitelisted communication link and from the primary similar communication links extracted by the primary similar communication link extraction unit, and that are similar to a past communication made by the source terminal or the destination terminal in the non-whitelisted communication link, and
the NW graph creation unit creates the analytical NW graph using the extracted primary similar communication links and secondary similar communication links and the past communication information acquired by the information acquisition unit.
The communication analysis system according to claim 1.
3. The NW graph creation unit creates the NW graph and also creates messages based on the created NW graph as auxiliary information for the user to perform the analysis. The communication analysis system according to claim 2.
4. The aforementioned analysis support graph generation system further: An NW graph display unit that displays the NW graph created by the NW graph creation unit on the screen, It includes a WL change operation unit that adds the communication outside the whitelist to the whitelist in response to user instructions, The communication analysis system according to claim 2.
5. The aforementioned NW graph display unit is A slider bar for adjusting the similarity threshold between the destination terminal and the source terminal and the similar terminal in the aforementioned non-whitelisted communication link is displayed on the screen. In accordance with the threshold changed by the user's operation on the slider bar, the number of similar terminals shown in the NW graph is reconstructed and displayed on the screen. The communication analysis system according to claim 4.
6. The aforementioned NW graph display unit is In the NW graph displayed on the aforementioned screen, if two communication links from either the whitelisted communication link or the primary similar communication link, or two communication links from the secondary similar communication link, are selected by the user, Detailed information of the two selected communication links is displayed on the screen as comparative information for analyzing the communication outside the whitelist. The communication analysis system according to claim 4.
7. The aforementioned NW graph display unit is A message is displayed regarding the procedure for selecting the two communication links, and the user is guided so that the comparison information is displayed on the screen. The communication analysis system according to claim 6.
8. The aforementioned NW graph display unit is If the user selects a communication link outside the whitelist, a button is displayed to determine whether to add the selected communication link to the whitelist. The WL change operation unit is, Depending on the user's input to the aforementioned button, the non-whitelisted communication is added to the whitelist. The communication analysis system according to claim 4.
9. The NW graph creation unit causes the NW graph display unit to highlight, in the created NW graph, the non-whitelisted communication link so that the user can distinguish it from other communication links. The communication analysis system according to claim 4.
10. So that the user can distinguish, in the created NW graph, similar terminals belonging to the same group from those belonging to different groups, the NW graph creation unit groups the destination terminal in the non-whitelisted communication link together with the first similar terminals, groups the source terminal in the non-whitelisted communication link together with the second similar terminals, groups the third similar terminals together, and groups the fourth similar terminals together, and causes the NW graph display unit to display the groups. The communication analysis system according to claim 4.
11. The similar terminal extraction unit extracts the similar terminals by calculating the similarity between the multiple terminals using a trained machine learning model. The communication analysis system according to any one of claims 1 to 10.
12. The machine learning model is generated by learning the communications performed in the monitored target using a link prediction or node classification algorithm capable of producing a fixed-dimensional vector for each terminal that appeared in the communications and a fixed-size matrix for each communication type that appeared in the communications. The communication analysis system according to claim 11.
13. The machine learning model is one of the following: LinkFeat, CompGCN (Composition-based Multi-Relational Graph Convolutional Networks), R-GCN (Relational Graph Convolutional Network), DistMult, TransE (Translating Embeddings for Modeling Multi-relational Data), HolE (Holographic Embeddings of Knowledge Graphs), or ComplEx (Complex Embeddings for Simple Link Prediction). The communication analysis system according to claim 12.
14. An analysis method in which a computer analyzes the network communications of multiple terminals in a predetermined environment that is a monitored target, the analysis method including:
an information reception step of receiving information indicating a communication that takes place in the monitored target and is to be analyzed;
an information acquisition step of acquiring past communication information from a communication information DB that holds the past communication information, which is information indicating past communications that took place in the monitored target;
a whitelist determination step of determining whether a non-whitelisted communication, which is a communication not included in the whitelist, has occurred in the communication to be analyzed, using the information indicating the communication to be analyzed received in the information reception step and the whitelist created by learning the communications performed in the monitored target;
a similar terminal extraction step of extracting, for the non-whitelisted communication link, which is the communication link of the non-whitelisted communication determined in the whitelist determination step, one or more similar terminals of the destination terminal as first similar terminals and one or more similar terminals of the source terminal as second similar terminals;
a primary similar communication link extraction step of extracting, as primary similar communication links, using the first similar terminals and the second similar terminals extracted in the similar terminal extraction step, from the past communication information acquired in the information acquisition step, past communication links that are similar to the whitelisted communication links and whose destination terminals are the first similar terminal and the second similar terminal, or whose destination terminals are the source terminals; and
a network graph creation step of creating an analysis network graph as graph information for analyzing the non-whitelisted communication, using the primary similar communication links extracted in the primary similar communication link extraction step and the past communication information acquired in the information acquisition step.
15. A program for causing a computer to execute an analysis method that analyzes the network communications of multiple terminals in a predetermined environment that is a monitored target, the analysis method including:
an information reception step of receiving information indicating a communication that takes place in the monitored target and is to be analyzed;
an information acquisition step of acquiring past communication information from a communication information DB that holds the past communication information, which is information indicating past communications that took place in the monitored target;
a whitelist determination step of determining whether a non-whitelisted communication, which is a communication not included in the whitelist, has occurred in the communication to be analyzed, using the information indicating the communication to be analyzed received in the information reception step and the whitelist created by learning the communications performed in the monitored target;
a similar terminal extraction step of extracting, for the non-whitelisted communication link, which is the communication link of the non-whitelisted communication determined in the whitelist determination step, one or more similar terminals of the destination terminal as first similar terminals and one or more similar terminals of the source terminal as second similar terminals;
a primary similar communication link extraction step of extracting, as primary similar communication links, using the first similar terminals and the second similar terminals extracted in the similar terminal extraction step, from the past communication information acquired in the information acquisition step, past communication links that are similar to the whitelisted communication links and whose destination terminals are the first similar terminal and the second similar terminal, or whose destination terminals are the source terminals; and
a network graph creation step of creating an analysis network graph as graph information for analyzing the non-whitelisted communication, using the primary similar communication links extracted in the primary similar communication link extraction step and the past communication information acquired in the information acquisition step.
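For illustration only, the steps recited in claims 14 and 15 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The `Link` class, its field names, the function names, and the default threshold of 0.9 are all hypothetical. Terminal embeddings are assumed to be supplied by an already trained model (for example, one of those listed in claim 13), and cosine similarity is assumed as the similarity measure. The sketch also adopts one possible reading of "primary similar communication link": a past link of the same communication type whose endpoints are the original terminals or their similar terminals.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Link:
    """Hypothetical record for one communication link."""
    src: str    # source terminal
    dst: str    # destination terminal
    proto: str  # communication type


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similar_terminals(target, embeddings, threshold):
    """Terminals whose similarity to `target` meets the threshold."""
    base = embeddings[target]
    return {t for t, v in embeddings.items()
            if t != target and cosine(base, v) >= threshold}


def analyze(link, whitelist, past_links, embeddings, threshold=0.9):
    """Return graph information for a non-whitelisted link, else None."""
    # Whitelist determination step.
    if link in whitelist:
        return None

    # Similar terminal extraction step.
    first = similar_terminals(link.dst, embeddings, threshold)   # like the destination
    second = similar_terminals(link.src, embeddings, threshold)  # like the source

    # Primary similar communication link extraction step
    # (assumed reading: same type, endpoints original or similar).
    dsts = first | {link.dst}
    srcs = second | {link.src}
    primary = [p for p in past_links
               if p != link and p.proto == link.proto
               and p.dst in dsts and p.src in srcs]

    # Network graph creation step: nodes plus labeled edges.
    nodes = {link.src, link.dst} | first | second
    edges = [(link, "non-whitelisted")]
    edges += [(p, "primary-similar") for p in primary]
    return {"nodes": nodes, "edges": edges}
```

Raising `threshold` shrinks the sets of similar terminals and therefore the graph, which corresponds to the slider-bar adjustment described in claim 5.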