Risk assessment method and storage medium
By acquiring host security data and using knowledge graphs and large language models for deep fusion analysis, the problem of insufficient risk assessment in existing technologies has been solved, and more accurate and reliable risk assessment has been achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING VENUS INFORMATION SECURITY TECH
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-19

Figure CN122247689A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of cybersecurity technology, and more particularly to a risk assessment method and a storage medium.
Background
[0002] Risk assessment of network assets is a core aspect of information security management. It requires not only attention to technical vulnerabilities but also comprehensive consideration of the asset's value, threat level, and business impact. Taking hosts as an example, current risk assessment methods typically evaluate them along two dimensions: host vulnerability characteristic data and asset value. A random forest model optimized by a genetic algorithm is used to predict the probability of vulnerability exploitation, the CVSS (Common Vulnerability Scoring System) score is then adjusted accordingly, and finally the asset value is combined with it to assess the host's risk. However, limited by factors such as data dimensionality and model reasoning capability, these technologies achieve relatively low assessment accuracy; in particular, they struggle to effectively identify and extrapolate risk for complex, multi-step attack chains or for attacks that exploit unconventional combinations of vulnerabilities.
Summary of the Invention
[0003] This disclosure provides a risk assessment method and a storage medium.
[0004] In a first aspect, embodiments of this disclosure provide a risk assessment method, comprising: acquiring security data related to a host to be assessed; processing the security data using a knowledge graph to generate a risk assessment context; and using a pre-trained large language model to perform reasoning based on the risk assessment context to output a risk assessment result for the host to be assessed.
[0005] In a second aspect, embodiments of this disclosure provide a non-transitory computer storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the risk assessment method described in the above embodiments.
[0006] The risk assessment method of this disclosure utilizes the powerful semantic understanding, information extraction, and logical reasoning capabilities of knowledge graphs and large language models to deeply integrate and analyze security data related to the host to be assessed. This can generate more accurate, comprehensive, and interpretable risk assessment results, thereby improving the reliability of risk assessment.
[0007] Other features and advantages of this disclosure will be set forth in the following description, will in part be apparent from the description, or may be learned by practicing the disclosure. Other advantages of this disclosure may be realized and obtained by means of the methods described in the description and the accompanying drawings.
Brief Description of the Drawings
[0008] The accompanying drawings are used to provide an understanding of the technical solutions disclosed herein and form part of the specification. They are used together with the embodiments of the present disclosure to explain the technical solutions of the present disclosure and do not constitute a limitation on the technical solutions of the present disclosure.
[0009] Figure 1 is a flowchart of one embodiment of the risk assessment method disclosed herein; Figure 2 is a schematic diagram of the process for obtaining security data in one embodiment of the risk assessment method disclosed herein; Figure 3 is a schematic diagram of the process of using a knowledge graph in one embodiment of the risk assessment method disclosed herein; Figure 4 is a schematic diagram of the process for generating a security risk knowledge graph in one embodiment of the risk assessment method disclosed herein; Figure 5 is a schematic diagram of the process for calculating the risk score of a path in one embodiment of the risk assessment method disclosed herein; Figure 6 is a schematic diagram of the process for calculating topological risk information in one embodiment of the risk assessment method disclosed herein; and Figure 7 is a schematic diagram of the process for generating a risk assessment context in one embodiment of the risk assessment method disclosed herein.
Detailed Description
[0010] To make the objectives, technical solutions, and advantages of this disclosure clearer, the embodiments of this disclosure will be described in detail below with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the embodiments and features described in this disclosure can be arbitrarily combined with each other.
[0011] The embodiments disclosed herein are not necessarily limited to the dimensions shown in the drawings, and the shapes and sizes of the components in the drawings do not reflect actual proportions. Furthermore, the drawings schematically illustrate ideal examples, and the embodiments of this disclosure are not limited to the shapes or values shown in the drawings.
[0012] The ordinal numbers such as "first" and "second" in this disclosure are used to avoid confusion among the constituent elements and do not indicate any order, quantity, or importance.
[0013] In this disclosure, unless otherwise expressly specified and limited, the terms "installation," "connection," and "linkage" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; a mechanical connection or an electrical connection; a direct connection, an indirect connection via an intermediate component, or a connection within two components. Those skilled in the art can understand the specific meaning of these terms in this disclosure according to the specific circumstances.
[0014] In related technologies, the random forest model used to assess host risk is essentially a static model trained on historical data. It lacks the ability to understand and reason about the deep logic and context of security events using natural language. Furthermore, the construction and optimization of the model (such as genetic algorithm optimization) heavily rely on the prior knowledge of security experts, making it difficult to adaptively learn and understand constantly evolving new attack methods. All of these factors limit the comprehensiveness and reliability of risk assessment. In particular, when faced with complex, multi-step attack chains or attacks that utilize unconventional combinations of vulnerabilities, the model struggles to effectively identify and extrapolate risks.
[0015] To address the aforementioned problems, this disclosure provides a risk assessment method. As shown in Figure 1, the method may include the following steps.
[0016] Step 110: Obtain security data related to the host to be evaluated.
[0017] As an example, security data related to the host to be evaluated may include vulnerability information of the host itself, such as the name, description, type, and CVSS vector of the vulnerability, as well as asset information of the host, such as IP address, hostname, department, and business importance level.
[0018] Step 120: Use knowledge graphs to process security data and generate risk assessment context.
[0019] In this embodiment, a knowledge graph can be constructed based on the security data obtained in step 110, and then deep reasoning can be performed based on the knowledge graph to generate a risk assessment context.
[0020] As an example, risk ontology modeling can be performed based on information such as hosts, assets, and vulnerabilities in security data, and nodes used to represent entities in the knowledge graph can be determined. Then, based on the risk relationships between entities, edges connecting the corresponding nodes of the entities can be determined, thereby mapping the security data into a knowledge graph. Then, reasoning can be performed using the knowledge graph to uncover more comprehensive and complex risk information and output risk assessment context.
[0021] Step 130: Use a pre-trained large language model to perform reasoning based on the risk assessment context and output the risk assessment results of the host to be assessed.
[0022] In this embodiment, the large language model has context-aware reasoning and natural language generation capabilities. By emulating the reasoning of security experts, it enables more flexible and interpretable risk assessment and outputs the risk assessment result. Here, the risk assessment result can include a quantified risk level and the reasoning process, which improves the interpretability of the result. For example, the risk levels can include: extremely dangerous, highly dangerous, moderately dangerous, slightly dangerous, relatively safe, and very safe. Extremely dangerous indicates that the host being assessed has a directly exploitable high-risk attack path; highly dangerous indicates that the host has a high probability of being attacked; moderately dangerous indicates that the host has a moderate level of security risk; slightly dangerous indicates that the host has low-risk issues; relatively safe indicates that the host is basically safe and meets the security baseline; and very safe indicates that the host has multiple layers of protection and its risk is extremely low.
[0023] In an optional implementation of this embodiment, step 130 may include: generating multi-dimensional assessment prompts based on the risk assessment context; and inputting the assessment prompts and the risk assessment context into the large language model to obtain the risk assessment result.
[0024] As an example, assessment prompts may include the following dimensions: vulnerability risk dimension, such as comprehensively considering CVSS scores, EPSS (Exploit Prediction Scoring System) probabilities, and exploitability; configuration security dimension, such as assessing configuration compliance and security baseline conformity; network topology dimension, such as the location and connectivity risks of assets within the network; business impact dimension, such as assessing potential impact in conjunction with the business importance of assets; and time dynamic dimension, such as the timeliness of vulnerabilities and their correlation with events.
[0025] In this embodiment, the assessment prompts guide the large language model to reason along each of these dimensions, so that its inferences draw on the risk assessment context more comprehensively, which helps further improve the comprehensiveness and reliability of the risk assessment.
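As a rough illustration, the multi-dimensional prompt could be assembled as in the sketch below. The dimension keys follow paragraph [0024], but the instruction wording, the function name build_assessment_prompt, and the JSON layout of the context are hypothetical assumptions, not the disclosed prompt.

```python
import json

# Hypothetical instruction text for the five dimensions named in [0024];
# the exact prompt wording is an assumption, not the patent's.
ASSESSMENT_DIMENSIONS = {
    "vulnerability_risk": "Weigh CVSS scores, EPSS probabilities and exploitability.",
    "configuration_security": "Check configuration compliance and security baseline conformity.",
    "network_topology": "Assess the asset's network location and connectivity risk.",
    "business_impact": "Estimate potential impact given the asset's business importance.",
    "time_dynamics": "Consider vulnerability timeliness and correlation with recent events.",
}


def build_assessment_prompt(context: dict) -> str:
    """Combine per-dimension instructions with the structured risk context."""
    lines = ["Assess the risk of the host described below along these dimensions:"]
    for i, (name, instruction) in enumerate(ASSESSMENT_DIMENSIONS.items(), start=1):
        lines.append(f"{i}. {name}: {instruction}")
    lines.append("Output one risk level and explain the reasoning behind it.")
    lines.append("Risk assessment context (JSON):")
    # Serialize the context so the model receives it verbatim and structured.
    lines.append(json.dumps(context, ensure_ascii=False, indent=2))
    return "\n".join(lines)
```

The resulting string would then be sent to whatever large language model client the deployment uses; the source does not name one, so that call is left out.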
[0026] The risk assessment method of this disclosure utilizes the powerful semantic understanding, information extraction, and logical reasoning capabilities of knowledge graphs and large language models to deeply integrate and analyze security data related to the host to be assessed. This can generate more accurate, comprehensive, and interpretable risk assessment results, thereby improving the reliability of risk assessment.
[0027] In related technologies, the risk assessment of a host mainly relies on vulnerability data and asset value data. However, this data has limited coverage of risk behaviors and information, such as failing to cover host process behavior and security configuration data, thus limiting the comprehensiveness of the risk assessment.
[0028] To address this issue, step 110 above can be implemented by the process shown in Figure 2 to acquire security data related to the host to be evaluated. As shown in Figure 2, the process may include the following steps.
[0029] Step 210: Obtain external data related to the host to be evaluated.
[0030] The external data includes at least one of the following: vulnerability signature data, vulnerability exploitability information, vulnerability remediation priority, associated attack techniques, security configuration baseline verification items and remediation plans, weak password patterns and password policy violation characteristics, and threat intelligence metadata.
[0031] As an example, external data corresponding to the host to be evaluated can be obtained from external sources over the network through APIs or web crawlers.
[0032] Step 220: Monitor the security status of the network environment in which the host to be evaluated is located, and obtain internal data related to the host to be evaluated.
[0033] The internal data includes at least one of the following: asset attribute information, vulnerability scan results, network connection topology, security event logs, security configuration compliance, and weak password detection.
[0034] As an example, the network system where the host being evaluated is located can be monitored via API to check the status of various devices in the network, such as scanners, SIEM (Security Information and Event Management), CMDB (Configuration Management Database), and EDR (Endpoint Detection and Response), thereby obtaining internal data related to the host being evaluated.
[0035] Step 230: Merge the external data with the internal data to obtain the security data.
[0036] In this embodiment, both external and internal data are multi-source heterogeneous data, and data fusion can be achieved in the following way: First, the key fields in various types of data are parsed to remove duplicate data; then, the data from different sources are converted into a unified format, such as JSON Schema; after that, key fields can be extracted from all the data, such as vulnerability ID, network address, hostname, etc., to extract entities from the data, and the same field from different sources is unified into a standard field, thereby obtaining the fused security data.
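A minimal sketch of this fusion step is shown below, assuming each source delivers a list of flat dict records. The alias table, the (vuln_id, ip_address) deduplication key, and the "keep the first-seen value" merge policy are illustrative assumptions; validation against a JSON Schema is omitted.

```python
# Hypothetical alias table mapping source-specific field names to the
# standard fields named in [0036] (vulnerability ID, network address, hostname).
FIELD_ALIASES = {
    "cve": "vuln_id", "cve_id": "vuln_id", "vulnerability_id": "vuln_id",
    "ip": "ip_address", "ip_addr": "ip_address", "host_ip": "ip_address",
    "host": "hostname", "host_name": "hostname",
}


def normalize(record: dict, source: str) -> dict:
    """Lower-case field names, rename known aliases, and tag the source."""
    out = {FIELD_ALIASES.get(k.lower(), k.lower()): v for k, v in record.items()}
    out["source"] = source
    return out


def fuse(sources: dict[str, list[dict]]) -> list[dict]:
    """Merge records from all sources; records sharing a key are combined."""
    merged: dict[tuple, dict] = {}
    for source, records in sources.items():
        for rec in records:
            norm = normalize(rec, source)
            key = (norm.get("vuln_id"), norm.get("ip_address"))  # assumed dedup key
            if key in merged:
                entry = merged[key]
                entry["sources"].add(source)
                for k, v in norm.items():
                    if k != "source":
                        entry.setdefault(k, v)  # keep the first-seen value
            else:
                norm["sources"] = {norm.pop("source")}
                merged[key] = norm
    return list(merged.values())
```

With this design, a scanner record and a threat-intelligence record describing the same vulnerability on the same IP address collapse into one entry that carries the attributes of both, plus the set of sources it came from.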
[0037] In this embodiment, generating security data for the host to be evaluated based on multi-source heterogeneous data can significantly increase the coverage of security data on risky behaviors and information, provide more comprehensive data support for the subsequent risk assessment process, and help improve the comprehensiveness of risk assessment.
[0038] In some embodiments, step 120 above can be implemented by the process shown in Figure 3 to generate the risk assessment context. As shown in Figure 3, the process may include the following steps.
[0039] Step 310: Construct a security risk knowledge graph based on security data.
[0040] In this embodiment, entity modeling and risk relationship modeling can be performed based on security data. The entities contained in the security data are mapped to nodes in the security risk knowledge graph, and the risk relationships contained in the security data are mapped to edges in the security risk knowledge graph. Then, the nodes are connected according to the risk relationships to obtain the security risk knowledge graph corresponding to the security data.
[0041] Step 320: Extract multi-dimensional risk information from the security risk knowledge graph and aggregate it into a structured risk assessment context.
[0042] As an example, various algorithms such as semantic search and graph theory can be used to extract risk information from the security risk knowledge graph in multiple dimensions, and then perform aggregation processing based on pre-determined aggregation rules to transform the aggregation results into a structured risk assessment context.
[0043] In this embodiment, discrete security data can be mapped into a security risk knowledge graph, and then deep-level risk semantics and structural features can be extracted from it and aggregated into a structured risk assessment context, which can provide high-quality, multi-dimensional risk cognition input for subsequent large language models.
[0044] In some optional embodiments of this example, step 310 can be implemented by the process shown in Figure 4 to construct the security risk knowledge graph. As shown in Figure 4, the process may include the following steps.
[0045] Step 410: Extract the predetermined ontology information from the security data and determine the nodes that represent the ontology.
[0046] The node types include at least one of the following: asset node, vulnerability node, product node, configuration node, weak password node, security event node, and network connection node.
[0047] In this implementation, various types of risk ontology can be constructed based on knowledge in the cybersecurity domain, and entities can be represented using nodes in a security risk knowledge graph. As examples, the correspondence between nodes and risk ontology can include the following: Asset nodes, such as representing network entities carrying business value like hosts and devices; Vulnerability nodes, such as representing security flaws like software vulnerabilities and configuration vulnerabilities; Product nodes, such as representing product entities like software and hardware products, whose attributes can include vendor, version, and functionality; Configuration nodes, such as representing security configuration status entities like security baselines and compliance requirements; Weak Password nodes, such as representing authentication risk entities like services, users, and password strength; Security Event nodes, such as representing risk entities like security alerts and detection events; and Network Connection nodes, such as representing risk entities like network communication relationships and traffic characteristics.
[0048] Step 420: Extract the predetermined risk relationships from the security data and determine the edges that represent the risk relationships.
[0049] The risk relationships include at least one of the following: vulnerabilities affect products or configurations, assets have vulnerabilities, assets have insecure configurations, assets have weak password risks, security events are triggered by specific vulnerabilities or attacks, network connections between assets, and products run on assets.
[0050] In this embodiment, the edges in the security risk knowledge graph can represent the logical relationships in network security. Various types of risk relationships can be constructed based on attack logic and security semantics in the network security domain, and represented using the edges in the security risk knowledge graph.
[0051] Step 430: Generate a security risk knowledge graph based on the determined nodes and edges.
[0052] As an example, the relationships between entities represented by nodes can be connected using corresponding edges to form a security risk knowledge graph corresponding to security data.
[0053] In this embodiment, corresponding information can be extracted from security data based on pre-determined risk entities and risk relationships, and a security risk knowledge graph can be generated. This can transform discrete security data into a structured knowledge network, which can more intuitively reflect information such as asset exposure surface, attack paths, and risk correlations in the network.
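Steps 410 to 430 can be pictured with the in-memory sketch below, which stores typed nodes and typed, directed edges and rejects edges whose endpoint types do not match the relation. The identifier spellings and the class name SecurityRiskGraph are hypothetical; a production system would more likely use a graph database. Because [0049] lists no dedicated attack node type, "triggered_by" is restricted here to vulnerability targets (an assumption).

```python
from collections import defaultdict

# Node types from [0046] and relation types from [0049]. Identifier
# spellings (e.g. "weak_password") are illustrative assumptions.
NODE_TYPES = {"asset", "vulnerability", "product", "configuration",
              "weak_password", "security_event", "network_connection"}

# relation -> (allowed source node types, allowed target node types)
RELATION_TYPES = {
    "affects": ({"vulnerability"}, {"product", "configuration"}),
    "has_vulnerability": ({"asset"}, {"vulnerability"}),
    "has_insecure_configuration": ({"asset"}, {"configuration"}),
    "has_weak_password": ({"asset"}, {"weak_password"}),
    "triggered_by": ({"security_event"}, {"vulnerability"}),
    "connects_to": ({"asset"}, {"asset"}),
    "runs_on": ({"product"}, {"asset"}),
}


class SecurityRiskGraph:
    """Typed, directed graph stored as an in-memory adjacency list."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict] = {}
        self.out_edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_node(self, node_id: str, node_type: str, **attrs) -> None:
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        """Add src -[relation]-> dst after checking the endpoint types."""
        src_types, dst_types = RELATION_TYPES[relation]
        if (self.nodes[src]["type"] not in src_types
                or self.nodes[dst]["type"] not in dst_types):
            raise ValueError(f"{relation} cannot link {src} to {dst}")
        self.out_edges[src].append((relation, dst))
```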
[0054] In some optional implementations of this embodiment, step 320, which extracts multi-dimensional risk information from the security risk knowledge graph, may include at least one of the following: based on a predetermined attack semantic path, extracting attack semantic paths related to the host to be evaluated from the security risk knowledge graph, and determining the risk score of each attack semantic path to obtain path risk information; based on a graph theory algorithm, extracting the structural information of nodes from the security risk knowledge graph, and determining topological risk information based on the structural information.
[0055] In this implementation, multiple types of attack semantic paths can be predefined based on attack chains and security analysis experience. Each attack semantic path can represent a type of risk propagation chain. By calculating the risk score of the attack semantic path, risk information represented by semantic features in the security risk knowledge graph of the host to be evaluated can be obtained. By calculating topological risk information, risk information represented by structural features of the host to be evaluated in the security risk knowledge graph can be obtained. In this way, multi-dimensional risk information can be extracted from the security risk knowledge graph, providing more comprehensive data support for subsequent risk assessment.
[0056] In a specific example, the attack semantic path includes at least one of the following: An attack exploitation path is formed by connecting vulnerability nodes, product nodes, and asset nodes sequentially through corresponding edges. For example, it can be represented as: vulnerability node - [:affects] -> product node - [:runs_on] -> asset node.
[0057] A lateral movement path is formed by connecting different asset nodes and a weak password node through corresponding edges: the asset nodes are interconnected, and the last asset node points to the weak password node. For example, it can be represented as: asset node - [:connects_to] -> asset node - [:has_weak_password] -> weak password node.
[0058] An event tracing path is formed by connecting security event nodes, vulnerability nodes, and asset nodes through corresponding edges. Both security event nodes and asset nodes point to vulnerability nodes. For example, it can be represented as: security event node - [:triggered_by] -> vulnerability node <- [:has_vulnerability] - asset node.
[0059] The configuration risk path is formed by connecting asset nodes, configuration nodes, and vulnerability nodes through corresponding edges, where both asset nodes and vulnerability nodes point to configuration nodes. For example, it can be represented as: Asset node - [:has_insecure_configuration] -> Configuration node <- [:affects] - Vulnerability node.
[0060] As an example, based on predefined attack semantic paths of various types, a breadth-first search algorithm can be used to identify these paths in a security risk knowledge graph and output a path list. Each attack semantic path in the list can include a node sequence and the type of edge. Then, according to preset risk calculation rules, a risk score is determined for each attack semantic path, yielding path risk information.
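One possible way to run the breadth-first search over predefined path types is sketched below. Each pattern alternates node types with (relation, direction) steps, where "in" means the edge is followed backwards, which covers the converging paths in [0058] and [0059]. The pattern encoding and the function name find_paths are assumptions; restricting results to the host under evaluation would be a simple filter on the returned node sequences.

```python
from collections import deque

# Hypothetical encoding of the four path types in [0056]-[0059]:
# node type, (relation, direction), node type, (relation, direction), node type.
ATTACK_PATTERNS = {
    "exploitation": ["vulnerability", ("affects", "out"), "product", ("runs_on", "out"), "asset"],
    "lateral_movement": ["asset", ("connects_to", "out"), "asset",
                         ("has_weak_password", "out"), "weak_password"],
    "event_tracing": ["security_event", ("triggered_by", "out"), "vulnerability",
                      ("has_vulnerability", "in"), "asset"],
    "configuration_risk": ["asset", ("has_insecure_configuration", "out"), "configuration",
                           ("affects", "in"), "vulnerability"],
}


def find_paths(node_types: dict, edges: list, pattern: list) -> list:
    """BFS over partial matches of `pattern`.

    node_types: {node_id: type}; edges: list of (src, relation, dst).
    Returns a list of (node sequence, relation sequence) pairs.
    """
    out_adj, in_adj = {}, {}
    for s, r, d in edges:
        out_adj.setdefault(s, []).append((r, d))
        in_adj.setdefault(d, []).append((r, s))
    queue = deque(([n], []) for n, t in node_types.items() if t == pattern[0])
    results = []
    while queue:
        nodes, rels = queue.popleft()
        step = len(rels)  # number of edges matched so far
        if 2 * step + 1 == len(pattern):  # every step of the pattern matched
            results.append((nodes, rels))
            continue
        relation, direction = pattern[2 * step + 1]
        want_type = pattern[2 * step + 2]
        adj = out_adj if direction == "out" else in_adj
        for r, nxt in adj.get(nodes[-1], []):
            # Extend only along the expected relation to the expected node type,
            # never revisiting a node already on the path.
            if r == relation and node_types.get(nxt) == want_type and nxt not in nodes:
                queue.append((nodes + [nxt], rels + [relation]))
    return results
```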
[0061] In one example of this implementation, the risk score of each attack semantic path can be determined through the process shown in Figure 5. As shown in Figure 5, the process may include the following steps.
[0062] Step 510: Based on the pre-determined risk contribution values of nodes, determine the risk contribution values of all nodes included in the attack semantic path.
[0063] As an example, risk contribution values can be assigned to different types of nodes based on their attributes, thereby quantifying the risk level of each node. Here, the risk contribution value can range from 0 to 10.
[0064] The risk contribution value of an asset node can be determined based on its value. The higher the asset value, the higher the risk contribution value. For example, assets can be divided into 5 levels: core level, risk contribution value of 10 points; key level, risk contribution value of 7 points; important level, risk contribution value of 5 points; general level, risk contribution value of 2 points; and test level, risk contribution value of 0 points.
[0065] The risk contribution value of a vulnerability node can be determined based on the severity level of the vulnerability; the higher the severity level, the higher the risk contribution value. Alternatively, the risk contribution value of a vulnerability node can be determined based on other attributes of the vulnerability (such as the vulnerability's name, description, type, and CVSS vector).
[0066] The risk contribution value of a product node can be determined based on its usage frequency; the higher the usage frequency, the higher the risk contribution value. Alternatively, the risk contribution value of a product node can be determined based on its market penetration or brand awareness; the higher the market penetration or brand awareness, the greater the risk contribution value.
[0067] The risk contribution value of a configuration node can be determined based on the importance of non-compliant configuration items; the higher the importance, the higher the risk contribution value.
[0068] The risk contribution value of a weak password node can be determined based on the product or entity that the weak password is applied to. For example, more important entities or products such as databases or operating systems have a higher risk contribution value, while other products or entities have a lower risk contribution value.
[0069] The risk contribution value of a security event node can be determined based on its scope and degree of impact. The larger the scope and degree of impact, the greater the risk contribution value of the security event node.
[0070] The risk contribution value of a network connection node can be determined based on the type of network connection. For example, a network connection node on the direct attack surface can have the highest risk contribution value, 10 points; a network connection node used for lateral movement inside the network can have a relatively high value of 7 points; a network connection node in a data leakage channel can have a relatively low value of 5 points; and a network connection node in a dependency chain or supply chain can have an even lower value of 2 points.
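The two rules above that come with concrete numbers (asset level in [0064] and connection type in [0070]) can be encoded as lookup tables, as in this hedged sketch. For the other node types the source leaves the exact mapping open, so the sketch assumes a precomputed "score" attribute clamped to the 0 to 10 range (for example a CVSS base score for vulnerabilities) and otherwise falls back to a hypothetical default of 5.

```python
# Example values from [0064] (asset levels) and [0070] (connection types).
ASSET_LEVEL_SCORE = {"core": 10, "key": 7, "important": 5, "general": 2, "test": 0}
CONNECTION_TYPE_SCORE = {
    "direct_attack_surface": 10,
    "internal_lateral_movement": 7,
    "data_leakage_channel": 5,
    "dependency_supply_chain": 2,
}
DEFAULT_SCORE = 5  # hypothetical fallback, not from the source


def risk_contribution(node: dict) -> float:
    """Return a node's risk contribution value on the 0-10 scale."""
    node_type = node["type"]
    if node_type == "asset":
        return ASSET_LEVEL_SCORE[node["level"]]
    if node_type == "network_connection":
        return CONNECTION_TYPE_SCORE[node["connection_type"]]
    if "score" in node:  # assumed precomputed, e.g. from severity or CVSS
        return max(0.0, min(10.0, float(node["score"])))
    return DEFAULT_SCORE
```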
[0071] Step 520: Determine the attenuation coefficient of the attack semantic path based on the pre-determined correspondence between path length and attenuation coefficient.
[0072] The attenuation coefficient is negatively correlated with the path length.
[0073] Generally, the longer the attack path, the higher the cost to the attacker, and therefore the lower the risk. Here, an exponential function can be used to characterize the relationship between path length and attenuation coefficient; for example, the attenuation coefficient can be expressed as $0.7^{(L-1)}$, where L represents the number of edges contained in the attack path.
[0074] Step 530: Based on the pre-determined relation weights, determine the weight coefficient of each node contained in the attack semantic path.
[0075] The weight coefficient of a non-terminal node is the relation weight of the risk relationship represented by the edge leading out of that node, and the weight coefficient of the terminal node (the node at the end of the attack semantic path) is 1.
[0076] In this embodiment, the relation weight can characterize the importance of the risk relation. The more important the risk relation represented by the edge, the larger the weight coefficient of the edge.
[0077] As an example, the importance of a vulnerability affecting a product or configuration is high, and its relationship weight could be 0.8. The importance of an asset having a vulnerability is the highest, and its relationship weight could be the highest, 1. The importance of an asset having an insecure configuration is high, and its relationship weight could be 0.7. The importance of an asset having a weak password risk is high, and its relationship weight could be 0.9. The importance of a security event triggered by a specific vulnerability or attack is moderate, and its relationship weight could be 0.6. The importance of network connectivity between assets is moderate, and its relationship weight could be 0.5. The importance of a product running on an asset is low, and its relationship weight could be 0.3.
[0078] Step 540: Determine the weighted sum of the risk contribution value and weight coefficient of all nodes included in the attack semantic path.
[0079] Step 550: Determine the product of the weighted sum and the attenuation coefficient of the attack semantic path as the risk score of the attack semantic path.
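[0080] Here, the risk score of an attack semantic path containing L edges (and therefore L + 1 nodes) can be expressed as: $\text{Risk Score} = D(L) \times \sum_{i=1}^{L+1} w_i r_i$, where $r_i$ is the risk contribution value of the i-th node on the path, $w_i$ is that node's weight coefficient, and $D(L)$ is the attenuation coefficient corresponding to path length L.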
[0081] In a specific example, the attenuation coefficient can be $0.7^{(L-1)}$. The attack semantic path can be represented as: Vulnerability - [:affects] -> Product - [:runs_on] -> Asset. The risk contribution value of the vulnerability node is 9, that of the product node is 8, and that of the asset node is 10. The weight coefficient for the affects relationship is 0.8 (i.e., the vulnerability node's weight coefficient is 0.8), the weight coefficient for the runs_on relationship is 0.3 (i.e., the product node's weight coefficient is 0.3), and the weight coefficient for the terminal asset node is 1. Therefore, the weighted sum of the risk contribution values and weight coefficients of all nodes in this attack semantic path is 9 × 0.8 + 8 × 0.3 + 10 × 1 = 19.6. This attack semantic path includes two edges (L = 2), so the attenuation coefficient is $0.7^{(2-1)} = 0.7$, and the final risk score is 19.6 × 0.7 = 13.72. In this example, the risk score of the attack semantic path is calculated from the node risk contribution values, the edge relation weights, and the correspondence between path length and attenuation coefficient, all of which are determined in advance based on experience. Such a risk score characterizes the risk level of the attack semantic path more accurately and reasonably.
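Steps 510 to 550 can be reproduced with the short sketch below, which uses the example relation weights from [0077] and the example attenuation base 0.7 from [0073]. The relation identifiers and function names are illustrative assumptions; the scoring rule itself (weighted sum times attenuation, terminal node weight 1) follows the text.

```python
# Example relation weights from [0077]; identifier spellings are assumed.
RELATION_WEIGHTS = {
    "affects": 0.8,
    "has_vulnerability": 1.0,
    "has_insecure_configuration": 0.7,
    "has_weak_password": 0.9,
    "triggered_by": 0.6,
    "connects_to": 0.5,
    "runs_on": 0.3,
}


def attenuation(num_edges: int, base: float = 0.7) -> float:
    """Attenuation coefficient base ** (L - 1), where L is the edge count."""
    return base ** (num_edges - 1)


def path_risk_score(contributions: list[float], relations: list[str]) -> float:
    """contributions: one value per node; relations: one name per edge.

    A non-terminal node takes the weight of the edge leading out of it;
    the terminal node takes weight 1.
    """
    if len(contributions) != len(relations) + 1:
        raise ValueError("a path with L edges must have L + 1 nodes")
    weights = [RELATION_WEIGHTS[r] for r in relations] + [1.0]
    weighted_sum = sum(c * w for c, w in zip(contributions, weights))
    return weighted_sum * attenuation(len(relations))
```

For the worked example in [0081], `round(path_risk_score([9, 8, 10], ["affects", "runs_on"]), 2)` gives 13.72.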
[0082] In one example of this implementation, the topological risk information can be determined through the process shown in Figure 6. As shown in Figure 6, the process may include the following steps.
[0083] Step 610: Extract the connection density, clustering coefficient, and location features of the node from the security risk knowledge graph as the structural information of the node.
[0084] Step 620: Based on structural information, determine the degree centrality, betweenness centrality, and critical asset reachability of nodes.
[0085] Step 630: Based on predetermined weight information, determine the weighted sum of the node's degree centrality, betweenness centrality, and critical asset reachability as the topological risk information of the node.
[0086] In this example, the degree centrality of a node can characterize the number of connections to an asset, reflecting the size of its exposure surface. Betweenness centrality can characterize the criticality of an asset in an attack path. Critical asset reachability can characterize the number of accessible high-value assets.
[0087] As an example, topological risk information can be calculated using the following formula: Topological Risk Information = A × Degree Centrality + B × Betweenness Centrality + C × Critical Asset Reachability, where A, B, and C represent weight information; for example, A = 0.4, B = 0.4, and C = 0.2.
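A self-contained sketch of this weighted sum follows, using the example weights A = 0.4, B = 0.4, C = 0.2. Several choices here are assumptions not fixed by the source: the graph is treated as undirected, degree centrality is normalized by n - 1, betweenness centrality is computed with Brandes' algorithm and normalized by the number of node pairs, and critical asset reachability is taken as the fraction of critical assets (other than the node itself) reachable from the node.

```python
from collections import deque


def degree_centrality(adj: dict) -> dict:
    n = len(adj)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in adj.items()}


def betweenness_centrality(adj: dict) -> dict:
    """Brandes' algorithm for unweighted, undirected graphs (normalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each undirected pair was counted twice, so divide by (n-1)(n-2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in bc.items()}


def critical_reachability(adj: dict, critical: set) -> dict:
    """Fraction of critical assets (excluding the node itself) reachable from each node."""
    result = {}
    for s in adj:
        seen, queue = {s}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        targets = critical - {s}
        result[s] = len(targets & seen) / len(targets) if targets else 0.0
    return result


def topological_risk(adj: dict, critical: set, a=0.4, b=0.4, c=0.2) -> dict:
    dc = degree_centrality(adj)
    bc = betweenness_centrality(adj)
    cr = critical_reachability(adj, critical)
    return {v: a * dc[v] + b * bc[v] + c * cr[v] for v in adj}
```

On a three-node chain a - b - c with c as the only critical asset, the middle node b scores highest (1.0), matching the intuition that a hub sitting on every path is a likely attack springboard.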
[0088] In this example, key hub nodes and potential attack springboards in the security risk knowledge graph can be identified based on each node's connection density, clustering coefficient, and location features, and topological risk information can be generated to characterize the node's risk level from a structural perspective, providing more reasonable and accurate data support for subsequent risk assessment.
[0089] Figure 7 is a flowchart illustrating the generation of a risk assessment context in one embodiment of the risk assessment method of this disclosure. As shown in Figure 7, step 320 above may include the following steps.
[0090] Step 710: Using the node corresponding to the host to be evaluated as the center, perform a graph traversal within a preset range in the security risk knowledge graph to extract risk entity information.
[0091] As an example, the preset range can be 2 hops. Centered on the node corresponding to the host to be evaluated, a Cypher query is used to traverse the graph within a 2-hop range and extract risk entity information such as vulnerabilities, insecure configurations, weak passwords, reachable critical assets, and security events.
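In a graph database, this bounded traversal would be a Cypher query using a variable-length pattern such as `[*1..2]`. To keep the example self-contained, the sketch below replaces the database with an in-memory adjacency map and performs an equivalent breadth-first traversal capped at `max_hops`. The node-type labels (`vulnerability`, `configuration`, `weak_password`, `critical_asset`, `security_event`) and the data layout are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical stand-in for the security risk knowledge graph:
# node id -> (node type, neighbor ids). Edge direction is ignored for the traversal.
KnowledgeGraph = Dict[str, Tuple[str, List[str]]]

RISK_TYPES = ("vulnerability", "configuration", "weak_password", "critical_asset", "security_event")


def extract_risk_entities(graph: KnowledgeGraph, host: str, max_hops: int = 2) -> Dict[str, List[str]]:
    """Breadth-first traversal up to max_hops from host, grouping risk entities by node type."""
    depth = {host: 0}
    queue = deque([host])
    found: Dict[str, List[str]] = {t: [] for t in RISK_TYPES}
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the preset range
        for nbr in graph[node][1]:
            if nbr in depth:
                continue  # already visited at an equal or shorter distance
            depth[nbr] = depth[node] + 1
            queue.append(nbr)
            node_type = graph[nbr][0]
            if node_type in found:
                found[node_type].append(nbr)
    return found
```

With the default of 2 hops, a security event attached to a critical asset three hops away is left out; raising `max_hops` to 3 would pull it in.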
[0092] Step 720: Based on preset aggregation rules, aggregate the extracted risk entity information to obtain one or more types of risk lists.
[0093] In this example, the risk entity information extracted in step 710 can be aggregated into the corresponding risk lists according to the respective aggregation rules.
[0094] For example, based on vulnerability aggregation rules, the number of high-risk vulnerabilities and the number of critical configuration issues can be counted and exploitable vulnerabilities can be labeled; based on security posture aggregation rules, the frequency of recent security events and the high-frequency attack types can be calculated.
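The aggregation rules can be sketched as simple filters and counters. Everything concrete here is an assumption: the record fields (`cvss`, `exploit_available`, `severity`, `time`, `attack_type`), treating a CVSS base score of 7.0 or above as high risk (the lower bound of the CVSS v3 "High" band), a 7-day window for "recent" events, and the output key names.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Any, Dict, List


def aggregate_risk_lists(vulns: List[Dict[str, Any]], configs: List[Dict[str, Any]],
                         events: List[Dict[str, Any]], now: datetime,
                         window_days: int = 7, top_k: int = 3) -> Dict[str, Any]:
    """Apply hypothetical vulnerability and security posture aggregation rules."""
    # Vulnerability aggregation rules (assumed thresholds and field names)
    high_risk = [v for v in vulns if v["cvss"] >= 7.0]
    exploitable = [v["id"] for v in vulns if v.get("exploit_available")]
    critical_cfg = [c for c in configs if c["severity"] == "critical"]
    # Security posture aggregation rules: events inside the recent window
    cutoff = now - timedelta(days=window_days)
    recent = [e for e in events if e["time"] >= cutoff]
    top_types = Counter(e["attack_type"] for e in recent).most_common(top_k)
    return {
        "high_risk_vuln_count": len(high_risk),
        "critical_config_issue_count": len(critical_cfg),
        "exploitable_vulns": exploitable,
        "recent_event_count": len(recent),
        "recent_events_per_day": len(recent) / window_days,
        "top_attack_types": [attack for attack, _ in top_types],
    }
```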
Step 730: Based on the risk entity information, path risk information, topology risk information, and risk lists, generate structured data to obtain the risk assessment context.
[0096] As an example, the risk entity information, path risk information, topology risk information, and risk lists can be serialized into JSON-format data for input into the large language model.
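A minimal sketch of the JSON serialization step using Python's standard `json` module. The key names and the choice to list attack paths in descending score order, so the highest-risk paths appear first in the model's input, are illustrative assumptions rather than a schema defined by this disclosure.

```python
import json
from typing import Any, Dict, List


def build_risk_context(host_id: str, risk_entities: Dict[str, List[str]],
                       path_risks: List[Dict[str, Any]], topology_risk: float,
                       risk_lists: Dict[str, Any]) -> str:
    """Serialize the four kinds of risk information into one JSON document (hypothetical schema)."""
    context = {
        "host": host_id,
        "risk_entities": risk_entities,
        # Highest-scoring attack semantic paths first (a design choice, not a requirement)
        "path_risk": sorted(path_risks, key=lambda p: p["score"], reverse=True),
        "topology_risk": round(topology_risk, 4),
        "risk_lists": risk_lists,
    }
    # ensure_ascii=False keeps any non-ASCII asset names readable in the prompt
    return json.dumps(context, ensure_ascii=False, indent=2, sort_keys=True)
```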
[0097] In this example, the risk entity information is processed according to predefined aggregation rules to obtain various types of risk lists, so that risk information is represented in multiple dimensions. The output is structured data containing the risk entity information, path risk information, topology risk information, and risk lists, which gives the large language model multi-dimensional, information-rich input and thereby improves the accuracy and reliability of the risk assessment.
[0098] This disclosure also provides a non-transitory computer storage medium storing a computer program; in some embodiments, when the computer program is executed by a processor, it implements the risk assessment method of any of the above embodiments.
[0099] It will be understood by those skilled in the art that all or some of the steps, systems, or apparatuses disclosed above, and their functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned above does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software may be distributed on a computer-readable medium, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer. Furthermore, it is well known to those skilled in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.
Claims
1. A risk assessment method, characterized in that the method comprises: obtaining security data related to a host to be evaluated; processing the security data using a knowledge graph to generate a risk assessment context; and performing reasoning based on the risk assessment context using a pre-trained large language model, so as to output a risk assessment result for the host to be evaluated.
2. The method according to claim 1, characterized in that obtaining the security data related to the host to be evaluated comprises: obtaining external data related to the host to be evaluated, the external data comprising at least one of the following: vulnerability feature data, vulnerability exploitability information, vulnerability remediation priority, associated attack techniques, security configuration baseline verification items and remediation plans, weak password patterns and password policy violation characteristics, and threat intelligence metadata; monitoring the security status of the network environment in which the host to be evaluated is located to obtain internal data related to the host to be evaluated, the internal data comprising at least one of the following: asset attribute information, vulnerability scan results, network connection topology, security event logs, security configuration compliance, and weak password detection; and fusing the external data with the internal data to obtain the security data.
3. The method according to claim 2, characterized in that processing the security data using the knowledge graph to generate the risk assessment context comprises: constructing a security risk knowledge graph based on the security data; and extracting multi-dimensional risk information from the security risk knowledge graph and aggregating it into a structured risk assessment context.
4. The method according to claim 3, characterized in that the security risk knowledge graph is constructed by: extracting predetermined ontology information from the security data and determining nodes representing the ontology, the node types comprising at least one of the following: asset nodes, vulnerability nodes, product nodes, configuration nodes, weak password nodes, security event nodes, and network connection nodes; extracting predetermined risk relationships from the security data and determining edges that represent the risk relationships, the risk relationships comprising at least one of the following: a vulnerability affects a product or configuration, an asset has a vulnerability, an asset has an insecure configuration, an asset has a weak password risk, a security event is triggered by a specific vulnerability or attack, a network connection exists between assets, and a product runs on an asset; and generating the security risk knowledge graph based on the determined nodes and edges.
5. The method according to claim 3, characterized in that extracting multi-dimensional risk information from the security risk knowledge graph comprises at least one of the following: extracting, based on predetermined attack semantic paths, the attack semantic paths related to the host to be evaluated from the security risk knowledge graph, and determining a risk score for each attack semantic path to obtain path risk information; and extracting structural information of nodes from the security risk knowledge graph based on graph theory algorithms, and determining topological risk information based on the structural information.
6. The method according to claim 5, characterized in that the attack semantic paths comprise at least one of the following: an attack exploitation path, formed by connecting vulnerability nodes, product nodes, and asset nodes in sequence through corresponding edges; a lateral movement path, formed by connecting different asset nodes and a weak password node through corresponding edges, wherein the different asset nodes are connected to each other and point to the weak password node; an event tracing path, formed by connecting security event nodes, vulnerability nodes, and asset nodes through corresponding edges, wherein both the security event nodes and the asset nodes point to the vulnerability nodes; and a configuration risk path, formed by connecting asset nodes, configuration nodes, and vulnerability nodes through corresponding edges, wherein both the asset nodes and the vulnerability nodes point to the configuration nodes.
7. The method according to claim 5, characterized in that determining the risk score for each attack semantic path comprises: determining, based on a predetermined risk contribution value of each node, the risk contribution value of each node contained in the attack semantic path; determining the attenuation coefficient of the attack semantic path based on a predetermined correspondence between path length and attenuation coefficient, wherein the attenuation coefficient is negatively correlated with the path length; determining, based on predetermined relation weights, a weight coefficient for each node contained in the attack semantic path, wherein the weight coefficient of a non-terminal node is the weight coefficient corresponding to the risk relationship represented by the edge introduced by that node, and the weight coefficient of the terminal node is 1; determining the weighted sum of the risk contribution values of all nodes contained in the attack semantic path, each weighted by its weight coefficient; and determining the product of the weighted sum and the attenuation coefficient of the attack semantic path as the risk score of the attack semantic path.
8. The method according to claim 5, characterized in that extracting structural information of nodes from the security risk knowledge graph based on graph theory algorithms and determining topological risk information based on the structural information comprises: extracting the connection density, clustering coefficient, and location features of nodes from the security risk knowledge graph as the structural information of the nodes; determining, based on the structural information, the degree centrality, betweenness centrality, and critical asset reachability of the nodes; and determining, based on predetermined weight information, the weighted sum of a node's degree centrality, betweenness centrality, and critical asset reachability as the node's topological risk information.
9. The method according to claim 5, characterized in that extracting multi-dimensional risk information from the security risk knowledge graph and aggregating it into the structured risk assessment context comprises: performing, centered on the node corresponding to the host to be evaluated, a graph traversal within a preset range in the security risk knowledge graph to extract risk entity information; aggregating the extracted risk entity information based on preset aggregation rules to obtain one or more types of risk lists; and generating structured data based on the risk entity information, the path risk information, the topology risk information, and the risk lists, to obtain the risk assessment context.
10. The method according to claim 1, characterized in that performing reasoning based on the risk assessment context using the pre-trained large language model to output the risk assessment result for the host to be evaluated comprises: generating multi-dimensional assessment prompts based on the risk assessment context; and inputting the assessment prompts and the risk assessment context into the large language model to obtain the risk assessment result.
11. A non-transitory computer storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the risk assessment method according to any one of claims 1 to 10.
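The path scoring recited in claim 7 can be illustrated with a short Python sketch: the score is the sum of each node's risk contribution value multiplied by its weight coefficient, times a length-dependent attenuation coefficient. The relation names (`AFFECTS`, `RUNS_ON`, and so on), their weights, and the length-to-attenuation table are hypothetical; the claim only requires that attenuation decrease as paths get longer, and it does not say whether length counts nodes or edges, so edges are assumed here.

```python
from typing import Dict, List, Sequence

# Hypothetical correspondence: path length (edges) -> attenuation coefficient, decreasing with length.
ATTENUATION: Dict[int, float] = {1: 1.0, 2: 0.9, 3: 0.75, 4: 0.6}
DEFAULT_ATTENUATION = 0.5  # assumed floor for paths longer than the table covers

# Hypothetical weights for the risk relationships represented by edges.
RELATION_WEIGHTS: Dict[str, float] = {
    "AFFECTS": 0.9, "RUNS_ON": 0.7, "CONNECTS_TO": 0.6, "HAS_WEAK_PASSWORD": 0.8,
}


def path_risk_score(contributions: Sequence[float], relations: Sequence[str]) -> float:
    """Score a path of n nodes joined by n - 1 typed edges.

    contributions[i] is the risk contribution value of node i; relations[i] is the
    relation type of the edge leading from node i to node i + 1.
    """
    if len(contributions) < 2 or len(relations) != len(contributions) - 1:
        raise ValueError("a path of n >= 2 nodes needs exactly n - 1 edges")
    # Non-terminal nodes take the weight of the edge they introduce; the terminal node weighs 1.
    weights: List[float] = [RELATION_WEIGHTS[r] for r in relations] + [1.0]
    weighted_sum = sum(c * w for c, w in zip(contributions, weights))
    attenuation = ATTENUATION.get(len(relations), DEFAULT_ATTENUATION)
    return weighted_sum * attenuation
```

For an attack exploitation path (vulnerability, then product, then asset) with contributions 0.9, 0.5, and 0.8, the weighted sum is 0.9 × 0.9 + 0.5 × 0.7 + 0.8 × 1 = 1.96, and the two-edge attenuation of 0.9 gives a score of 1.764.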