Artificial intelligence driven cloud service intelligent operation and maintenance security risk dynamic evaluation method and system

By capturing full operational data in real time and using a risk cause tracing model to generate a self-evolving set of assessment dimensions, the shortcomings of traditional cloud service operation and maintenance security risk assessment methods are addressed. This enables dynamic adjustment and accurate security risk response, thereby improving the operational efficiency and security of cloud services.

CN122268680A · Pending · Publication Date: 2026-06-23 · SHANGHAI QINGBIAO INFORMATION TECH SERVICE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANGHAI QINGBIAO INFORMATION TECH SERVICE CO LTD
Filing Date
2026-05-25
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Traditional cloud service operation and maintenance security risk assessment methods cannot capture changes in the cloud service operating environment in a timely manner, struggle to fully explore the complex correlation between risk triggers and security risk events, and generate response strategies that lack dynamic adaptability. As a result, assessments are insufficiently accurate and timely, and these methods cannot cope effectively with complex and ever-changing cloud service operation and maintenance security risks.

Method used

The method captures full operational data in real time, uses a risk cause tracing model to mine the correlation between data types and historical security risk events, generates a self-evolving set of assessment dimensions, simulates changes in the operational status of cloud services under the influence of risk causes, and dynamically adjusts security risk response strategies.

Benefits of technology

It improves the accuracy and timeliness of risk assessment, enables dynamic adaptation of security risk response strategies to risk evolution trends, effectively addresses complex and ever-changing cloud service operation and maintenance security risks, and ensures the stable and secure operation of cloud services.



Abstract

This invention provides a dynamic assessment method and system for cloud service intelligent operation and maintenance security risks based on artificial intelligence, belonging to the field of artificial intelligence technology. First, it captures all operational and maintenance data of cloud services in real time to form a data set covering various key data types. Then, it inputs the data set into a risk causal tracing model to mine the causal correlation between the data and historical security risk events, generating tracing results. Based on the tracing results, it drives the self-evolutionary adjustment of security risk assessment dimensions, generating a self-evolutionary assessment dimension set adapted to the current risk characteristics. Next, it inputs the data set and tracing results into this evolutionary assessment dimension set to simulate changes in the cloud service's operational state, obtaining risk evolution simulation results. Finally, based on the risk evolution simulation results and the current operational resource status, it generates a response strategy that coexists with the risk evolution trend and sends it to the operation and maintenance execution system. This invention can dynamically adapt to changes in cloud services, ensuring the stable and secure operation of cloud services.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and more specifically, to a dynamic assessment method and system for security risks of intelligent operation and maintenance of cloud services driven by artificial intelligence.

Background Technology

[0002] In the current field of cloud service operations and maintenance, with the widespread application of cloud computing technology, the scale and complexity of cloud services are increasing daily, and the operational security risks they face are becoming more diverse and insidious. Traditional cloud service operation and maintenance security risk assessment methods mainly rely on pre-defined fixed assessment dimensions and rules, and assess risks by periodically collecting and analyzing partial operation and maintenance data. However, these methods have many limitations.

[0003] On the one hand, fixed assessment dimensions adapt poorly to the ever-changing operating environment and risk characteristics of cloud services. While a cloud service runs, node status, service interaction modes, external access patterns, and resource scheduling strategies change continuously, and new risk factors keep emerging. Preset assessment dimensions cannot capture these changes in a timely manner, which significantly reduces the accuracy and timeliness of risk assessment.

[0004] On the other hand, traditional methods, based on limited data analysis, struggle to comprehensively and deeply uncover the complex relationships between risk triggers and security risk events. The operational data generated during cloud service operation is diverse and massive in quantity; traditional methods cannot fully utilize this comprehensive data, thus failing to accurately identify potential risk triggers or effectively predict the evolution of risks.

[0005] Furthermore, traditional methods for generating security risk response strategies lack dynamic adaptability to the evolving trends of risks. Because they cannot accurately simulate changes in the operational status of cloud services under the influence of risk triggers, the resulting response strategies are often too rigid and cannot be flexibly adjusted according to real-time changes in risks, making it difficult to effectively address the complex and ever-changing security risks in cloud service operations and maintenance.

Summary of the Invention

[0006] In view of the aforementioned problems, and in conjunction with the first aspect of the present invention, embodiments of the present invention provide a dynamic assessment method for security risks of intelligent operation and maintenance of cloud services based on artificial intelligence, the method comprising:

[0007] Real-time capture of all operation and maintenance data during cloud service operation, forming a cloud service full operation and maintenance data set, which includes cloud service node operation data, service interaction data, external access request data, resource scheduling data, and risk warning data;

[0008] The full set of cloud service operation and maintenance data is input into the risk cause tracing model to explore the cause correlation between different operation and maintenance data types and historical security risk events, and generate cloud service intelligent operation and maintenance risk cause tracing results.

[0009] Based on the source tracing results of the cloud service intelligent operation and maintenance risk causes, the security risk assessment dimensions are driven to self-evolve and adjust, generating a set of self-evolving assessment dimensions that are adapted to the current cloud service risk cause characteristics. The set of self-evolving assessment dimensions includes the risk cause impact breadth dimension, the risk cause latency duration dimension, and the risk cause elimination difficulty dimension.

[0010] The full set of cloud service operation and maintenance data and the source tracing results of cloud service intelligent operation and maintenance risk factors are input into the set of self-evolution assessment dimensions to simulate the changes in the operating status of cloud services under the influence of different risk factors, and obtain the simulation results of cloud service risk evolution.

[0011] Based on the cloud service risk evolution simulation results and the current operating resource status of the cloud service, a cloud service intelligent operation and maintenance security risk response strategy that coexists with the risk evolution trend is generated. The cloud service intelligent operation and maintenance security risk response strategy is sent to the cloud service operation and maintenance execution system to trigger operation and maintenance operations.

[0012] Furthermore, embodiments of the present invention also provide a cloud service intelligent operation and maintenance security risk dynamic assessment system driven by artificial intelligence, comprising:

[0013] A processor; a machine-readable storage medium for storing machine-executable instructions of the processor; wherein the processor is configured to execute the aforementioned AI-driven intelligent operation and maintenance security risk dynamic assessment method for cloud services via executing the machine-executable instructions.

[0014] In another aspect, embodiments of the present invention also provide a computer program product, the computer program product including machine-executable instructions, the machine-executable instructions being stored in a computer-readable storage medium, the processor of a computer device reading the machine-executable instructions from the computer-readable storage medium, the processor executing the machine-executable instructions, causing the computer device to execute the above-described dynamic assessment method for security risks of intelligent operation and maintenance of cloud services based on artificial intelligence.

[0015] Based on the above, by capturing all operational and maintenance data during the real-time operation of cloud services, a complete set of cloud service operational and maintenance data is formed, including cloud service node operation data, service interaction data, external access request data, resource scheduling data, and risk trigger precursor data. Then, a risk trigger tracing model is used to mine the correlation between different operational and maintenance data types and the triggers of historical security risk events, generating cloud service intelligent operation and maintenance risk trigger tracing results. This can accurately identify potential risk triggers. Based on the tracing results, the security risk assessment dimensions are driven to self-evolve and adjust, generating a set of self-evolving assessment dimensions that are adapted to the current cloud service risk trigger characteristics. This allows the assessment dimensions to dynamically adapt to changes in cloud services, improving the accuracy and timeliness of risk assessment. By inputting the full set of operation and maintenance data and the traceability results into the self-evolutionary assessment dimension set, the operational status changes of cloud services under different risk triggers are simulated to obtain cloud service risk evolution simulation results. This allows for the prediction of risk development trends in advance. Finally, based on the risk evolution simulation results and the current operational resource status of the cloud service, a cloud service intelligent operation and maintenance security risk response strategy that coexists with the risk evolution trend is generated and sent to the operation and maintenance execution system to trigger operation and maintenance operations. This achieves dynamic adaptation between the security risk response strategy and the risk evolution trend, effectively addressing complex and ever-changing cloud service operation and maintenance security risks, ensuring the stable and secure operation of cloud services, and improving the overall efficiency and security of cloud service operation and maintenance.

Attached Figure Description

[0016] Figure 1 is a schematic diagram of the execution flow of the AI-driven intelligent operation and maintenance security risk dynamic assessment method for cloud services provided in this embodiment of the invention.

[0017] Figure 2 is a schematic diagram of exemplary hardware and software components of the AI-driven intelligent operation and maintenance security risk dynamic assessment system for cloud services provided in this embodiment of the invention.

Detailed Implementation

[0018] The present invention will now be described in detail with reference to the accompanying drawings. Figure 1 is a flowchart illustrating a dynamic assessment method for security risks in intelligent cloud service operation and maintenance based on artificial intelligence, provided in one embodiment of the present invention. The method is described in detail below.

[0019] Step S110: Capture all operation and maintenance data during the operation of cloud services in real time to form a cloud service full operation and maintenance data set. The cloud service full operation and maintenance data set includes cloud service node operation data, service interaction data, external access request data, resource scheduling data, and risk warning data.

[0020] In this embodiment, an enterprise-level cloud service platform operation and maintenance scenario is used as an example. This cloud service platform includes multiple physical server nodes, various application services running on the nodes, and network components connecting the services. When capturing full operation and maintenance data in real time, a distributed data acquisition architecture is adopted. A data acquisition agent is deployed on each physical server node, and traffic monitoring probes are deployed on key network nodes for service interaction. External access request data is collected through the cloud service platform's access gateway, resource scheduling data is pushed in real time by the cloud service platform's resource management center, and risk warning data is obtained through log analysis plugins deployed in each service instance.

[0021] Cloud service node operation data includes, but is not limited to, CPU operation status data, memory usage status data, disk read/write status data, and network interface transmission status data for each node; service interaction data includes service call request data, call response data, call error information data, and call tracing data; external access request data includes access source identifier data, requested resource path data, request parameter data, request timestamp data, and request processing result data; resource scheduling data includes CPU resource allocation data, memory resource allocation data, storage resource allocation data, network bandwidth allocation data, and resource migration record data; and risk trigger warning data includes service anomaly log data, system error message data, resource usage threshold alarm data, and network connection anomaly data.

[0022] During data capture, sensitive data involving user privacy, such as the user's real IP address that may be contained in the access source identifier data of external access requests, is processed using IP address de-identification technology, converting the user's real IP address into an anonymous identifier after hashing. User identity information that may be contained in service interaction data is encrypted and stored using data encryption technology, with the encryption key employing a dynamic management mechanism and being changed periodically. Simultaneously, all collected data is transmitted through encrypted channels to prevent theft or tampering during transmission.
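The IP de-identification step in [0022] can be illustrated with a minimal sketch. The source only states that the real IP address is hashed into an anonymous identifier; the use of a keyed SHA-256 hash (HMAC), the 16-character truncation, the function name `anonymize_ip`, and the demo key are all assumptions made for illustration.

```python
import hashlib
import hmac


def anonymize_ip(ip_address: str, secret_key: bytes) -> str:
    """Map an IP address to a stable anonymous identifier.

    Assumption: keyed SHA-256 (HMAC) truncated to 16 hex characters.
    The source only says that the address is hashed.
    """
    digest = hmac.new(secret_key, ip_address.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical key for illustration; a real deployment would load it securely.
demo_key = b"example-key"
id_a = anonymize_ip("203.0.113.7", demo_key)
id_b = anonymize_ip("203.0.113.7", demo_key)
id_c = anonymize_ip("203.0.113.8", demo_key)
```

The same address always maps to the same identifier under a given key, so access-frequency and distribution features can still be computed on the anonymized data.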

[0023] Step S120: Input the full set of cloud service operation and maintenance data into the risk cause tracing model, mine the cause correlation between different operation and maintenance data types and historical security risk events, and generate cloud service intelligent operation and maintenance risk cause tracing results.

[0024] Step S121: Divide the full set of cloud service operation and maintenance data into subsets of cloud service node operation data, service interaction data, external access request data, resource scheduling data, and risk precipitant data according to data type, and label each data subset with the corresponding data collection time information.

[0025] In this embodiment, when dividing the full set of cloud service operation and maintenance data, it is classified according to the data's source and business attributes. The cloud service node operation data subset includes node-level operation data such as CPU operation status data and memory usage status data collected from each physical server node, and each data record is marked with the precise timestamp when it was obtained from the node collection agent program; the service interaction data subset includes data related to calls between services, and is marked with the time information of the service call; the external access request data subset includes all external request data collected by the access gateway, and is marked with the timestamp of the request arriving at the gateway; the resource scheduling data subset includes various resource allocation and migration data pushed by the resource management center, and is marked with the execution time of the resource scheduling operation; the risk precipitant data subset includes abnormal data obtained by the log analysis plugin of each service instance, and is marked with the time when the log was generated.
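A minimal sketch of the division in step S121, assuming each collected record already carries a type label and a collection timestamp. The `Record` class, the subset names in `SUBSET_TYPES`, and the function `partition_by_type` are hypothetical names chosen for illustration, not taken from the source.

```python
from dataclasses import dataclass

# Hypothetical labels for the five data types named in the method.
SUBSET_TYPES = (
    "node_operation",
    "service_interaction",
    "external_access",
    "resource_scheduling",
    "risk_precursor",
)


@dataclass(frozen=True)
class Record:
    data_type: str       # one of SUBSET_TYPES
    collected_at: float  # collection timestamp in seconds
    payload: dict        # raw measurement fields


def partition_by_type(records):
    """Split records into per-type subsets, each ordered by collection time."""
    subsets = {name: [] for name in SUBSET_TYPES}
    for record in records:
        if record.data_type not in subsets:
            raise ValueError(f"unknown data type: {record.data_type}")
        subsets[record.data_type].append(record)
    for items in subsets.values():
        items.sort(key=lambda r: r.collected_at)
    return subsets


demo_records = [
    Record("node_operation", 3.0, {"cpu": 71.0}),
    Record("external_access", 1.0, {"path": "/api"}),
    Record("node_operation", 2.0, {"cpu": 64.0}),
]
demo_subsets = partition_by_type(demo_records)
```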

[0026] Step S122: Obtain historical security risk event data, decompose the historical security risk event data, and extract the operation and maintenance data before the event occurred, the operation and maintenance data when the event occurred, and the operation and maintenance data after the event was handled for each historical security risk event to form a historical event operation and maintenance data sequence.

[0027] In this embodiment, historical security risk event data is obtained from the historical security event database of the cloud service platform. This database stores records of various security risk events that occurred over a period of time, including event number data, event name data, event type data, event occurrence time data, event impact scope data, and event handling process data. For each historical security risk event, it is broken down according to the timeline of event development. The full cloud service operation and maintenance data within a preset time period before the event occurs is extracted as the pre-event operation and maintenance data. The full cloud service operation and maintenance data during the event occurrence and its duration is extracted as the event-time operation and maintenance data. The full cloud service operation and maintenance data within a preset time period after the event is handled is extracted as the post-event operation and maintenance data. Then, these three parts of data are combined in chronological order to form the historical event operation and maintenance data sequence corresponding to the historical security risk event.
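The timeline decomposition in [0027] can be sketched as three time-window filters. The function name, the representation of records as `(timestamp, value)` tuples, and the exact boundary handling (which side of each window is inclusive) are assumptions; the source only specifies preset periods before the event and after handling.

```python
def build_event_sequence(records, event_start, handled_at, pre_window, post_window):
    """Split (timestamp, value) records into pre-event, event-time and post-event data.

    Assumed boundaries: pre is [event_start - pre_window, event_start),
    during is [event_start, handled_at], post is (handled_at, handled_at + post_window].
    """
    ordered = sorted(records, key=lambda r: r[0])
    pre = [r for r in ordered if event_start - pre_window <= r[0] < event_start]
    during = [r for r in ordered if event_start <= r[0] <= handled_at]
    post = [r for r in ordered if handled_at < r[0] <= handled_at + post_window]
    # The three parts are kept in chronological order, as the method requires.
    return {"pre": pre, "during": during, "post": post}


# Synthetic records at t = 0, 10, ..., 90 for illustration only.
demo_records = [(t, f"r{t}") for t in range(0, 100, 10)]
demo_sequence = build_event_sequence(
    demo_records, event_start=40, handled_at=60, pre_window=20, post_window=20
)
```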

[0028] Step S123: Input each subset of the divided cloud service full operation and maintenance data set and the historical event operation and maintenance data sequence into the risk cause tracing model, and compare the feature overlap between each data subset and the operation and maintenance data before the event in the historical event operation and maintenance data sequence.

[0029] Step S1231: Extract data features from each subset of the full cloud service operation and maintenance data set. Extract CPU utilization change features, memory usage change features, and disk I/O change features from the cloud service node running data subset; extract service call frequency change features, service response time change features, and service error code occurrence features from the service interaction data subset; extract access IP distribution features, access request type distribution features, and access frequency change features from the external access request data subset; extract resource allocation ratio change features, resource migration frequency features, and resource load change features from the resource scheduling data subset; and extract abnormal data occurrence frequency features, data value fluctuation amplitude features, and data correlation change features from the risk trigger precursor data subset.

[0030] In this embodiment, the risk cause tracing model includes a data preprocessing module, a feature extraction module, and a feature comparison module. During data feature extraction, the CPU utilization change characteristics within the cloud service node running data subset are obtained by analyzing the trend, rate, and magnitude of CPU utilization changes per unit time; memory usage change characteristics are obtained by analyzing the increase/decrease trend of memory usage, memory usage growth rate, and memory paging frequency; and disk I/O change characteristics are obtained by analyzing the changes in the number of disk read/write operations, the amount of data read/written, and the read/write response time.

[0031] In the service interaction data subset, the service call frequency variation characteristics are obtained by statistically analyzing the number of calls between services within a unit of time and the trend of increase or decrease in the number of calls; the service response time variation characteristics are obtained by analyzing the changes in statistical quantities such as the average, maximum, minimum, and variance of service call response times; and the service error code occurrence characteristics are obtained by statistically analyzing the number of occurrences, frequency, and time distribution characteristics of different types of error codes.

[0032] In the subset of external access request data, the distribution characteristics of access IPs are obtained by analyzing the proportion of access requests initiated by different IP address ranges and the geographical distribution characteristics of access IPs (after privacy processing, only regional level information is retained); the distribution characteristics of access request types are obtained by statistically analyzing the proportion of requests using different request methods (such as GET, POST, etc.); and the characteristics of access frequency changes are obtained by statistically analyzing the total number of access requests per unit time and the trend of changes in the number of access requests.

[0033] In the resource scheduling data subset, the characteristics of resource allocation ratio changes are obtained by analyzing the changes in the allocation ratio of various resources (CPU, memory, storage, etc.) among different services or nodes; the characteristics of resource migration frequency are obtained by statistically analyzing the number of resource migrations and the scale of migrated resources per unit time; and the characteristics of resource load changes are obtained by analyzing the changes in the load rate and load balancing degree of each resource.

[0034] Within the risk trigger precursor data subset, the frequency characteristics of abnormal data occurrences are obtained by statistically analyzing the number of times abnormal data records occur and the intervals between occurrences within a unit of time; the fluctuation amplitude characteristics of data values are obtained by analyzing the degree to which abnormal data values deviate from the normal range and the severity of fluctuations; and the characteristics of changes in data correlation are obtained by analyzing changes in the correlation between abnormal data and other relevant data, such as changes in the correlation coefficient.
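As one concrete instance of the feature extraction described in [0030] to [0034], the sketch below derives the trend, rate, and magnitude of change from a series of utilization samples. The source does not define these quantities numerically, so the definitions used here (end-to-end difference for trend and rate, max minus min for magnitude) and the function name `change_features` are assumptions.

```python
def change_features(samples):
    """Compute trend, rate and magnitude of change for (timestamp, value) samples.

    Assumed definitions: trend is the sign of the end-to-end change, rate is that
    change divided by elapsed time, magnitude is the max-min spread of values.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    ordered = sorted(samples, key=lambda s: s[0])
    (t_first, v_first), (t_last, v_last) = ordered[0], ordered[-1]
    if t_last == t_first:
        raise ValueError("samples must span a non-zero time interval")
    delta = v_last - v_first
    trend = "rising" if delta > 0 else "falling" if delta < 0 else "flat"
    values = [v for _, v in ordered]
    return {
        "trend": trend,
        "rate": delta / (t_last - t_first),
        "magnitude": max(values) - min(values),
    }


# Synthetic CPU utilization samples (seconds, percent) for illustration only.
cpu_features = change_features([(0, 20.0), (60, 50.0), (120, 80.0)])
```

The same helper could be applied to memory usage or request counts, since each of those features is likewise described as a trend, rate, or amplitude over a unit of time.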

[0035] Step S1232: Perform the same data feature extraction on the operation and maintenance data before the event in the historical event operation and maintenance data sequence to obtain the historical operation and maintenance data feature set before the event.

[0036] In this embodiment, the same feature extraction method and process as in step S1231 are used to extract features from the operation and maintenance data before the event in the historical event operation and maintenance data sequence, and extract various features corresponding to each subset of the full set of cloud service operation and maintenance data, thereby forming a feature set of operation and maintenance data before the historical event.

[0037] Step S1233: Standardize the feature dimensions and representation forms of the features of each subset of the full cloud service operation and maintenance data set and of the features from different sources in the historical operation and maintenance data feature set before the event. Then calculate the matching degree between each standardized subset feature and the corresponding feature in the historical operation and maintenance data feature set before the event. The matching degree is obtained by jointly considering the consistency of the trend of feature value change, the consistency of feature occurrence time interval, and the consistency of feature correlation.

[0038] In this embodiment, the features of each subset of the full cloud service operation and maintenance data set may differ from those in the operation and maintenance data feature set before the historical event in both the number of feature dimensions and the feature representation method, so a feature dimension standardization transformation is required. For example, for cases with different numbers of feature dimensions, feature mapping is used to map the side with fewer feature dimensions to the dimension space of the side with more feature dimensions; for cases with different feature representation methods, a unified feature description specification is used to unify the feature representation methods.

[0039] When calculating the matching degree, the consistency of feature value change trends is judged by comparing whether the increase or decrease trends of two features in the time series are consistent. If the increase or decrease trends of two features are the same, the consistency is high. The consistency of feature occurrence time intervals is judged by comparing whether the time intervals of the occurrence of key feature points in two features are similar. If the time interval difference is small, the consistency is high. The consistency of feature association is judged by analyzing whether the association between two features and other related features is similar. If the association is similar, the consistency is high. The consistency of these three aspects is comprehensively considered. By setting weight coefficients for each aspect, the consistency degree of each aspect is weighted and summed to obtain the final matching degree.
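The weighted summation in [0039] can be sketched as follows. The source does not give the weight values or a formula for each consistency score, so the default weights `(0.4, 0.3, 0.3)`, the [0, 1] score range, and the step-wise definition of trend consistency are assumptions for illustration.

```python
def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def trend_consistency(series_a, series_b):
    """Fraction of consecutive steps on which both series move in the same direction."""
    steps = min(len(series_a), len(series_b)) - 1
    if steps < 1:
        raise ValueError("each series needs at least two points")
    agree = sum(
        _sign(series_a[i + 1] - series_a[i]) == _sign(series_b[i + 1] - series_b[i])
        for i in range(steps)
    )
    return agree / steps


def matching_degree(trend_c, interval_c, association_c, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three consistency scores, each assumed to lie in [0, 1]."""
    scores = (trend_c, interval_c, association_c)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("consistency scores must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


demo_trend = trend_consistency([1, 2, 3, 2], [10, 20, 30, 25])
demo_match = matching_degree(demo_trend, 0.5, 0.0)
```

Keeping the weights as a parameter reflects the text's statement that a weight coefficient is set for each aspect; how those coefficients are chosen is left open by the source.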

[0040] Step S1234: For each data subset, count the number of features in the data subset whose matching degree with the operation and maintenance data before the historical event exceeds the preset matching threshold, and calculate the ratio of the number of matching features to the total number of features in the data subset. This ratio is used as the feature overlap between each data subset and the operation and maintenance data before the historical event.

[0041] In this embodiment, the preset matching threshold is set based on the feature analysis results of historical security risk events and actual operation and maintenance experience. For each data subset, all extracted features are traversed, and the matching degree of each feature is calculated with the corresponding feature in the operation and maintenance data feature set before the historical event. The number of features with a matching degree exceeding the preset matching threshold is counted. Then, the number of matching features is divided by the total number of features in the data subset, and the resulting ratio is the feature overlap between the data subset and the operation and maintenance data before the historical event.
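The ratio in step S1234 is simple enough to state directly in code. The function name `feature_overlap`, the dictionary of per-feature matching degrees, and the threshold value 0.8 are hypothetical; only the counting rule (features whose matching degree exceeds the threshold, divided by the total number of features) comes from the text.

```python
def feature_overlap(matching_degrees, match_threshold):
    """Share of a subset's features whose matching degree exceeds the threshold."""
    if not matching_degrees:
        return 0.0  # assumption: an empty subset has zero overlap
    matched = sum(1 for degree in matching_degrees.values() if degree > match_threshold)
    return matched / len(matching_degrees)


# Hypothetical matching degrees for four features of one data subset.
demo_degrees = {
    "cpu_utilization_change": 0.90,
    "memory_usage_change": 0.60,
    "disk_io_change": 0.85,
    "network_transfer_change": 0.20,
}
demo_overlap = feature_overlap(demo_degrees, match_threshold=0.8)
```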

[0042] Step S1235: Record the feature overlap calculation process for each data subset, including the basis for feature extraction, feature alignment method, matching degree calculation logic, and feature overlap result, forming a feature overlap calculation record.

[0043] In this embodiment, during the feature overlap calculation process, the original data fields, extraction algorithm logic, and feature value calculation methods used for feature extraction of each data subset are recorded in detail. The feature alignment methods, such as feature mapping rules and dimension unification methods, are also recorded during the feature dimension standardization transformation process. The matching degree calculation logic, including the weight coefficient settings for each consistency aspect and the weighted summation method, is recorded. Finally, the feature overlap results for each data subset are obtained. This information is organized according to a preset format to form a feature overlap calculation record, which allows for subsequent traceability and verification of the calculation process.

[0044] Step S124: For a subset of data whose feature overlap exceeds a preset overlap threshold, further extract data items in the subset that are consistent with the operation and maintenance data before the historical event occurred, and mark the data items as potential risk trigger data items.

[0045] In this embodiment, the preset overlap threshold is set based on the probability and impact of historical security risk events, typically a value that effectively distinguishes normal data from data that may contain risk-inducing factors. When the feature overlap of a data subset exceeds the preset overlap threshold, it indicates that the data subset has a high similarity to the operation and maintenance data characteristics before the historical security risk event, and may contain data that could have triggered the security risk event. In this case, a deeper analysis of the data subset is performed to extract data items that match the operation and maintenance data characteristics before the historical event. For example, if the operation and maintenance data before the historical event shows an abnormally high CPU utilization, and the same abnormally high CPU utilization data item exists in this data subset, then this data item is marked as a potential risk-inducing factor data item.
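Step S124 can be sketched as a two-stage filter: a subset is examined only when its feature overlap exceeds the overlap threshold, and then each item is checked against a pre-event pattern. The function name, the `is_consistent` predicate, the 90 percent CPU level, and the threshold 0.6 are all hypothetical; the source does not specify how consistency with historical data is tested.

```python
def mark_potential_triggers(items, overlap, overlap_threshold, is_consistent):
    """Return copies of the items flagged as potential risk trigger data items.

    Items are examined only when the subset's feature overlap exceeds the threshold.
    """
    if overlap <= overlap_threshold:
        return []
    return [dict(item, potential_risk_trigger=True) for item in items if is_consistent(item)]


# Hypothetical pre-event pattern: CPU utilization at or above 90 percent.
def high_cpu(item):
    return item["metric"] == "cpu_utilization" and item["value"] >= 90.0


demo_items = [
    {"metric": "cpu_utilization", "value": 97.0},
    {"metric": "cpu_utilization", "value": 35.0},
]
flagged = mark_potential_triggers(demo_items, 0.7, 0.6, high_cpu)
skipped = mark_potential_triggers(demo_items, 0.5, 0.6, high_cpu)
```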

[0046] Step S125: Based on the collection time information of potential risk factor data items, organize the occurrence order of potential risk factor data items during the operation of cloud services, and construct the time series relationship of potential risk factor data items.

[0047] In this embodiment, each potential risk factor data item is labeled with its collection time information. These potential risk factor data items are sorted according to their collection time to determine their position on the timeline. Then, the time intervals between adjacent potential risk factor data items and the occurrence patterns of different types of potential risk factor data items are analyzed to construct a time series relationship between the potential risk factor data items. This time series relationship reflects the dynamic changes of potential risk factor data items during cloud service operation.
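A minimal sketch of the ordering described in [0047], assuming each potential risk factor data item is a dictionary with hypothetical `name`, `source`, and `collected_at` fields. It produces the occurrence order, the intervals between adjacent items, and a per-source count as a simple stand-in for the "occurrence patterns" the text mentions.

```python
from collections import Counter


def build_time_series(items):
    """Order items by collection time and summarize their timing."""
    ordered = sorted(items, key=lambda item: item["collected_at"])
    intervals = [
        later["collected_at"] - earlier["collected_at"]
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return {
        "order": [item["name"] for item in ordered],
        "intervals": intervals,
        "counts_by_source": dict(Counter(item["source"] for item in ordered)),
    }


# Synthetic items for illustration only.
demo_items = [
    {"name": "cpu_spike", "source": "node_operation", "collected_at": 30.0},
    {"name": "error_burst", "source": "risk_precursor", "collected_at": 10.0},
    {"name": "mem_growth", "source": "node_operation", "collected_at": 45.0},
]
demo_series = build_time_series(demo_items)
```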

[0048] Step S126: Combining the correspondence between event causes and event consequences in historical security risk event data, label the corresponding event impact type for each potential risk cause data item, integrate the time series relationship and event impact type, and generate cloud service intelligent operation and maintenance risk cause tracing results that include potential risk cause data items, time series relationship, and event impact type.

[0049] For example, Step S1261: Analyze the correspondence between event causes and event consequences in the historical security risk event data, and establish a correspondence table between event cause types and event impact types. The correspondence table lists every event impact type that each event cause type may lead to.

[0050] In this embodiment, the event causes and consequences in historical security risk event data are categorized and organized. Event causes are classified into different event cause types, such as resource overload, network attack, software defect, and configuration error; event consequences are classified into different event impact types, such as service performance degradation, service unavailability, data leakage, and system crash. By analyzing the correspondence between event cause types and event impact types in a large number of historical security risk events, a correspondence table is established. In this table, each event cause type corresponds to one or more possible event impact types.

[0051] Step S1262: Analyze the data item characteristics of each potential risk cause data item, and determine the event cause type to which the data item belongs based on those characteristics. The determination criteria include the source of the data item, the change characteristics of the data item, and the way the data item affects the cloud service.

[0052] In this embodiment, the source of a data item refers to the data subset from which the potential risk cause data item was extracted; different subsets may correspond to different event cause types. The change characteristics of a data item include the trend, magnitude, and frequency of change of its value; different event cause types may produce different change characteristics. The way a data item affects the cloud service refers to which components or functions of the cloud service it may affect, as well as the manner and extent of that impact. By analyzing these characteristics together and comparing them with the typical characteristics of each event cause type, the event cause type to which the potential risk cause data item belongs can be determined.

[0053] Step S1263: According to the correspondence table between event cause type and event impact type, find the event impact type corresponding to the event cause type to which the potential risk cause data item belongs, and label the event impact type to the potential risk cause data item.

[0054] In this embodiment, after the event cause type of the potential risk cause data item has been determined, the correspondence table between event cause types and event impact types is queried to find all event impact types corresponding to that event cause type. These event impact types are taken as the impacts that the potential risk cause data item may cause, and are labeled on the data item.
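
The table lookup in Steps S1261 to S1263 can be illustrated with a minimal Python sketch. The type names are taken from the examples in the text, but the pairings in `CAUSE_TO_IMPACTS`, the field names, and the function `label_impact_types` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical correspondence table: event cause type -> possible event impact types.
# The pairings are illustrative, not taken from real historical data.
CAUSE_TO_IMPACTS: Dict[str, List[str]] = {
    "resource_overload": ["service_performance_degradation", "service_unavailability"],
    "network_attack": ["data_leakage", "service_unavailability"],
    "software_defect": ["system_crash"],
    "configuration_error": ["service_performance_degradation"],
}


def label_impact_types(item: dict, table: Dict[str, List[str]]) -> dict:
    """Return a copy of the data item labeled with every impact type of its cause type."""
    labeled = dict(item)
    # An unknown cause type yields an empty label list instead of an error.
    labeled["impact_types"] = list(table.get(item["cause_type"], []))
    return labeled


item = {"name": "cpu_utilization_spike", "cause_type": "resource_overload"}
labeled_item = label_impact_types(item, CAUSE_TO_IMPACTS)
```

Returning a labeled copy rather than mutating the input keeps the extracted data items unchanged for later steps; this is a design choice of the sketch, not something the embodiment prescribes.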

[0055] Step S1264: If a potential risk trigger data item corresponds to multiple event impact types, then according to the occurrence probability of each event impact type in the historical security risk event data, mark the probability of occurrence for each event impact type.

[0056] In this embodiment, when the event cause type of a potential risk cause data item corresponds to multiple event impact types, the probability of occurrence of each event impact type is calculated from the actual frequency with which that impact type occurred under that event cause type in the historical security risk event data. For example, if the event cause type has occurred N times in history, and event impact type A resulted in M of those occurrences, then the probability of occurrence of event impact type A is M/N. The calculated probability is used as the likelihood of that event impact type and is labeled on the potential risk cause data item.
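
The M/N estimate can be sketched as follows. The record layout (one `(cause_type, impact_type)` pair per historical event) and the function name `impact_probabilities` are assumptions made for this example; if a single event can produce several impact types, the probabilities need not sum to 1.

```python
from collections import Counter
from typing import Dict, List, Tuple


def impact_probabilities(history: List[Tuple[str, str]], cause_type: str) -> Dict[str, float]:
    """Estimate the probability of each impact type as M / N for one cause type."""
    impacts = [impact for cause, impact in history if cause == cause_type]
    n = len(impacts)  # N: occurrences of the cause type
    if n == 0:
        return {}
    counts = Counter(impacts)  # M for each impact type
    return {impact: m / n for impact, m in counts.items()}


# Illustrative history only; these are not real event records.
history = [
    ("resource_overload", "service_performance_degradation"),
    ("resource_overload", "service_performance_degradation"),
    ("resource_overload", "service_performance_degradation"),
    ("resource_overload", "service_unavailability"),
    ("network_attack", "data_leakage"),
]
probs = impact_probabilities(history, "resource_overload")
```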

[0057] Step S1265: Organize the collection time information of all potential risk factor data items, and construct the time series relationship of potential risk factor data items according to the order of collection time. The time series relationship specifies the collection time point of each data item and the time interval between adjacent data items.

[0058] In this embodiment, all potential risk factor data items are arranged on a timeline according to their collection time information to determine the specific collection time point of each data item. Then, the difference between the collection time points of two adjacent data items is calculated to obtain the time interval between adjacent data items. By organizing the above collection time points and time interval information, a time series relationship of potential risk factor data items is constructed, which can clearly show the distribution of potential risk factor data items in the time dimension.
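
A minimal sketch of this ordering and interval calculation is given below. The item names and the function `build_time_series` are hypothetical; only the sort by collection time and the difference between adjacent time points come from the text.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def build_time_series(
    items: List[Tuple[str, datetime]],
) -> Tuple[List[Tuple[str, datetime]], List[timedelta]]:
    """Sort items by collection time and compute the gap between adjacent items."""
    ordered = sorted(items, key=lambda pair: pair[1])
    intervals = [later[1] - earlier[1] for earlier, later in zip(ordered, ordered[1:])]
    return ordered, intervals


# Illustrative data items with made-up collection times.
items = [
    ("memory_usage_spike", datetime(2026, 1, 1, 10, 5)),
    ("cpu_utilization_spike", datetime(2026, 1, 1, 10, 0)),
    ("error_code_burst", datetime(2026, 1, 1, 10, 12)),
]
ordered, intervals = build_time_series(items)
```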

[0059] Step S1266: Integrate the potential risk trigger data items, the event impact type and probability of occurrence corresponding to each data item, and the time series relationship of the potential risk trigger data items to form a structured cloud service intelligent operation and maintenance risk trigger tracing result.

[0060] In this embodiment, the specific content of potential risk-causing data items, the event impact type and probability of occurrence corresponding to each data item, and the constructed time series relationship are integrated according to a preset structured format to form the cloud service intelligent operation and maintenance risk-causing source tracing result. This cloud service intelligent operation and maintenance risk-causing source tracing result can be organized in the form of data dictionary, linked list, tree structure, etc., so that subsequent steps can easily parse and utilize it.
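
One possible data dictionary form of the tracing result is sketched below. The class names `RiskCauseItem` and `TracingResult`, the field names, and the use of JSON for serialization are all assumptions of this example; the embodiment only requires a preset structured format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class RiskCauseItem:
    name: str
    collected_at: str  # ISO 8601 collection time
    impact_probabilities: Dict[str, float] = field(default_factory=dict)


@dataclass
class TracingResult:
    items: List[RiskCauseItem]      # in collection-time order
    intervals_seconds: List[float]  # gap between each adjacent pair of items


# Illustrative values only.
result = TracingResult(
    items=[
        RiskCauseItem(
            "cpu_utilization_spike",
            "2026-01-01T10:00:00",
            {"service_performance_degradation": 0.75, "service_unavailability": 0.25},
        ),
        RiskCauseItem(
            "memory_usage_spike",
            "2026-01-01T10:05:00",
            {"service_performance_degradation": 1.0},
        ),
    ],
    intervals_seconds=[300.0],
)

result_dict = asdict(result)          # nested plain dicts and lists
serialized = json.dumps(result_dict)  # a form that later steps could parse
```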

[0061] Step S130: Based on the source tracing results of the cloud service intelligent operation and maintenance risk causes, drive the security risk assessment dimensions to perform self-evolution adjustment, and generate a set of self-evolution assessment dimensions that are adapted to the current cloud service risk cause characteristics. The set of self-evolution assessment dimensions includes the risk cause impact breadth dimension, the risk cause latency duration dimension, and the risk cause elimination difficulty dimension.

[0062] Step S131: Obtain the initial security risk assessment dimension framework. The initial security risk assessment dimension framework includes the risk impact scope dimension, the risk occurrence probability dimension, and the risk handling cost dimension. Each dimension has corresponding assessment indicators.

[0063] In this embodiment, the initial security risk assessment framework is pre-built based on general cloud service security risk assessment theories and industry best practices. The risk impact scope dimension assesses the number of cloud service components, users, and business scope that a security risk event may affect. Its assessment metrics include the number of affected components, the scale of user impact, and the duration of business interruption. The risk occurrence probability dimension assesses the likelihood of a security risk event occurring. Its assessment metrics include historical occurrence frequency, the probability of trigger occurrence, and system vulnerability. The risk handling cost dimension assesses the resource costs required to handle a security risk event. Its assessment metrics include human resource costs, material resource costs, time costs, and business loss costs.

[0064] Step S132: Analyze the cloud service intelligent operation and maintenance risk causation tracing results, extract the potential risk causation data items and event impact types, and count the number of potential risk causation data items corresponding to different event impact types.

[0065] In this embodiment, the results of tracing the causes of cloud service intelligent operation and maintenance risks are analyzed to extract all potential risk cause data items and the event impact type corresponding to each data item. Then, the potential risk cause data items are classified and statistically analyzed according to the event impact type, and the number of potential risk cause data items contained under each event impact type is counted. For example, the service performance degradation impact type corresponds to 5 potential risk cause data items, and the service unavailability impact type corresponds to 3 potential risk cause data items, etc.

[0066] Step S133: Based on the number of potential risk trigger data items corresponding to different event impact types, determine the main risk impact types in the current cloud service operation process, associate the main risk impact types with the dimensions in the initial security risk assessment dimension framework, and determine whether the initial dimensions can cover the main risk impact types.

[0067] In this embodiment, the number of potential risk trigger data items corresponding to different event impact types is sorted, and the top few event impact types with the largest number are selected as the main risk impact types in the current cloud service operation. Then, the characteristics and connotations of these main risk impact types are analyzed to see if they can be covered by the risk impact scope dimension, risk occurrence probability dimension, and risk handling cost dimension in the initial security risk assessment dimension framework. For example, the service performance degradation impact type mainly involves the risk impact scope and risk handling cost, which may be covered by the initial dimensions; while some new event impact types may not be fully covered by the initial dimensions.
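
The counting, ranking, and coverage check of Steps S132 and S133 might look like the sketch below. The cutoff `top_k`, the `coverage` mapping from initial dimensions to the impact types they cover, and the impact type `long_latency_exposure` are hypothetical; the text does not say how coverage is decided in practice.

```python
from collections import Counter
from typing import Dict, List, Set


def main_impact_types(item_impacts: List[str], top_k: int) -> List[str]:
    """Rank impact types by how many data items carry them and keep the top k."""
    counts = Counter(item_impacts)
    return [impact for impact, _ in counts.most_common(top_k)]


def uncovered_types(main_types: List[str], coverage: Dict[str, Set[str]]) -> List[str]:
    """Return the main impact types that no initial dimension covers."""
    covered = set().union(*coverage.values())
    return [t for t in main_types if t not in covered]


# One entry per potential risk cause data item: 5, 3, and 2 items respectively.
item_impacts = (
    ["service_performance_degradation"] * 5
    + ["service_unavailability"] * 3
    + ["long_latency_exposure"] * 2
)
coverage = {
    "risk_impact_scope": {"service_performance_degradation", "service_unavailability"},
    "risk_handling_cost": {"service_performance_degradation"},
}
main_types = main_impact_types(item_impacts, top_k=3)
missing = uncovered_types(main_types, coverage)
```

A non-empty `missing` list corresponds to the branch that adds a new dimension; an empty list corresponds to the branch that only adjusts weights.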

[0068] Step S134: If the initial dimensions cannot cover the main risk impact types, add an assessment dimension corresponding to the uncovered risk impact types. The name of the new dimension matches the uncovered risk impact type, and the assessment indicators under the new dimension are set according to the characteristics of the potential risk trigger data items.

[0069] Step S1341: Analyze the types of risks that are not covered, determine the specific ways in which the risk impacts the operation of cloud services, the objects affected, and the consequences of the impact, and determine the name of the new assessment dimension based on the analysis results.

[0070] In this embodiment, risk impact types not covered by the initial dimensions are analyzed in depth to understand how they affect the operation of cloud services (mode of action), which specific components or services of the cloud services are affected (object of action), and the possible consequences (effects of action). Based on these analysis results, a name is determined for the new evaluation dimension that can accurately reflect the characteristics of the risk impact type. For example, if the uncovered risk impact type is one where the risk trigger has a long latency period in the cloud service and is difficult to detect in a timely manner, the new evaluation dimension can be named the risk trigger latency period dimension.

[0071] Step S1342: Extract potential risk factor data items corresponding to the uncovered risk impact type from the cloud service intelligent operation and maintenance risk factor tracing results, and analyze the core characteristics of the potential risk factor data items, including the collection frequency of the potential risk factor data items, the range of change of the potential risk factor data items, the correlation between the potential risk factor data items and other operation and maintenance data, and the impact path of the potential risk factor data items on cloud service components.

[0072] In this embodiment, the potential risk cause data items corresponding to the uncovered risk impact types are selected from the cloud service intelligent operation and maintenance risk cause tracing results, and their core characteristics are analyzed. Collection frequency refers to the number of times the data item is collected per unit time; variation range refers to the fluctuation range of the data item's value; correlation refers to the relationship, such as a causal or accompanying relationship, between the data item and other operation and maintenance data; impact path refers to the intermediate links through which the data item affects cloud service components, for example, by affecting node CPU utilization and thereby affecting service response time.

[0073] Step S1343: Based on the core features of the potential risk trigger data items, construct evaluation indicators under the new evaluation dimension. Each evaluation indicator corresponds to a core feature of the potential risk trigger data item. The expression form of the evaluation indicators is consistent with the expression form of the evaluation indicators under the initial dimension.

[0074] In this embodiment, corresponding evaluation indicators are constructed for the newly added evaluation dimensions based on the core characteristics of the potential risk trigger data items. For example, for the risk trigger latency dimension, a latency period indicator is constructed based on the collection frequency characteristics of the potential risk trigger data items; a concealment indicator is constructed based on the variation range characteristics of the data items; a correlation indicator is constructed based on the correlation characteristics between the data items and other operation and maintenance data; and an impact diffusion indicator is constructed based on the impact path characteristics of the data items on cloud service components. Simultaneously, it is ensured that the expression format of these newly added evaluation indicators, such as indicator names, indicator definitions, and indicator calculation methods, is consistent with the expression format of the evaluation indicators under the initial dimensions, so as to facilitate unified subsequent evaluation work.

[0075] Step S1344: Determine the evaluation logic for each newly added evaluation indicator. The evaluation logic specifies the method for calculating the evaluation indicator result through the specific data of the potential risk cause data item. The evaluation logic is constructed in combination with the way the risk impact type affects the cloud service.

[0076] In this embodiment, the evaluation logic refers to the rules and methods for calculating the evaluation indicator results based on the specific data of potential risk trigger data items. For example, for the latency period indicator, the evaluation logic can be calculated based on the time interval between the initial collection time and the discovery time of the potential risk trigger data item; for the concealment indicator, it can be calculated based on whether the variation range of the potential risk trigger data item is within the normal data fluctuation range, and whether it is easily detected by conventional monitoring methods. The construction of the evaluation logic is closely integrated with the way the risk impact type affects cloud services, ensuring that the evaluation indicators can accurately reflect the characteristics and degree of the risk impact type.
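
As an illustration of such evaluation logic, the sketch below computes a latency period indicator and a concealment indicator. The units (hours), the definition of concealment as the fraction of values that stay inside the normal band, and both function names are assumptions; the text leaves the exact formulas open.

```python
from datetime import datetime
from typing import Sequence


def latency_period_hours(first_collected: datetime, discovered: datetime) -> float:
    """Time between the first collection of the data item and its discovery, in hours."""
    return (discovered - first_collected).total_seconds() / 3600.0


def concealment_score(values: Sequence[float], normal_low: float, normal_high: float) -> float:
    """Fraction of observed values that stay inside the normal fluctuation band.

    A higher score means the item looks more like normal data and is harder to spot.
    """
    if not values:
        return 0.0
    inside = sum(1 for v in values if normal_low <= v <= normal_high)
    return inside / len(values)


# Illustrative inputs only.
latency = latency_period_hours(datetime(2026, 1, 1, 0, 0), datetime(2026, 1, 2, 12, 0))
concealment = concealment_score([55.0, 60.0, 72.0, 91.0], normal_low=40.0, normal_high=80.0)
```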

[0077] Step S1345: Integrate the new assessment dimensions, new assessment indicators and corresponding assessment logic to form a new dimension module, and integrate the new dimension module into the initial security risk assessment dimension framework.

[0078] In this embodiment, the newly added evaluation dimensions, the evaluation indicators under these dimensions, and the evaluation logic corresponding to each evaluation indicator are integrated according to the structure and format of the initial security risk assessment dimension framework to form an independent new dimension module. Then, this new dimension module is added to the initial security risk assessment dimension framework, together with the original dimensions, to form a new evaluation dimension framework.

[0079] Step S1346: Test the new dimension module by inputting historical security risk event data related to the uncovered risk impact type into the new dimension module to verify whether the new assessment indicators and assessment logic can accurately reflect the actual situation of the risk impact type. If the verification fails, adjust the assessment indicators or assessment logic until the verification passes.

[0080] In this embodiment, data from multiple historical security risk events related to the uncovered risk impact type are selected. This event data is input into the new dimension module, and the events are evaluated using newly added evaluation indicators and logic. The evaluation results are compared with the actual situation of the historical security risk events to analyze the accuracy and reasonableness of the evaluation results. If the evaluation results deviate significantly from the actual situation, it indicates a problem with the newly added evaluation indicators or evaluation logic. The settings of the evaluation indicators or the rules of the evaluation logic need to be adjusted, and the test should be repeated until the evaluation results accurately reflect the actual situation of the risk impact type.

[0081] Step S135: If the initial dimension can cover the main risk impact types, adjust the weight of the assessment indicators under the initial dimension. Based on the proportion of potential risk trigger data items corresponding to the main risk impact types, increase the weight of the assessment indicators corresponding to risk impact types with a proportion greater than the set proportion threshold, and decrease the weight of the assessment indicators corresponding to risk impact types with a proportion not greater than the set proportion threshold.

[0082] In this embodiment, the proportion threshold is set based on the number and importance of the main risk impact types. The proportion is calculated as the number of potential risk trigger data items corresponding to each main risk impact type divided by the total number of potential risk trigger data items corresponding to all main risk impact types. For risk impact types with a proportion greater than the set proportion threshold, the weight of the corresponding assessment indicator in the initial dimension is increased to raise the importance of that indicator in the risk assessment; for risk impact types with a proportion not greater than the set proportion threshold, the weight of the corresponding assessment indicator is decreased. The magnitude of the weight adjustment is determined by the difference in proportion: the larger the proportion, the greater the increase in weight, and the smaller the proportion, the greater the decrease.
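
One way to realize this adjustment is sketched below. The linear update rule, the `rate` parameter, and keying weights directly by impact type are assumptions of this example; the text only fixes the direction of the adjustment and its dependence on the proportion. Renormalizing the weights afterwards is deliberately left out.

```python
from typing import Dict


def adjust_weights(
    weights: Dict[str, float],
    counts: Dict[str, int],
    threshold: float,
    rate: float = 0.5,
) -> Dict[str, float]:
    """Scale each weight up or down according to how its share compares with the threshold."""
    total = sum(counts.values())
    if total == 0:
        return dict(weights)
    adjusted = {}
    for impact, weight in weights.items():
        share = counts.get(impact, 0) / total
        # share > threshold raises the weight; share < threshold lowers it.
        adjusted[impact] = weight * (1.0 + rate * (share - threshold))
    return adjusted


# Illustrative weights and item counts: shares are 0.75 and 0.25.
weights = {"service_performance_degradation": 0.4, "service_unavailability": 0.6}
counts = {"service_performance_degradation": 6, "service_unavailability": 2}
adjusted = adjust_weights(weights, counts, threshold=0.5)
```

Under this rule a share exactly equal to the threshold leaves the weight unchanged, which is the limiting case of a decrease of zero.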

[0083] Step S136: Integrate the newly added dimensions or the initial dimensions after weight adjustment with the corresponding evaluation indicators to form a preliminary set of self-evolutionary evaluation dimensions.

[0084] In this embodiment, if a new dimension is added, the new dimension module is integrated with the adjusted initial dimension (if any); if only the weights of the evaluation indicators under the initial dimension are adjusted, the initial dimension with the adjusted weights is integrated. During the integration process, it is ensured that the logical relationships between the dimensions are clear, and that the evaluation indicators are not duplicated or conflicting, thus forming a preliminary set of self-evolving evaluation dimensions.

[0085] Step S137: Apply the preliminary self-evolution assessment dimension set to the latest cloud service operation and maintenance data, calculate the assessment accuracy of the preliminary self-evolution assessment dimension set for the latest partial range of risk events. If the assessment accuracy does not reach the preset accuracy standard, readjust the dimensions or assessment indicator weights until the assessment accuracy reaches the preset standard, and generate the final self-evolution assessment dimension set.

[0086] In this embodiment, the latest cloud service operation and maintenance data refers to the operation and maintenance data generated during recent cloud service operation, which includes a number of recently occurred partial-range risk events. The preliminary self-evolution assessment dimension set is applied to this data to assess those risk events, and the assessment results are compared with what actually happened in order to calculate the assessment accuracy. The preset accuracy standard is set according to the security requirements and risk control objectives of cloud service operation and maintenance. If the assessment accuracy does not reach the preset standard, the dimension settings and the assessment indicator weight adjustments are re-examined for reasonableness, dimensions are added or deleted or the assessment indicator weights are readjusted, and the assessment accuracy is recalculated, until the assessment accuracy reaches the preset standard. The assessment dimension set obtained at this point is the final self-evolution assessment dimension set.
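
The evaluate, compare, and readjust loop can be sketched as follows. The dimension set is reduced here to a single hypothetical threshold, and `evaluate`, `adjust`, and the `max_rounds` cap are stand-ins invented for the example; a real dimension set would carry dimensions, indicators, and weights.

```python
from typing import Callable, List, Tuple


def assessment_accuracy(predicted: List[str], actual: List[str]) -> float:
    """Share of events whose assessed level matches what actually happened."""
    if not actual:
        return 0.0
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def refine_until_accurate(
    evaluate: Callable[[dict, int], str],
    adjust: Callable[[dict], dict],
    dimension_set: dict,
    events: List[int],
    actual: List[str],
    target: float,
    max_rounds: int = 10,
) -> Tuple[dict, float]:
    """Readjust the dimension set until accuracy reaches the target (capped at max_rounds)."""
    for _ in range(max_rounds):
        predicted = [evaluate(dimension_set, e) for e in events]
        accuracy = assessment_accuracy(predicted, actual)
        if accuracy >= target:
            return dimension_set, accuracy
        dimension_set = adjust(dimension_set)
    raise RuntimeError("accuracy target not reached within max_rounds")


# Toy stand-ins: an event is a risk score; the "dimension set" is one threshold.
evaluate = lambda ds, score: "high" if score >= ds["threshold"] else "low"
adjust = lambda ds: {"threshold": ds["threshold"] - 10}

final_set, final_accuracy = refine_until_accurate(
    evaluate, adjust, {"threshold": 90}, [85, 60, 30], ["high", "high", "low"], target=0.9
)
```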

[0087] Step S140: Input the full set of cloud service operation and maintenance data and the source tracing results of cloud service intelligent operation and maintenance risk causes into the self-evolution assessment dimension set to simulate the changes in the operating status of cloud services under the influence of different risk causes, and obtain the cloud service risk evolution simulation results.

[0088] Step S141: Extract potential risk trigger data items and corresponding event impact types from the cloud service intelligent operation and maintenance risk trigger source tracing results, group the potential risk trigger data items according to the event impact type, and obtain potential risk trigger data groups for different event impact types.

[0089] In this embodiment, potential risk trigger data items and their corresponding event impact types are extracted from the cloud service intelligent operation and maintenance risk trigger tracing results. Then, based on the different event impact types, the potential risk trigger data items are divided into different groups. For example, all potential risk trigger data items corresponding to service performance degradation impact types are grouped into one group to form a service performance degradation potential risk trigger data group; data items corresponding to service unavailability impact types are grouped into one group to form a service unavailability potential risk trigger data group, etc. Each group is a potential risk trigger data group.
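
The grouping can be sketched in a few lines. The field names `name` and `impact_types` and the function `group_by_impact` are hypothetical; the sketch also assumes that an item labeled with several impact types joins every matching group, which the text does not state explicitly.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_impact(items: List[dict]) -> Dict[str, List[str]]:
    """Group data item names by event impact type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        for impact in item["impact_types"]:
            groups[impact].append(item["name"])
    return dict(groups)


# Illustrative items only.
items = [
    {
        "name": "cpu_utilization_spike",
        "impact_types": ["service_performance_degradation", "service_unavailability"],
    },
    {"name": "memory_usage_spike", "impact_types": ["service_performance_degradation"]},
]
groups = group_by_impact(items)
```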

[0090] Step S142: Filter out the operation and maintenance data related to each potential risk factor data group from the full set of cloud service operation and maintenance data to form the associated operation and maintenance data subset corresponding to each potential risk factor data group.

[0091] In this embodiment, the source and characteristics of the potential risk trigger data items in each potential risk trigger data group are analyzed to determine which operation and maintenance data in the full cloud service operation and maintenance data set are related to these data items. For example, if a potential risk trigger data item comes from the CPU utilization data in the cloud service node operation data subset, the operation and maintenance data related to this data item may include the node's memory usage data, disk I/O data, network transmission data, etc. Based on these relationships, the operation and maintenance data related to each potential risk trigger data group are filtered from the full cloud service operation and maintenance data set to form the associated operation and maintenance data subset corresponding to each potential risk trigger data group.

[0092] Step S143: Input each potential risk factor data group and its corresponding associated operation and maintenance data subset into the self-evolution evaluation dimension set, and calculate the initial evaluation value of each potential risk factor data group under each evaluation dimension according to the evaluation indicators and evaluation logic of each dimension in the self-evolution evaluation dimension set.

[0093] In this embodiment, the self-evolutionary evaluation dimension set includes multiple evaluation dimensions, each with corresponding evaluation indicators and evaluation logic. For each potential risk trigger data group and its associated operation and maintenance data subset, each evaluation dimension in the self-evolutionary evaluation dimension set is applied sequentially. Under each evaluation dimension, based on the evaluation indicators and evaluation logic of that dimension, combined with the data items in the potential risk trigger data group and the data in the associated operation and maintenance data subset, the evaluation indicator results are calculated to obtain the initial evaluation value of the potential risk trigger data group under that evaluation dimension. For example, under the risk trigger impact breadth dimension, based on the evaluation indicators (such as the component impact quantity indicator) and evaluation logic, the number of cloud service components that the potential risk trigger data group may affect is calculated as the initial evaluation value of that dimension.

[0094] Step S144: Using the initial evaluation value as the initial condition for simulation, set the simulation time range, and simulate the changes in the operating status of each component of the cloud service under the continuous effect of the potential risk trigger data group within the time range, to obtain the cloud service component operating status change data, which includes changes in node operating parameters, changes in service interaction parameters, and changes in resource scheduling parameters.

[0095] Step S1441: Set a corresponding cloud service component operation status baseline value for the initial evaluation value of each evaluation dimension, so that the baseline value can accurately reflect the actual operation status corresponding to the initial evaluation value.

[0096] In this embodiment, the cloud service component operation status baseline value refers to the standard parameter values of each cloud service component when it operates normally in the risk state corresponding to the initial assessment value. Based on the magnitude of the initial assessment value and the meaning of the assessment dimension, combined with the performance parameters and operating metrics of the cloud service components, a corresponding baseline value is set for the initial assessment value of each assessment dimension. For example, if the initial assessment value for the risk cause impact breadth dimension is 3 (indicating an impact on 3 components), then the corresponding baseline value includes the normal CPU utilization range, memory usage range, and service response time range of these 3 components.

[0097] Step S1442: Set the simulation time range. The simulation time range is set in combination with the regular cycle of cloud service operation and maintenance and the impact cycle of potential risk trigger data items, so that the simulation time range can cover the key stages where risks may change.

[0098] In this embodiment, the regular operation and maintenance cycle of cloud services includes daily operation and maintenance cycle, weekly operation and maintenance cycle, monthly operation and maintenance cycle, etc.; the impact cycle of potential risk trigger data items refers to the time from the appearance of the data item to its significant impact on the cloud service, as well as the duration of the impact. Taking these factors into consideration, an appropriate simulation time range is set. This simulation time range should be long enough to cover the risk from its initial state, going through key stages such as development, diffusion, stabilization, or mitigation, in order to comprehensively observe the evolution process of the risk.

[0099] Step S1443: Use the characteristic parameters of the potential risk factor data group as simulation driving parameters to drive the simulation of the cloud service component's operating status. The characteristic parameters include the data item change frequency, the data item influence intensity, and the data item association range.

[0100] In this embodiment, the data item change frequency refers to the frequency with which the values of data items in the potential risk factor data group change; the data item influence intensity refers to the severity of the impact of the data item on the operating status of the cloud service component; and the data item association range refers to the range of other related data items or cloud service components that the data item can affect. These characteristic parameters are used as simulation driving parameters and input into the cloud service component operating status simulation model. The model drives the simulation process of the cloud service component operating status based on changes in these parameters.

[0101] Step S1444: For cloud service node components, simulate the changes in CPU utilization, memory usage, and disk I/O rate under the influence of potential risk factor data groups. The changes are consistent with the operating rules of this type of cloud service node component, and the magnitude of the changes is positively correlated with the influence intensity of the potential risk factor data items.

[0102] In this embodiment, the operating rules of the cloud service node component refer to the characteristics and range of changes in parameters such as CPU utilization, memory usage, and disk I/O rate under normal conditions. During the simulation, the changes in these parameters are simulated based on the characteristic parameters of the potential risk factor data group, especially the influence intensity of the data items. When the influence intensity of a data item increases, CPU utilization and memory usage may rise, and the disk I/O rate may fluctuate abnormally or decrease, with the magnitude of change growing as the influence intensity grows; conversely, when the influence intensity decreases, the magnitude of change decreases accordingly. Throughout the simulation, the parameter changes must conform to the hardware performance limits and software operating logic of the node component.
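
The two constraints named here, a change magnitude that grows with influence intensity and values that stay within hardware limits, can be shown with a deliberately simple toy model. The linear step `gain * intensity`, the parameter `gain`, and the function `simulate_cpu` are assumptions of this sketch and are not the simulation model of the embodiment.

```python
from typing import List


def simulate_cpu(baseline: float, intensity: float, steps: int, gain: float = 5.0) -> List[float]:
    """Toy model: each step adds gain * intensity, clamped to the 0-100 % range."""
    trace = []
    value = baseline
    for _ in range(steps):
        # Clamping stands in for the hardware performance limit of the node.
        value = min(100.0, max(0.0, value + gain * intensity))
        trace.append(value)
    return trace


low_intensity_trace = simulate_cpu(30.0, intensity=1.0, steps=5)
high_intensity_trace = simulate_cpu(30.0, intensity=4.0, steps=5)
```

With the same baseline, the higher intensity produces larger per-step changes until the 100 % ceiling is reached, after which the trace stays flat.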

[0103] Step S1445: For cloud service interaction components, simulate the changes in service call frequency, service response time, and service error code occurrence under the influence of potential risk trigger data groups. The changes are combined with the service interaction logic, and the trend of changes is consistent with the change frequency of potential risk trigger data items.

[0104] In this embodiment, the service interaction logic of the cloud service interaction component includes the call relationship between services, the call protocol, and the data transmission method. When simulating changes in service call frequency, a potential risk factor data item that changes frequently may cause the service call frequency to fluctuate frequently or to increase or decrease abnormally. The change in service response time is related to the service call frequency and the service processing capacity; when the call frequency is too high or the service processing capacity is impaired, the response time is prolonged. The change in the number of service error codes is related to the service anomalies caused by the potential risk factor data item; the higher the change frequency, the more error codes may occur. The trend of change is consistent with the change frequency of the potential risk factor data item; that is, as the change frequency of the data item increases, the abnormal change trend of the service interaction parameters strengthens accordingly.

[0105] Step S1446: For the cloud service resource scheduling component, simulate the changes in resource allocation ratio, number of resource migrations, and resource load balancing under the influence of potential risk trigger data groups. The changes follow the resource scheduling rules, and the results match the correlation range of the potential risk trigger data items.

[0106] In this embodiment, the resource scheduling rules of the cloud service resource scheduling component include resource allocation strategies, resource migration trigger conditions, and load balancing algorithms. When simulating changes in resource allocation ratios, the affected resource types and ranges are determined based on the correlation range of potential risk trigger data items. Then, the resource allocation ratio among different services or nodes is adjusted according to the resource scheduling rules. The change in the number of resource migrations is related to the adjustment of the resource allocation ratio and the resource load. When the resource allocation ratio changes significantly or the load on some nodes is too high / too low, the number of resource migrations will increase. The change in resource load balancing reflects the evenness of resource distribution among different nodes or services. The change result needs to match the correlation range of potential risk trigger data items; that is, the larger the correlation range, the more widespread the impact on resource load balancing may be.

[0107] Step S1447: Store the data on the changes in the operating status of each component during the simulation in real time, including the specific parameter values, parameter change magnitude, and parameter change direction at each time point, to form a component operating status simulation dataset.

[0108] In this embodiment, during the simulation of the cloud service components' operational status, the operational status parameters of each component are sampled at set time intervals (for example, every second or every millisecond), and the specific parameter value at each time point is recorded, such as the CPU utilization value and the number of service calls. At the same time, the magnitude of the parameter change between adjacent time points (e.g., CPU utilization increasing from 20% to 30%, a change of 10 percentage points) and the direction of change (increasing or decreasing) are calculated. The above data is organized by time sequence and component type to form a component operational status simulation dataset for subsequent analysis and processing.
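By way of non-limiting illustration, the sampling and change calculation described above might be sketched as follows. The function name `build_simulation_records`, the record field names, and the sample values are hypothetical and are not taken from the embodiment; samples are assumed to be evenly spaced in time.

```python
from typing import Dict, List


def build_simulation_records(component: str, parameter: str,
                             samples: List[float]) -> List[Dict]:
    """Turn evenly spaced samples of one parameter into dataset records.

    Each record holds the sampled value plus the change magnitude and
    direction relative to the previous time point (None for the first).
    """
    records = []
    for t, value in enumerate(samples):
        if t == 0:
            magnitude, direction = None, None
        else:
            delta = value - samples[t - 1]
            magnitude = abs(delta)
            if delta > 0:
                direction = "increasing"
            elif delta < 0:
                direction = "decreasing"
            else:
                direction = "unchanged"  # assumption: not named in the text
        records.append({"time": t, "component": component,
                        "parameter": parameter, "value": value,
                        "magnitude": magnitude, "direction": direction})
    return records


# Hypothetical CPU utilization samples (percent) for one node component.
records = build_simulation_records("node-1", "cpu_utilization",
                                   [20.0, 30.0, 25.0])
```

Records for several components can then be concatenated and sorted by time and component type to form the dataset.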

[0109] Step S145: During the simulation, record the changes in the operating status of each component of the cloud service in real time, calculate the real-time evaluation value under each evaluation dimension according to the evaluation indicators of the self-evolution evaluation dimension set, and form an evaluation value time series.

[0110] In this embodiment, within the simulated time frame, as the operational status of the cloud service components changes, real-time data on the operational status changes of each component is obtained from the component operational status simulation dataset. Then, according to the evaluation metrics and calculation logic of each evaluation dimension in the self-evolving evaluation dimension set, this data is processed in real time to obtain the real-time evaluation value of each evaluation dimension at each time point. These real-time evaluation values are arranged chronologically to form a time series of evaluation values for each evaluation dimension. This time series reflects the changing trend of each evaluation dimension within the simulated time frame.

[0111] Step S146: Based on the time series of the assessment values and the data on changes in the operating status of cloud service components, analyze the change trend of the cloud service risk status under the influence of different potential risk trigger data groups. The change trend includes whether the risk has expanded, whether the risk has shifted, and whether the risk has been mitigated.

[0112] In this embodiment, risk trends are analyzed by comparing the changes in assessment values over time with the changes in parameters within the cloud service component operational status data. If the assessment value continues to rise and the abnormality of the component operational status parameters intensifies, the risk is expanding. If the assessment value decreases in one dimension but increases in another, and the abnormality of the operational status parameters shifts from one component to another, the risk is transferring. If the assessment value gradually decreases and the abnormality of the operational status parameters lessens and gradually returns to normal, the risk is being mitigated. The risk change trends under the influence of different potential risk trigger data groups are analyzed separately to obtain their respective trend characteristics.
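As a simplified, non-limiting sketch, the three trend categories could be distinguished from the assessment value time series alone. This sketch assumes that the net change of each series (last value minus first value) is an adequate proxy and omits the comparison with component operational status parameters that the embodiment also describes. The function name `classify_risk_trend` and the extra "stable" category are hypothetical.

```python
from typing import Dict, List


def classify_risk_trend(series_by_dimension: Dict[str, List[float]]) -> str:
    """Classify the risk trend from the net change in each dimension."""
    changes = [s[-1] - s[0] for s in series_by_dimension.values()]
    rising = any(c > 0 for c in changes)
    falling = any(c < 0 for c in changes)
    if rising and falling:
        return "transfer"    # down in one dimension, up in another
    if rising:
        return "expanding"   # assessment values rise
    if falling:
        return "mitigating"  # assessment values fall
    return "stable"          # assumption: no net change in any dimension
```

In practice the classification would be run once per potential risk trigger data group, so that each group obtains its own trend characteristic.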

[0113] Step S147: Integrate the data on changes in the operating status of cloud service components, the time series of assessment values, and the trend of risk changes within the simulation time range to generate simulation results of cloud service risk evolution that include time dimension, component dimension, and assessment dimension.

[0114] In this embodiment, the data on the changes in the operational status of cloud service components within the simulated time range is organized by time dimension and component dimension, forming a matrix of component operational status over time. The time series of evaluation values for each evaluation dimension is likewise organized by time dimension and evaluation dimension, forming a matrix of evaluation values over time. The risk change trend is associated with the time dimension and with the potential risk trigger data group. These three parts are then integrated into a three-dimensional data structure containing the time dimension (each time node within the simulated time range), the component dimension (each component of the cloud service), and the evaluation dimension (each dimension in the self-evolving evaluation dimension set), which serves as the simulation result of cloud service risk evolution. This simulation result shows the risk evolution of the cloud service at different times, for different components, and along different evaluation dimensions under the influence of different potential risk trigger data groups.

[0115] Step S150: Based on the cloud service risk evolution simulation results and the current operating resource status of the cloud service, generate a cloud service intelligent operation and maintenance security risk response strategy that is adapted to the risk evolution trend, and send this strategy to the cloud service operation and maintenance execution system to trigger operation and maintenance operations.

[0116] Step S151: Analyze the cloud service risk evolution simulation results, extract the risk change trend, the cloud service component operation status change data, and the assessment value time series, and determine the main risk types, the risk change direction, and the core components affected by risk in the current cloud service.

[0117] In this embodiment, the simulation results of cloud service risk evolution are analyzed to identify the main types of risks currently faced by the cloud service from the risk change trend, such as service performance degradation risk and service unavailability risk; based on the change direction of the assessment value time series and the deterioration or improvement of the cloud service component operation status change data, the direction of risk change is determined, that is, whether the risk is expanding, transferring, or mitigating; from the cloud service component operation status change data, the components most severely affected by the risk and crucial to the overall operation of the cloud service are identified as the core components affected by the risk, such as the node components where the core business service is located, the critical data storage components, etc.

[0118] Step S152: Obtain the current operating resource status data of the cloud service, including the remaining resources of each node, the resource occupancy of each service, and the available scheduling resources. Analyze how well the current operating resource status can support risk handling, and determine whether there are sufficient resources to cope with the risk change trend.

[0119] In this embodiment, the current operational resource status data of the cloud service is obtained in real time through the cloud service resource management center. The remaining resources of each node include the remaining CPU capacity, memory capacity, disk space, and network bandwidth of each physical server node; the resource usage of each service includes the CPU, memory, and storage resources currently used by each application service; and the available scheduling resources include the CPU, memory, storage, and network resources available for dynamic scheduling. By analyzing this data, it is assessed whether the current resources of the cloud service are sufficient to implement risk mitigation measures, for example whether enough CPU resources remain to start backup service instances, or whether enough network bandwidth is available to isolate affected components. The result determines whether the resource support capacity is adequate.

[0120] Step S153: Based on the main risk type, the direction of risk change, the core components of risk impact, and resource support capabilities, query the preset risk response strategy library and select candidate response strategies that match the main risk type and are suitable for resource support capabilities.

[0121] In this embodiment, a pre-defined risk response strategy library stores various risk response strategies for different risk types, risk levels, and resource conditions. Each strategy includes information such as strategy name, applicable risk type, required resource conditions, strategy operation steps, and expected effect. Based on the identified primary risk type, strategies applicable to that type of risk are selected from the strategy library. Then, considering resource support capabilities, strategies that can meet the required resources under the current resource conditions are further selected; these strategies are the candidate response strategies. For example, if the primary risk type is service performance degradation risk and resource support capabilities are sufficient, candidate response strategies such as increasing resource allocation and initiating load balancing are selected.
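The two-stage selection described above (filter by risk type, then by resource feasibility) can be illustrated with the following non-limiting sketch. The `Strategy` fields shown are a reduced, hypothetical subset of the strategy information, and the library entries and resource figures are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Strategy:
    name: str
    risk_type: str  # applicable risk type
    required_resources: Dict[str, float] = field(default_factory=dict)


def select_candidates(library: List[Strategy], risk_type: str,
                      available: Dict[str, float]) -> List[Strategy]:
    """Keep strategies that match the risk type and fit available resources."""
    return [
        s for s in library
        if s.risk_type == risk_type
        and all(available.get(k, 0.0) >= v
                for k, v in s.required_resources.items())
    ]


# Hypothetical strategy library and resource status.
library = [
    Strategy("increase resource allocation", "performance degradation",
             {"cpu_cores": 4}),
    Strategy("initiate load balancing", "performance degradation"),
    Strategy("start backup instances", "service unavailability",
             {"cpu_cores": 8}),
]
candidates = select_candidates(library, "performance degradation",
                               {"cpu_cores": 2})
```

With only two spare CPU cores in this example, the allocation strategy is filtered out and only load balancing remains a candidate.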

[0122] Step S154: For each candidate response strategy, based on the cloud service risk evolution simulation results, predict the change in the cloud service risk trend after the candidate response strategy is executed. The changes include whether the risk expansion speed slows down, whether the risk impact scope shrinks, and whether the risk can be controlled within a preset time.

[0123] In this embodiment, the simulation model and data from the cloud service risk evolution simulation results are used to simulate the execution of each candidate response strategy. During the simulation, the operation steps of the candidate response strategy are used as input parameters, and the relevant parameters in the simulation model are adjusted to observe the changes in the cloud service risk evolution simulation results. By comparing the risk change trends before and after executing the candidate response strategy, it is predicted whether the risk expansion rate has slowed down (e.g., whether the rate of increase in the assessment value has decreased), whether the scope of risk impact has narrowed (e.g., whether the number of affected components has decreased), and whether the risk can be controlled within a preset time (e.g., whether the assessment value can drop below the safety threshold within a preset time).

[0124] Step S155: Based on the prediction results, calculate the risk control effect of each candidate response strategy. The risk control effect is comprehensively measured by the risk mitigation magnitude, the risk reduction range, and the risk control duration.

[0125] Step S1551: For each candidate response strategy, extract the change data of the risk expansion rate after the implementation of the candidate response strategy from the prediction results, calculate the difference between the risk expansion rate after implementation and the risk expansion rate before implementation, and use the ratio of the difference to the risk expansion rate before implementation as the risk mitigation magnitude.

[0126] In this embodiment, the risk expansion rate is represented by the slope of the evaluation value time series; the steeper the slope, the faster the risk expands. The risk expansion rate before the candidate response strategy is executed (i.e., the slope of the evaluation value time series before execution) and the risk expansion rate after execution (i.e., the slope of the evaluation value time series after execution) are obtained from the prediction results. The post-execution rate is subtracted from the pre-execution rate, and this difference is divided by the pre-execution rate. The resulting ratio is the risk mitigation magnitude. A positive ratio indicates that risk expansion has slowed; the larger the ratio, the more significant the mitigation effect.
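A minimal sketch of this calculation follows, under stated assumptions: the slope is estimated by ordinary least squares over unit time steps (the embodiment does not specify how the slope is obtained), and the difference is taken as the rate before execution minus the rate after execution, so that a positive ratio corresponds to a slowdown. The function names are hypothetical.

```python
from typing import List


def slope(series: List[float]) -> float:
    """Least-squares slope of a series sampled at unit time steps.

    Requires at least two samples.
    """
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def risk_mitigation_magnitude(before: List[float],
                              after: List[float]) -> float:
    """(rate before - rate after) / rate before.

    Assumes the rate before execution is non-zero.
    """
    rate_before = slope(before)
    rate_after = slope(after)
    return (rate_before - rate_after) / rate_before
```

For example, if the assessment value rose by 2 per step before execution and by 1 per step afterwards, the risk mitigation magnitude is 0.5.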

[0127] Step S1552: Extract the data on the change in the number of risk-affected components after the execution of the candidate response strategy from the prediction results, calculate the difference between the number of risk-affected components after execution and the number before execution, and use the ratio of the absolute value of the difference to the number of risk-affected components before execution as the risk reduction range.

[0128] In this embodiment, the number of risk-affected components refers to the total number of cloud service components that are affected by risk. The number of risk-affected components before and after execution is obtained from the prediction results. The difference between the two is calculated, and the absolute value of this difference is taken. This absolute value is then divided by the number of risk-affected components before execution; the resulting ratio is the risk reduction range. A positive ratio indicates a reduction in the risk impact range; the larger the ratio, the more significant the reduction effect.
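As a short worked illustration (the function name `risk_reduction_range` and the component counts are hypothetical), the ratio can be computed as follows. Because the absolute value is taken, this sketch assumes the number of affected components does not grow after execution.

```python
def risk_reduction_range(count_before: int, count_after: int) -> float:
    """|after - before| / before, assuming count_before > 0."""
    return abs(count_after - count_before) / count_before


# Hypothetical prediction: 10 affected components before, 6 after.
reduction = risk_reduction_range(10, 6)
```

Here four of the ten affected components are no longer affected, giving a risk reduction range of 0.4.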

[0129] Step S1553: Extract the time data from the prediction results when the risk parameters reach the preset safety standard after the execution of the candidate response strategy. This time data is used as the risk control duration.

[0130] In this embodiment, the preset security standard refers to the threshold of the assessment value, or of the component operating status parameters, at which the cloud service risk status is within a safe and controllable range. From the prediction results, the first time point after the candidate response strategy is executed at which the cloud service risk parameters (the assessment value or the component operating status parameters) reach the preset security standard is identified; the time interval between the start of strategy execution and this time point is the risk control duration. The shorter the risk control duration, the higher the risk control efficiency of the strategy.
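The lookup of the first time point that meets the preset security standard might be sketched as below. This is illustrative only: it assumes a single assessment value series sampled at a fixed interval starting at strategy execution, and it treats "reaching the standard" as falling to or below a threshold. The function name and the `None` return for a never-controlled risk are hypothetical choices.

```python
from typing import List, Optional


def risk_control_duration(assessment: List[float], threshold: float,
                          interval: float = 1.0) -> Optional[float]:
    """Time from strategy start until the value first reaches the threshold.

    assessment[0] is the value at the start of strategy execution.
    Returns None if the threshold is never reached in the predicted series.
    """
    for step, value in enumerate(assessment):
        if value <= threshold:
            return step * interval
    return None
```

A `None` result corresponds to a strategy that cannot bring the risk under control within the predicted time range.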

[0131] Step S1554: Standardize the risk mitigation magnitude, risk reduction range, and risk control duration, converting them into uniform scoring values. The risk mitigation magnitude and risk reduction range are directly used as scoring values, while the risk control duration is converted into a scoring value through a preset benchmark control duration. The shorter the risk control duration, the higher the scoring value.

[0132] In this embodiment, the purpose of standardization is to convert risk control effectiveness indicators with different dimensions and ranges into unified scoring values for comprehensive comparison. The risk mitigation magnitude and the risk reduction range are ratios ranging from 0 to 1 and can be used directly as scoring values (0 indicates no effect, 1 indicates optimal effect). For the risk control duration, a baseline control duration is preset; this is the expected control duration set based on historical experience and safety requirements. The risk control duration is compared with the baseline control duration. If the risk control duration is less than or equal to the baseline control duration, the scoring value is 1; if it is longer, points are deducted according to the excess ratio, and the larger the excess ratio, the lower the scoring value. The specific conversion formula can be set according to the actual situation, for example: scoring value = 1 - (risk control duration - baseline control duration) / baseline control duration, applied when the risk control duration exceeds the baseline control duration.
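The example conversion formula can be written out as follows. Clamping the result at 0 is an added assumption that is not stated in the embodiment; without it, a duration more than twice the baseline would yield a negative score. The function name `duration_score` is hypothetical.

```python
def duration_score(duration: float, baseline: float) -> float:
    """Convert a risk control duration into a score in [0, 1].

    Durations at or below the baseline score 1; longer durations lose
    points in proportion to the excess ratio.
    """
    if duration <= baseline:
        return 1.0
    score = 1.0 - (duration - baseline) / baseline
    return max(0.0, score)  # assumption: clamp so the score stays in [0, 1]
```

With a baseline of 60 seconds, a duration of 90 seconds exceeds the baseline by 50% and therefore scores 0.5.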

[0133] Step S1555: Set the weight coefficients for the risk mitigation magnitude score, the risk reduction range score, and the risk control duration score. The weight coefficients are determined based on the core needs of the current cloud service operation. If the core need is to control risks quickly, the weight coefficient of the risk control duration score is set higher than those of the other indicators; if the core need is to reduce the scope of risk impact, the weight coefficient of the risk reduction range score is set higher than those of the other indicators.

[0134] In this embodiment, the weighting coefficients range from 0 to 1, and the sum of the weighting coefficients of the three indicators is 1. The weighting coefficients are set according to the core needs of the cloud service's current operation. For example, when the cloud service is in peak business period, the core need is to quickly control risks to avoid business interruption; in this case, the weighting coefficient for the risk control duration score can be set to 0.5, and the weighting coefficients for the risk mitigation magnitude and risk reduction scope scores can be set to 0.3 and 0.2, respectively. When the core data security of the cloud service is paramount, the core need is to reduce the scope of risk impact to protect core data; in this case, the weighting coefficient for the risk reduction scope score can be set to 0.5, and the weighting coefficients for the risk mitigation magnitude and risk control duration scores can be set to 0.3 and 0.2, respectively.

[0135] Step S1556: Multiply the risk mitigation score, risk reduction range score, and risk control duration score by their respective weighting coefficients to obtain the weighted score of each indicator. Sum the weighted scores of each indicator to obtain the total risk control effect score of the candidate response strategy. The higher the total risk control effect score, the better the risk control effect.

[0136] In this embodiment, for each candidate response strategy, its risk mitigation score is multiplied by a risk mitigation weighting coefficient to obtain a risk mitigation weighted score. Similarly, a risk reduction range weighted score and a risk control duration weighted score are calculated. Then, these three weighted scores are summed to obtain the total risk control effect score for the candidate response strategy. By comparing the total risk control effect scores of different candidate response strategies, the strategy with the highest total score is the strategy with the best risk control effect.
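A non-limiting sketch of the weighted summation and selection is given below. The weights are the peak-period example values from paragraph [0134]; the candidate strategy names and their indicator scores are invented purely for illustration, and the function name `total_score` is hypothetical.

```python
from typing import Dict

# Peak business period example from [0134]: duration weighted highest.
WEIGHTS_PEAK = {"mitigation": 0.3, "reduction": 0.2, "duration": 0.5}


def total_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the three indicator scores; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weight coefficients must sum to 1")
    return sum(weights[k] * scores[k] for k in weights)


# Hypothetical standardized scores for two candidate strategies.
candidates = {
    "increase resource allocation":
        {"mitigation": 0.6, "reduction": 0.4, "duration": 1.0},
    "initiate load balancing":
        {"mitigation": 0.8, "reduction": 0.5, "duration": 0.5},
}
totals = {name: total_score(s, WEIGHTS_PEAK) for name, s in candidates.items()}
best = max(totals, key=totals.get)
```

Under these example weights the first strategy totals 0.76 against 0.59 for the second, so it is selected even though the second has higher mitigation and reduction scores.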

[0137] Step S1557: Record the risk mitigation magnitude, risk reduction range, risk control duration, corresponding scores, weight coefficients, weighted scores, and total score for each candidate response strategy to form a risk control effect evaluation table.

[0138] In this embodiment, the various risk control effectiveness indicators (risk mitigation magnitude, risk reduction range, risk control duration), corresponding standardized scores, set weighting coefficients, calculated weighted scores, and final total risk control effectiveness scores of each candidate response strategy are organized and recorded according to a preset table format to form a risk control effectiveness evaluation table. This risk control effectiveness evaluation table can clearly demonstrate the risk control effectiveness of each candidate response strategy.

[0139] Step S156: Select the candidate response strategy with the best risk control effect, and adjust the specific operation parameters in the candidate response strategy, including the amount of resources invested, the order of operation execution, and the operation execution time window, in combination with the current operating resource status of the cloud service, so that the adjusted candidate response strategy is fully adapted to the current operating status of the cloud service.

[0140] In this embodiment, based on the risk control effectiveness evaluation table, the candidate response strategy with the highest total risk control effectiveness score is selected as the preliminary response strategy. Then, the specific operational parameters in this strategy are adjusted in combination with the current operating resource status data of the cloud service. The amount of resources invested is adjusted according to the remaining resources of each node and the available scheduling resources, ensuring that the invested resources can be supplied under the current resource status. The operation execution order is optimized according to the core components affected by risk and the risk change trend, prioritizing the operations most critical to slowing risk expansion and reducing the scope of risk impact. The operation execution time window is selected according to the current business load of the cloud service, avoiding operations that may affect service availability during peak business periods and preferring off-peak periods wherever possible.

[0141] Step S157: Decompose the adjusted response strategy into specific operation and maintenance steps, and integrate these steps to form a complete cloud service intelligent operation and maintenance security risk response strategy. Each operation and maintenance step specifies the operation object, operation content, operation standard, and person responsible for the operation.

[0142] In this embodiment, the adjusted response strategy is broken down into a series of specific operation and maintenance steps according to the order and logical relationship of the operations. Each operation and maintenance step clearly defines the operation object (such as specific cloud service nodes, service instances, resource scheduling strategies, etc.), operation content (such as increasing CPU resource allocation, restarting service instances, updating security configurations, etc.), operation standards (such as how much CPU resource allocation should be increased, and what response time should be within a certain range after service restart, etc.), and the person responsible for the operation (such as system administrators, operation and maintenance engineers, etc.). The above operation and maintenance steps are integrated in sequence to form a complete cloud service intelligent operation and maintenance security risk response strategy document.
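For illustration only, the four attributes of an operation and maintenance step and the assembly of the strategy document could be represented as follows. The class name `OpsStep`, its field names, the output line format, and the example step are hypothetical; the embodiment does not prescribe a document format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpsStep:
    target: str    # operation object, e.g. a node or service instance
    action: str    # operation content
    standard: str  # operation standard to be met
    owner: str     # person responsible for the operation


def build_strategy_document(steps: List[OpsStep]) -> List[str]:
    """Number the steps in execution order and render one line per step."""
    return [
        f"{i}. [{s.owner}] {s.action} on {s.target} (standard: {s.standard})"
        for i, s in enumerate(steps, start=1)
    ]


# Hypothetical single-step strategy.
document = build_strategy_document([
    OpsStep("node-1", "increase CPU resource allocation",
            "allocation raised by the amount set in the strategy",
            "operation and maintenance engineer"),
])
```

Keeping the steps in a typed structure of this kind would let the execution system validate that every step carries all four attributes before any operation is triggered.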

[0143] Step S158: Send the cloud service intelligent operation and maintenance security risk response strategy to the cloud service operation and maintenance execution system to trigger operation and maintenance operations.

[0144] In this embodiment, a complete cloud service intelligent operation and maintenance security risk response strategy document is sent to the cloud service operation and maintenance execution system through a preset interface or communication protocol. After receiving the strategy, the cloud service operation and maintenance execution system automatically or semi-automatically triggers corresponding operation and maintenance operations according to the operation and maintenance steps in the strategy, such as sending resource adjustment instructions to the resource management center or service restart instructions to the service management system, so as to achieve timely handling of cloud service security risks.

[0145] Based on the same inventive concept, please refer to Figure 2, which shows a schematic block diagram of an AI-driven cloud service intelligent operation and maintenance security risk dynamic assessment system 100 provided in an embodiment of this application, for executing the above-described AI-driven cloud service intelligent operation and maintenance security risk dynamic assessment method. The AI-driven cloud service intelligent operation and maintenance security risk dynamic assessment system 100 may include a communication unit 110, a machine-readable storage medium 120, and a processor 130.

[0146] In this embodiment, both the machine-readable storage medium 120 and the processor 130 are located in the AI-driven cloud service intelligent operation and maintenance security risk dynamic assessment system 100 and are separately configured. Alternatively, the machine-readable storage medium 120 can also be integrated into the processor 130 and can communicate and interact with external systems through the communication unit 110. The machine-readable storage medium 120 is used to store machine-executable instructions for executing the scheme of this application, and the processor 130 is used to execute the machine-executable instructions stored in the machine-readable storage medium 120 to implement the AI-driven cloud service intelligent operation and maintenance security risk dynamic assessment method provided in the aforementioned method embodiments.

[0147] It should be noted that, in order to simplify the description of the present invention and thus help to understand one or more embodiments of the invention, multiple features may sometimes be grouped into one embodiment, drawing or description thereof in the foregoing description of the embodiments of the present invention.

Claims

1. A dynamic assessment method for security risks in intelligent operation and maintenance of cloud services based on artificial intelligence, characterized in that the method includes:
capturing, in real time, all operation and maintenance data during cloud service operation to form a cloud service full operation and maintenance data set, which includes cloud service node operation data, service interaction data, external access request data, resource scheduling data, and risk warning data;
inputting the cloud service full operation and maintenance data set into the risk cause tracing model to mine the cause correlation between different operation and maintenance data types and historical security risk events, and generating cloud service intelligent operation and maintenance risk cause tracing results;
based on the cloud service intelligent operation and maintenance risk cause tracing results, driving the security risk assessment dimensions to self-evolve and adjust, generating a set of self-evolving assessment dimensions adapted to the current cloud service risk cause characteristics, the set of self-evolving assessment dimensions including the risk cause impact breadth dimension, the risk cause latency duration dimension, and the risk cause elimination difficulty dimension;
inputting the cloud service full operation and maintenance data set and the cloud service intelligent operation and maintenance risk cause tracing results into the set of self-evolving assessment dimensions to simulate the changes in the operating status of cloud services under the influence of different risk causes, and obtaining cloud service risk evolution simulation results;
based on the cloud service risk evolution simulation results and the current operating resource status of the cloud service, generating a cloud service intelligent operation and maintenance security risk response strategy that is adapted to the risk evolution trend; and
sending the cloud service intelligent operation and maintenance security risk response strategy to the cloud service operation and maintenance execution system to trigger operation and maintenance operations.

2. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 1, characterized in that inputting the cloud service full operation and maintenance data set into the risk cause tracing model to mine the cause correlation between different operation and maintenance data types and historical security risk events, and generating cloud service intelligent operation and maintenance risk cause tracing results, includes:
dividing the cloud service full operation and maintenance data set, according to data type, into a cloud service node operation data subset, a service interaction data subset, an external access request data subset, a resource scheduling data subset, and a risk trigger warning data subset, each data subset being labeled with the corresponding data collection time information;
acquiring historical security risk event data, breaking the historical security risk event data down by event, and extracting, for each historical security risk event, the operation and maintenance data before the event occurred, when the event occurred, and after the event was handled, to form a historical event operation and maintenance data sequence;
inputting each subset of the cloud service full operation and maintenance data set and the historical event operation and maintenance data sequence into the risk cause tracing model, and comparing the feature overlap between each data subset and the operation and maintenance data before the event in the historical event operation and maintenance data sequence;
for each data subset whose feature overlap exceeds a preset overlap threshold, further extracting the data items in the subset that are consistent with the features of the operation and maintenance data before the historical event occurred, and marking these data items as potential risk trigger data items;
based on the collection time information of the potential risk trigger data items, sorting out the order in which the potential risk trigger data items occur during cloud service operation, and constructing the time series relationship of the potential risk trigger data items;
labeling each potential risk trigger data item with the corresponding event impact type, in combination with the correspondence between event causes and event consequences in the historical security risk event data; and
integrating the time series relationship and the event impact types to generate cloud service intelligent operation and maintenance risk cause tracing results that include the potential risk trigger data items, the time series relationship, and the event impact types.

3. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 2, characterized in that, The process involves inputting each subset of the divided full cloud service operation and maintenance data set and the historical event operation and maintenance data sequence into the risk cause tracing model, and comparing the feature overlap between each data subset and the operation and maintenance data prior to the event in the historical event operation and maintenance data sequence, including: Data features are extracted from various subsets of the full cloud service operation and maintenance data set. Specifically, CPU utilization, memory usage, and disk I / O changes are extracted from the cloud service node runtime data subset; service call frequency, service response time, and service error code occurrences are extracted from the service interaction data subset; access IP distribution, access request type distribution, and access frequency changes are extracted from the external access request data subset; resource allocation ratio, resource migration frequency, and resource load changes are extracted from the resource scheduling data subset; and abnormal data occurrence frequency, data value fluctuation amplitude, and data correlation change characteristics are extracted from the risk precipitant data subset. 
The same data feature extraction is performed on the pre-event operation and maintenance data in the historical event operation and maintenance data sequence to obtain the feature set of the historical pre-event operation and maintenance data; After the feature dimensions and representations of each subset of the full cloud service operation and maintenance data set and of the feature set of pre-event operation and maintenance data from different sources are standardized, the matching degree between each standardized feature of each subset and each feature in the pre-event feature set is calculated. The matching degree is obtained by jointly considering the consistency of feature value change trends, the consistency of feature occurrence time intervals, and the consistency of feature correlation relationships. For each data subset, the number of features whose matching degree with the pre-event operation and maintenance data exceeds a preset matching threshold is counted, and the ratio of this number to the total number of features in the data subset is calculated. This ratio is used as the feature overlap between the data subset and the pre-event operation and maintenance data. The feature overlap calculation process for each data subset is recorded, including the basis for feature extraction, the feature alignment method, the matching degree calculation logic, and the feature overlap result, forming a feature overlap calculation record.
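
A minimal sketch of the feature overlap ratio described above follows. The claim does not state how the three consistency measures are combined into a matching degree; the unweighted mean used here is an assumption, as are the function names and the convention that each consistency score lies in [0, 1].

```python
def matching_degree(trend, interval, correlation):
    """Combine three consistency scores, each assumed to lie in [0, 1].

    The unweighted mean is an assumption; the claim only says the three
    aspects are considered together.
    """
    return (trend + interval + correlation) / 3.0


def feature_overlap(feature_scores, threshold):
    """Return matched-feature count divided by total feature count.

    feature_scores maps a feature name to its (trend, interval, correlation)
    consistency scores against the pre-event historical features.
    """
    if not feature_scores:
        return 0.0
    matched = sum(
        1 for scores in feature_scores.values()
        if matching_degree(*scores) > threshold
    )
    return matched / len(feature_scores)
```

With this definition, a subset in which two of four features exceed the threshold has a feature overlap of 0.5.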

4. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 1, characterized in that driving the security risk assessment dimensions to self-evolve and adjust based on the cloud service intelligent operation and maintenance risk cause tracing results, and generating a self-evolving assessment dimension set adapted to the current cloud service risk cause characteristics, includes: Obtain the initial security risk assessment dimension framework, which includes a risk impact scope dimension, a risk occurrence probability dimension, and a risk handling cost dimension, each with corresponding assessment indicators. Analyze the cloud service intelligent operation and maintenance risk cause tracing results, extract the potential risk cause data items and event impact types, and count the number of potential risk cause data items corresponding to each event impact type; Based on these counts, determine the main risk impact types in the current cloud service operation process, associate the main risk impact types with the dimensions in the initial security risk assessment dimension framework, and determine whether the initial dimensions can cover the main risk impact types. If the initial dimensions cannot cover the main risk impact types, new assessment dimensions corresponding to the uncovered risk impact types are added. The names of the new dimensions match the uncovered risk impact types, and the assessment indicators under the new dimensions are set according to the characteristics of the potential risk cause data items. If the initial dimensions can cover the main risk impact types, the weights of the assessment indicators under the initial dimensions are adjusted.
Based on the proportion of potential risk cause data items corresponding to each main risk impact type, increase the weights of the assessment indicators for risk impact types whose proportion exceeds the set proportion threshold, and decrease the weights of the assessment indicators for risk impact types whose proportion does not exceed it. The newly added dimensions, or the initial dimensions with adjusted weights, are integrated with their corresponding assessment indicators to form a preliminary self-evolving assessment dimension set. The preliminary self-evolving assessment dimension set is applied to the latest cloud service operation and maintenance data to calculate its assessment accuracy for the latest partial range of risk events. If the assessment accuracy does not reach the preset accuracy standard, the dimensions or assessment indicator weights are readjusted until the assessment accuracy reaches the preset standard, and the final self-evolving assessment dimension set is generated.
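
The weight adjustment rule above can be sketched as follows. The claim states only the direction of the adjustment; the multiplicative step size `step` and the final renormalization so that the weights sum to 1 are assumptions, and `adjust_weights` is a hypothetical name.

```python
def adjust_weights(weights, item_counts, share_threshold, step=0.1):
    """Raise or lower indicator weights by impact-type share.

    weights: impact type -> current indicator weight.
    item_counts: impact type -> number of potential risk cause data items.
    step and the renormalization are assumptions, not part of the claim.
    """
    total = sum(item_counts.values())
    if total == 0:
        return dict(weights)
    adjusted = {}
    for impact_type, weight in weights.items():
        share = item_counts.get(impact_type, 0) / total
        # Increase when the share exceeds the threshold, otherwise decrease.
        factor = 1 + step if share > share_threshold else 1 - step
        adjusted[impact_type] = weight * factor
    norm = sum(adjusted.values())
    if norm == 0:
        return dict(weights)
    return {k: v / norm for k, v in adjusted.items()}
```

In the claimed method this adjustment sits inside a loop that re-checks assessment accuracy against the preset standard; that loop is omitted here because the accuracy measure is not defined in this section.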

5. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 4, characterized in that adding new assessment dimensions corresponding to the uncovered risk impact types if the initial dimensions cannot cover the main risk impact types includes: Analyze the uncovered risk impact types, determine the specific ways in which they affect the operation of cloud services, their target objects, and their consequences, and determine the names of the new assessment dimensions based on the analysis results; Extract the potential risk cause data items corresponding to the uncovered risk impact types from the cloud service intelligent operation and maintenance risk cause tracing results, and analyze the core characteristics of these data items, including their collection frequency, their range of change, their correlation with other operation and maintenance data, and their impact path on cloud service components; Based on the core characteristics of the potential risk cause data items, construct assessment indicators under the new assessment dimension. Each assessment indicator corresponds to one core characteristic of the potential risk cause data items, and the expression form of the assessment indicators is consistent with that of the assessment indicators under the initial dimensions. Determine the assessment logic for each new assessment indicator. The assessment logic specifies how the assessment indicator result is calculated from the specific data of the potential risk cause data items, and it is constructed in conjunction with the way the risk impact type affects the cloud service.
The new assessment dimensions, new assessment indicators and corresponding assessment logic are integrated to form the new dimension module, and the new dimension module is integrated into the initial security risk assessment dimension framework. Test the new dimension module by inputting historical security risk event data related to the uncovered risk impact type into the new dimension module to verify whether the new assessment indicators and assessment logic can accurately reflect the actual situation of the risk impact type. If the verification fails, adjust the assessment indicators or assessment logic until the verification passes.

6. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 1, characterized in that inputting the full cloud service operation and maintenance data set and the cloud service intelligent operation and maintenance risk cause tracing results into the self-evolving assessment dimension set to simulate the changes in the operating status of the cloud service under the effect of different risk causes, thereby obtaining the cloud service risk evolution simulation results, includes: Extract the potential risk cause data items and corresponding event impact types from the cloud service intelligent operation and maintenance risk cause tracing results. Group the potential risk cause data items by event impact type to obtain potential risk cause data groups for the different event impact types. Filter out, from the full cloud service operation and maintenance data set, the operation and maintenance data related to each potential risk cause data group to form the associated operation and maintenance data subset corresponding to that group. Input each potential risk cause data group and its associated operation and maintenance data subset into the self-evolving assessment dimension set. According to the assessment indicators and assessment logic of each dimension in the self-evolving assessment dimension set, calculate the initial assessment value of each potential risk cause data group under each assessment dimension. Using the initial assessment values as the initial conditions for simulation, set a simulation time range, and simulate the changes in the operating status of each cloud service component under the continuous effect of the potential risk cause data group within the time range.
The resulting cloud service component operating status change data includes changes in node operating parameters, changes in service interaction parameters, and changes in resource scheduling parameters. During the simulation, the changes in the operating status of each cloud service component are recorded in real time. According to the assessment indicators of the self-evolving assessment dimension set, the real-time assessment values under each assessment dimension are calculated to form an assessment value time series. Based on the assessment value time series and the cloud service component operating status change data, the trend of the cloud service risk status under the influence of the different potential risk cause data groups is analyzed. The trend indicates whether the risk expands, whether the risk is transferred, and whether the risk is mitigated. By integrating the cloud service component operating status change data, the assessment value time series, and the risk change trends within the simulation time range, the cloud service risk evolution simulation results are generated, covering the time, component, and assessment dimensions.

7. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 6, characterized in that using the initial assessment values as the initial conditions for simulation, setting a simulation time range, and simulating the changes in the operating status of each cloud service component under the continuous effect of the potential risk cause data group within the time range, the changes including changes in node operating parameters, service interaction parameters, and resource scheduling parameters, includes: Set, for the initial assessment value of each assessment dimension, a corresponding baseline value of the cloud service component operating status, so that the baseline value accurately reflects the actual operating status corresponding to the initial assessment value; Set the simulation time range in combination with the regular cycle of cloud service operation and maintenance and the impact cycle of the potential risk cause data items, so that the simulation time range covers the key stages in which the risk may change. Use the characteristic parameters of the potential risk cause data group as simulation driving parameters to drive the simulation of the cloud service component operating status. The characteristic parameters include the data item change frequency, the data item influence intensity, and the data item association range. For cloud service node components, simulate the changes in CPU utilization, memory usage, and disk I/O rate under the influence of the potential risk cause data group. The changes are consistent with the operating rules of that type of cloud service node component, and the magnitude of the changes is positively correlated with the influence intensity of the potential risk cause data items.
For cloud service interaction components, simulate the changes in service call frequency, service response time, and service error code occurrences under the influence of the potential risk cause data group. The changes follow the service interaction logic, and their trend is consistent with the change frequency of the potential risk cause data items. For cloud service resource scheduling components, simulate the changes in resource allocation ratio, number of resource migrations, and resource load balancing under the influence of the potential risk cause data group. The changes follow the resource scheduling rules, and the results match the association range of the potential risk cause data items. During the simulation, store the operating status change data of each component in real time, including the specific parameter values, parameter change magnitudes, and parameter change directions at each time point, forming a component operating status simulation data set.
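
For illustration, a single node parameter can be simulated as below. The claims require only that the magnitude of change be positively correlated with the influence intensity and that the value, change magnitude, and change direction be stored at each time point; the linear update rule, the `gain` constant, the cap at 100 percent, and the name `simulate_cpu` are all assumptions made for this sketch.

```python
def simulate_cpu(baseline, intensity, steps, gain=5.0):
    """Simulate CPU utilization (percent) over discrete time steps.

    baseline: starting utilization derived from the initial assessment value.
    intensity: influence intensity of the potential risk cause data items.
    The linear rule value += gain * intensity is an assumed toy model.
    """
    records = []
    value = float(baseline)
    for t in range(1, steps + 1):
        # Keep the simulated utilization within the physical 0..100 range.
        new_value = max(0.0, min(100.0, value + gain * intensity))
        delta = new_value - value
        if delta > 0:
            direction = "up"
        elif delta < 0:
            direction = "down"
        else:
            direction = "flat"
        records.append(
            {"t": t, "value": new_value, "delta": delta, "direction": direction}
        )
        value = new_value
    return records
```

A fuller simulation would apply component-specific rules to memory usage, disk I/O rate, service interaction parameters, and resource scheduling parameters in the same record format.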

8. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 1, characterized in that generating a cloud service intelligent operation and maintenance security risk response strategy adapted to the risk evolution trend, based on the cloud service risk evolution simulation results and the current operating resource status of the cloud service, includes: Analyze the cloud service risk evolution simulation results, extract the risk change trends, the cloud service component operating status change data, and the assessment value time series, and determine the main risk types, risk change directions, and core affected components of the current cloud service; Obtain data on the current operating resource status of the cloud service, including the remaining resources of each node, the resource usage of each service, and the availability of each scheduling resource. Analyze the current operating resource status to assess its capacity to support risk handling and determine whether there are sufficient resources to cope with the risk change trend. Based on the main risk types, the risk change directions, the core affected components, and the resource support capacity, query the preset risk response strategy library and filter out candidate response strategies that match the main risk types and fit the resource support capacity. For each candidate response strategy, predict, based on the cloud service risk evolution simulation results, how the cloud service risk trend changes after the candidate response strategy is executed. The changes include whether the risk expansion speed slows down, whether the risk impact scope shrinks, and whether the risk can be controlled within a preset time. Based on the prediction results, the risk control effect of each candidate response strategy is calculated.
The risk control effect is comprehensively measured by the degree of risk mitigation, the scope of risk reduction, and the duration of risk control. Select the candidate response strategy with the best risk control effect, and adjust its specific operation parameters, including the amount of resources invested, the order of operation execution, and the time window of operation execution, in combination with the current operating resource status of the cloud service, so that the adjusted response strategy is fully adapted to the current operating status of the cloud service. The adjusted response strategy is broken down into specific operation and maintenance steps, each of which specifies the operation object, operation content, operation standard, and person responsible for the operation, and these steps are integrated to form a complete cloud service intelligent operation and maintenance security risk response strategy.
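
The filter-then-select step above can be sketched as follows. The `Strategy` fields, the single scalar resource figure, and the name `select_strategy` are hypothetical simplifications; the claims describe a richer resource status (per node, per service, per scheduling resource) and a subsequent parameter adjustment that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A candidate response strategy (hypothetical structure)."""
    name: str
    risk_type: str
    required_resources: float
    effect_score: float  # total risk control effect score, computed elsewhere


def select_strategy(candidates, main_risk_type, available_resources):
    """Pick the feasible candidate with the best risk control effect.

    A candidate is feasible when it matches the main risk type and its
    resource demand fits the currently available resources.
    Returns None when no candidate is feasible.
    """
    feasible = [
        s for s in candidates
        if s.risk_type == main_risk_type
        and s.required_resources <= available_resources
    ]
    return max(feasible, key=lambda s: s.effect_score, default=None)
```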

9. The method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence as described in claim 8, characterized in that calculating, based on the prediction results, the risk control effect of each candidate response strategy, the risk control effect being comprehensively measured by the degree of risk mitigation, the scope of risk reduction, and the duration of risk control, includes: For each candidate response strategy, extract from the prediction results the change data of the risk expansion speed after the candidate response strategy is executed, calculate the difference between the risk expansion speed after execution and the risk expansion speed before execution, and use the ratio of the difference to the risk expansion speed before execution as the risk mitigation magnitude; Extract from the prediction results the change data of the number of risk-affected components after the candidate response strategy is executed, calculate the difference between the number of risk-affected components after execution and the number before execution, and use the ratio of the absolute value of the difference to the number of risk-affected components before execution as the risk reduction scope. Extract from the prediction results the time at which the risk parameters reach the preset safety standard after the candidate response strategy is executed, and use this time as the risk control duration. The risk mitigation magnitude, risk reduction scope, and risk control duration are standardized and converted into uniform scoring values. The risk mitigation magnitude and risk reduction scope are used directly as scoring values, while the risk control duration is converted into a scoring value using a preset benchmark control duration, such that the shorter the risk control duration, the higher the scoring value.
Set weight coefficients for the risk mitigation magnitude score, the risk reduction scope score, and the risk control duration score. The weight coefficients are determined based on the core needs of the current cloud service operation. If the core need is to control risks quickly, the weight coefficient for the risk control duration score is set higher than those of the other indicators. If the core need is to reduce the impact of risks, the weight coefficient for the risk reduction scope score is set higher than those of the other indicators. The risk mitigation magnitude score, risk reduction scope score, and risk control duration score are each multiplied by their weight coefficients to obtain the weighted score of each indicator. The weighted scores are then summed to obtain the total risk control effect score of the candidate response strategy. A higher total risk control effect score indicates a better risk control effect. Record the risk mitigation magnitude, risk reduction scope, risk control duration, corresponding scores, weight coefficients, weighted scores, and total score for each candidate response strategy to form a risk control effect evaluation table.
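
The weighted scoring in this claim can be illustrated with a short sketch. Two points are assumptions rather than claim language: the mitigation difference is taken as before minus after so that a slowdown yields a positive score, and the duration score uses the linear form 1 - duration / benchmark clipped at zero, since the claim says only that a shorter duration gives a higher score. The name `control_effect_score` is hypothetical.

```python
def control_effect_score(rate_before, rate_after, comps_before, comps_after,
                         duration, benchmark_duration, weights):
    """Return the total risk control effect score for one candidate strategy.

    weights is a (mitigation, scope, duration) tuple of weight coefficients.
    """
    if rate_before == 0 or comps_before == 0 or benchmark_duration <= 0:
        raise ValueError("baseline values must be non-zero and benchmark positive")
    # Assumed sign convention: a positive value means the risk slowed down.
    mitigation = (rate_before - rate_after) / rate_before
    # Absolute change in affected components, relative to the starting count.
    scope = abs(comps_after - comps_before) / comps_before
    # Assumed conversion: shorter control duration gives a higher score.
    duration_score = max(0.0, 1.0 - duration / benchmark_duration)
    w_mitigation, w_scope, w_duration = weights
    return (w_mitigation * mitigation
            + w_scope * scope
            + w_duration * duration_score)
```

When fast control is the core need, the third weight would be set above the other two; when limiting impact is the core need, the second weight would be the largest.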

10. A cloud service intelligent operation and maintenance security risk dynamic assessment system based on artificial intelligence, characterized in that it comprises: a processor; and a machine-readable storage medium for storing machine-executable instructions of the processor; wherein the processor is configured to perform, by executing the machine-executable instructions, the method for dynamic assessment of security risks in intelligent operation and maintenance of cloud services based on artificial intelligence according to any one of claims 1 to 9.