A cloud platform risk monitoring method and system

By acquiring multiple attributes and business response times of the cloud platform, and comprehensively analyzing the platform's operational status and risk values, the problem of low monitoring accuracy in existing technologies is solved, enabling more accurate risk prediction and anomaly identification.

CN122220192A · Pending · Publication Date: 2026-06-16 · SHANXI DECHANGHONG INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANXI DECHANGHONG INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-03-24
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing cloud platform risk monitoring methods fail to accurately consider actual business processes, resulting in low monitoring accuracy.

Method used

By acquiring multiple attributes of the cloud platform, such as cloud disk utilization, CPU utilization, memory utilization, and number of connections, and combining them with the business response time within a preset time period, the operating status and risk value of the cloud platform are comprehensively analyzed.

Benefits of technology

It improves the accuracy of cloud platform risk monitoring, enabling earlier identification of potential anomalies and crash risks, and reducing business processing delays and interruptions.


Figure CN122220192A_ABST

Abstract

The application relates to a cloud platform risk monitoring method and system in the field of cloud platforms. The method comprises the following steps: acquiring a plurality of attributes of a cloud platform, the plurality of attributes comprising cloud disk usage, CPU usage, memory usage and the number of connections of a cloud server; determining a running state value of the cloud platform based on the cloud disk usage, the CPU usage, the memory usage and the number of connections; acquiring the response time of each completed service within a preset time period; and determining a risk value of the cloud platform based on the running state value and the response time of each completed service. The application has the effect of improving the accuracy of cloud platform risk monitoring.

Description

Technical Field

[0001] This application relates to the field of cloud platforms, and in particular to a cloud platform risk monitoring method and system.

Background Technology

[0002] A cloud platform is a virtualized platform that provides users with IT resources and services such as computing, storage, networking, and software based on cloud computing technology. It integrates and allocates physical hardware resources on demand through the Internet. Users do not need to build and maintain local infrastructure themselves. They can conveniently call resources, deploy applications, carry out data processing or perform other customized tasks through terminals. It is widely used in scenarios such as enterprise digital transformation, scientific research computing, and Internet application development.

[0003] When cloud platform resource utilization is too high or other anomalies occur, business processing can slow down and the platform can even crash. Currently, risk monitoring is performed only by monitoring the attributes of the cloud platform's own servers. This approach does not take actual business processing into account, so the accuracy of risk monitoring is low.

Summary of the Invention

[0004] To improve the accuracy of cloud platform risk monitoring, this application provides a cloud platform risk monitoring method and system.

[0005] In a first aspect, this application provides a cloud platform risk monitoring method, which adopts the following technical solution: A cloud platform risk monitoring method includes: obtaining multiple attributes of the cloud platform, the multiple attributes including cloud disk utilization, cloud server CPU utilization, memory utilization, and number of connections; determining the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections; obtaining the response time of each completed service within a preset time period; and determining the risk value of the cloud platform based on the operating status value and the response time of each completed service.

[0006] By adopting the above technical solution, multiple attributes of the cloud platform, such as cloud disk utilization and cloud server CPU utilization, are obtained, which makes the current parameters and operating status of the cloud platform easier to understand and facilitates subsequent risk analysis and judgment. Based on these attributes, the operating status of the cloud platform can be quantified; that is, the current operating status value of the cloud platform can be determined. The response time of each completed service over a period of time is also obtained. The response time represents the speed and quality with which the cloud platform processes each service: the slower the speed and the lower the quality, the greater the possibility of subsequent abnormalities and crashes. By analyzing the service response times together with the current operating status value, the magnitude of the risk of subsequent abnormalities and crashes can be determined more accurately. Compared with analysis based solely on the magnitudes of the cloud platform's attributes, monitoring and judging subsequent operating risk in conjunction with the actual response times of service processing is more accurate.

[0007] In another possible implementation, determining the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections includes: selecting, from the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the first target attributes that reach their corresponding upper limit thresholds and the second target attributes that do not reach their corresponding upper limit thresholds; determining the number of first target attributes, and determining the difference between each second target attribute and its corresponding upper limit threshold; determining the first state value based on the difference of each second target attribute and its corresponding first weight; and determining the operating status value of the cloud platform based on the number of first target attributes and the first state value.

[0008] In another possible implementation, determining the operating status value of the cloud platform based on the number of first target attributes and the first state value includes: determining the quantity ratio of the number of first target attributes to the total number of all attributes; determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes; and taking the difference between the first state value and the second state value to obtain the operating status value of the cloud platform.

[0009] In another possible implementation, determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes includes: determining the sum of the first weights of all first target attributes; and substituting the first weight sum and the quantity ratio into a first preset relationship function to obtain the second state value.

[0010] In another possible implementation, determining the risk value of the cloud platform based on the operating status value and the response time of each completed service includes: determining, based on the response times, a first service volume consisting of the services whose response time reaches a preset time threshold, and determining the maximum response time within the first service volume; determining a first ratio between the first service volume and the total service volume within the preset time period; generating a histogram based on the response times of the first service volume and a preset grouping interval, and determining the target interval that contains the most services of the first service volume; determining the median response time based on the upper and lower limits of the target interval, and determining the time difference between the median response time and the preset time threshold; and determining the risk value of the cloud platform based on the operating status value, the maximum response time, the first ratio, and the time difference.
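The response-time statistics described in [0010] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function and field names are hypothetical, "reaches the threshold" is read as greater than or equal to, histogram bins are assumed to start at the time threshold, ties between bins are broken in favor of the lowest bin, and the "median response time" is taken as the midpoint of the target interval's lower and upper limits.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SlowResponseStats:
    max_response: float      # longest response time in the first service volume
    first_ratio: float       # first service volume / total service volume
    target_interval: tuple   # (lower, upper) limits of the fullest histogram bin
    median_response: float   # midpoint of the target interval (assumed reading)
    time_difference: float   # median_response minus the preset time threshold


def slow_response_stats(response_times, time_threshold, bin_width):
    """Summarise the services whose response time reaches the threshold."""
    slow = [t for t in response_times if t >= time_threshold]
    if not slow:
        return None  # nothing reached the threshold in this time period

    # Histogram over the slow services; bin k covers
    # [threshold + k * bin_width, threshold + (k + 1) * bin_width).
    counts = Counter(int((t - time_threshold) // bin_width) for t in slow)
    # Fullest bin; on a tie, prefer the lowest bin (an assumption).
    busiest = max(counts, key=lambda b: (counts[b], -b))
    lower = time_threshold + busiest * bin_width
    upper = lower + bin_width
    midpoint = (lower + upper) / 2

    return SlowResponseStats(
        max_response=max(slow),
        first_ratio=len(slow) / len(response_times),
        target_interval=(lower, upper),
        median_response=midpoint,
        time_difference=midpoint - time_threshold,
    )
```

For example, `slow_response_stats(times, time_threshold=2.0, bin_width=1.0)` groups every response of at least 2 seconds into 1-second bins and reports the fullest bin alongside the maximum and the ratio.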

[0011] In another possible implementation, determining the risk value of the cloud platform based on the operating status value, the maximum response time, the first ratio, and the time difference includes: determining the anomaly level of the cloud platform based on the maximum response time, the first ratio, the time difference, and their respective second weights; and substituting the operating status value and the anomaly level into a second preset relationship function to obtain the risk value of the cloud platform.
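The text does not specify the second weights or the second preset relationship function, so the sketch below fills both with placeholders purely for illustration. The weight values, the normalization of the two time-valued inputs by the time threshold, and the subtraction used as the relationship function are all assumptions; the function names are hypothetical.

```python
def anomaly_level(max_response, first_ratio, time_difference,
                  time_threshold, second_weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the three response-time indicators.

    Assumptions: the second weights are placeholder values, and the two
    time-valued inputs are divided by the threshold so all terms are unitless.
    """
    w_max, w_ratio, w_diff = second_weights
    return (w_max * max_response / time_threshold
            + w_ratio * first_ratio
            + w_diff * time_difference / time_threshold)


def risk_value(operating_status_value, anomaly):
    """Assumed second preset relationship function.

    A higher anomaly level raises the risk and a higher (healthier)
    operating status value lowers it; plain subtraction is a placeholder.
    """
    return anomaly - operating_status_value
```

Any function that increases with the anomaly level and decreases with the operating status value would fit the description equally well; subtraction is used here only because it is the simplest such choice.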

[0012] In another possible implementation, the method further includes: outputting a prompt message if the risk value reaches a preset risk threshold.

[0013] In a second aspect, this application provides a cloud platform risk monitoring system, which adopts the following technical solution: A cloud platform risk monitoring system includes: an attribute acquisition module, used to acquire multiple attributes of the cloud platform, the multiple attributes including cloud disk utilization, cloud server CPU utilization, memory utilization, and number of connections; a status determination module, used to determine the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections; a time acquisition module, used to acquire the response time of each completed service within a preset time period; and a risk determination module, used to determine the risk value of the cloud platform based on the operating status value and the response time of each completed service.

[0014] By adopting the above technical solution, the attribute acquisition module obtains multiple attributes of the cloud platform, such as cloud disk utilization and cloud server CPU utilization, which makes the current parameters and operating status of the cloud platform easier to understand and facilitates subsequent risk analysis and judgment. Based on these attributes, the operating status of the cloud platform can be quantified; that is, the status determination module determines the current operating status value of the cloud platform from these attributes. The time acquisition module obtains the response time of each completed service over a period of time. The response time represents the speed and quality with which the cloud platform processes each service: the slower the speed and the lower the quality, the greater the possibility of subsequent abnormalities and crashes. The risk determination module analyzes the service response times together with the current operating status value, thereby determining more accurately the magnitude of the risk of subsequent abnormalities and crashes. Compared with analysis based solely on the magnitudes of the cloud platform's attributes, taking into account the actual response times of service processing is a more accurate way to monitor and judge subsequent operating risk.

[0015] In another possible implementation, when determining the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the status determination module is specifically used for: selecting, from the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the first target attributes that reach their corresponding upper limit thresholds and the second target attributes that do not reach their corresponding upper limit thresholds; determining the number of first target attributes, and determining the difference between each second target attribute and its corresponding upper limit threshold; determining the first state value based on the difference of each second target attribute and its corresponding first weight; and determining the operating status value of the cloud platform based on the number of first target attributes and the first state value.

[0016] In another possible implementation, when determining the operating status value of the cloud platform based on the number of first target attributes and the first state value, the status determination module is specifically used for: determining the quantity ratio of the number of first target attributes to the total number of all attributes; determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes; and taking the difference between the first state value and the second state value to obtain the operating status value of the cloud platform.

[0017] In another possible implementation, when determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes, the status determination module is specifically used for: determining the sum of the first weights of all first target attributes; and substituting the first weight sum and the quantity ratio into a first preset relationship function to obtain the second state value.

[0018] In another possible implementation, when determining the risk value of the cloud platform based on the operating status value and the response time of each completed service, the risk determination module is specifically used for: determining, based on the response times, a first service volume consisting of the services whose response time reaches a preset time threshold, and determining the maximum response time within the first service volume; determining a first ratio between the first service volume and the total service volume within the preset time period; generating a histogram based on the response times of the first service volume and a preset grouping interval, and determining the target interval that contains the most services of the first service volume; determining the median response time based on the upper and lower limits of the target interval, and determining the time difference between the median response time and the preset time threshold; and determining the risk value of the cloud platform based on the operating status value, the maximum response time, the first ratio, and the time difference.

[0019] In another possible implementation, when determining the risk value of the cloud platform based on the operating status value, the maximum response time, the first ratio, and the time difference, the risk determination module is specifically used for: determining the anomaly level of the cloud platform based on the maximum response time, the first ratio, the time difference, and their respective second weights; and substituting the operating status value and the anomaly level into a second preset relationship function to obtain the risk value of the cloud platform.

[0020] In another possible implementation, the cloud platform risk monitoring system further includes: an output module, used to output a prompt message when the risk value reaches a preset risk threshold.

[0021] In a third aspect, this application provides an electronic device that adopts the following technical solution: An electronic device comprising: at least one processor; a memory; and at least one application, wherein the at least one application is stored in the memory and configured to be executed by the at least one processor, the at least one application being configured to execute the cloud platform risk monitoring method according to any possible implementation of the first aspect.

[0022] In a fourth aspect, this application provides a computer-readable storage medium, which adopts the following technical solution: A computer-readable storage medium storing a computer program that, when executed in a computer, causes the computer to perform the cloud platform risk monitoring method according to any possible implementation of the first aspect.

[0023] In summary, this application includes at least one of the following beneficial technical effects: Obtaining multiple attributes of the cloud platform, such as cloud disk utilization and cloud server CPU utilization, makes the current parameters and operating status of the cloud platform easier to understand, thereby aiding subsequent risk analysis and assessment. These attributes also allow the operating status of the cloud platform to be quantified as an operating status value. Furthermore, the response time of each completed service over a past period reveals the speed and quality with which the cloud platform processes each service; slower speeds and lower quality indicate a higher likelihood of future anomalies and crashes. Combining the service response times with the current operating status value gives a more accurate assessment of the risk of future anomalies and crashes than analyzing the magnitudes of the cloud platform's attributes alone.

Attached Figure Description

[0024] Figure 1 is a flowchart illustrating a cloud platform risk monitoring method according to an embodiment of this application.

[0025] Figure 2 is a schematic diagram of the structure of a cloud platform risk monitoring system according to an embodiment of this application.

[0026] Figure 3 is a schematic diagram of the structure of an electronic device according to an embodiment of this application.

Detailed Implementation

[0027] The present application will be further described in detail below with reference to the accompanying drawings.

[0028] After reading this specification, those skilled in the art may make modifications to this embodiment without contributing any inventive step, but such modifications are protected by patent law as long as they fall within the scope of the claims of this application.

[0029] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0030] Furthermore, the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article, unless otherwise specified, generally indicates that the preceding and following related objects have an "or" relationship.

[0031] The embodiments of this application will now be described in further detail with reference to the accompanying drawings.

[0032] This application provides a cloud platform risk monitoring method executed by an electronic device, which can be a server or a terminal device. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal device can be a smartphone, tablet, laptop, desktop computer, etc., but is not limited to these. The terminal device and the server can be connected directly or indirectly via wired or wireless communication; this application does not impose any limitation on this. As shown in Figure 1, the method includes steps S101, S102, S103, and S104.

S101, obtain multiple attributes of the cloud platform.

[0033] The multiple attributes include cloud disk utilization, cloud server CPU utilization, memory utilization, and number of connections.

[0034] In the embodiments of this application, the cloud platform mainly comprises three aspects: computing resources, storage resources, and network resources. Computing resources are the foundation on which cloud applications run. The core computing resources of the cloud platform include cloud servers (ECS/virtual machines), containers (Docker/K8s Pods), serverless functions, and so on. The bottleneck mainly shows up as persistently high CPU utilization and memory utilization of these computing resources (utilization above 85% is usually considered a high-load state). For example, during a major e-commerce promotion, the CPU utilization of the cloud servers handling order processing surges to 95%, so users wait a long time after submitting orders and the execution efficiency of core business logic drops significantly; the response time for payment settlement rises from the normal 1-2 seconds to more than 10 seconds. A short video platform relies on container clusters for video transcoding tasks. If container memory is insufficient when transcoding tasks flood in, memory is overloaded and the page swapping mechanism is triggered, writing some memory data to the disk swap partition. Because disk I/O is much slower than memory, video transcoding stutters or fails. A retail company's promotional SMS push relies on serverless functions. A sudden surge of millions of push requests exhausts CPU resources during concurrent function execution, so numerous push tasks time out and fail. When memory overload becomes extreme, the system automatically terminates the processes consuming excessive memory to preserve its own stability, directly interrupting business services.

[0035] Cloud platform storage resources mainly include cloud disks, object storage, file storage, and distributed database storage. The first storage bottleneck is disk capacity: when the utilization of a storage resource approaches 100%, new data cannot be written and logs cannot be retained. For example, a live streaming platform uses an object storage service (OSS) to store playback videos of live streams. As playback videos accumulate, storage utilization approaches its limit and new playbacks can no longer be archived properly. The second is the disk I/O bottleneck, where disk read/write IOPS (input/output operations per second) or throughput reaches its limit. This shows up as increased file read/write latency and database query lag (especially for full table scans, which involve a large amount of disk I/O). For example, a financial institution's core trading system stores its database on cloud disks. During peak trading hours, a large number of transaction records are written, IOPS reaches its maximum, and transaction confirmation latency grows from the normal 0.5 seconds to more than 5 seconds.

[0036] Cloud platform vendors typically provide built-in monitoring consoles. The electronic device can connect to the backend of such a console and access its monitoring/operations module. By filtering the list of monitoring metrics for CPU utilization, memory utilization, and disk utilization, the electronic device obtains the cloud platform's current operating attributes. Alternatively, an open-source monitoring system can be deployed to collect this data via an agent. For the number of connections, with operating-system-level monitoring enabled, the electronic device can access the cloud server to view the number of TCP/UDP connections. These attributes represent the cloud platform's current operating status and provide the data foundation for subsequent risk monitoring.

[0037] S102, determine the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections.

[0038] In this embodiment, after the electronic device obtains the cloud disk utilization, CPU utilization, memory utilization, and number of connections, it analyzes them together. Higher values of these attributes indicate greater resource consumption and more congested data processing on the cloud platform, which can easily lead to business processing delays, slow responses, or even crashes. By comprehensively analyzing the magnitudes of these attributes, the electronic device quantifies the current operating status of the cloud platform as an operating status value.

[0039] S103, obtain the response time of each completed service within the preset time period.

[0040] In this embodiment of the application, the preset time period can be a period of the past five minutes, the past fifteen minutes, etc. After the cloud platform processes each service, it records and stores the response time of each service. Electronic devices can obtain the response time of each completed service within the preset time period by calling the relevant API interface for storing response times.
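Selecting the response times that fall inside the preset time period can be sketched as below. The record layout, a list of `(completed_at, response_seconds)` pairs, and the function name are hypothetical; the text only says that response times are recorded, stored, and retrieved through an interface, without specifying their format.

```python
from datetime import datetime, timedelta


def response_times_in_window(records, now, window=timedelta(minutes=5)):
    """Return response times of services completed within the last `window`.

    `records` is an iterable of (completed_at, response_seconds) pairs;
    this layout is an assumption made for the sketch.
    """
    start = now - window
    return [seconds for completed_at, seconds in records
            if start <= completed_at <= now]
```

Passing `window=timedelta(minutes=15)` gives the fifteen-minute variant mentioned in the text.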

[0041] S104, determine the risk value of the cloud platform based on the operating status value and the response time of each completed service.

[0042] In this embodiment, the response time of each completed service within the preset time period represents the actual operational quality of the cloud platform's service processing during that period. A shorter response time indicates faster processing, while a longer response time indicates slower processing and makes subsequent lag, unresponsiveness, or even crashes more likely. By analyzing the actual operational quality of service processing together with the current operating status of the cloud platform, the electronic device can determine more accurately how likely subsequent anomalies and crashes are, and quantifies this subsequent risk as a risk value. Compared with analyzing risk solely from the cloud platform's attributes, taking the actual response times of service processing into account gives more accurate monitoring and analysis of subsequent operating risk.

[0043] In one possible implementation of this embodiment, step S102, determining the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections, specifically includes steps S1021, S1022, S1023, and S1024 (none shown in the figure).

S1021, select, from the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the first target attributes that have reached their corresponding upper limit thresholds and the second target attributes that have not.

[0044] In this embodiment, cloud disk utilization, CPU utilization, memory utilization, and number of connections each have a corresponding upper limit threshold. An attribute reaching its upper limit threshold indicates that the cloud platform is currently experiencing high resource consumption and congestion, so subsequent risk is more likely. For example, the upper limit threshold is 95% for cloud disk utilization, 85% for CPU utilization, and 85% for memory utilization. The electronic device therefore compares each of the four attributes with its corresponding upper limit threshold to determine the first target attributes, which have reached their thresholds, and the second target attributes, which have not.
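A minimal sketch of the S1021 comparison follows. The three percentage thresholds are the example values from the text; the connection limit is an assumed placeholder because the text gives no example for it, and the dictionary keys and function name are hypothetical.

```python
# Upper limit thresholds. The percentages are the examples given in the text;
# the connection limit is an assumed placeholder value.
UPPER_LIMITS = {
    "cloud_disk": 95.0,      # percent
    "cpu": 85.0,             # percent
    "memory": 85.0,          # percent
    "connections": 10_000,   # assumption, not from the source
}


def split_by_threshold(readings, limits=UPPER_LIMITS):
    """Split readings into (first_targets, second_targets).

    First targets have reached their upper limit; second targets have not.
    """
    first = {name: v for name, v in readings.items() if v >= limits[name]}
    second = {name: v for name, v in readings.items() if v < limits[name]}
    return first, second
```

With a CPU reading of 91% and the other three attributes below their limits, only `cpu` lands in the first group.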

[0045] S1022, determine the number of first target attributes, and determine the difference between each second target attribute and its corresponding upper limit threshold.

[0046] In this embodiment of the application, the more first target attributes that reach the corresponding threshold, the greater the likelihood of subsequent risks to the cloud platform. Therefore, the electronic device counts the first target attributes that reach the corresponding threshold to obtain the total number of all first target attributes. The electronic device calculates the difference between each second target attribute that does not reach the corresponding threshold and the corresponding threshold. This difference represents the distance of each second target attribute from the corresponding upper threshold. The larger the difference, the farther away from the corresponding upper threshold, the lower the likelihood of subsequent risks, and the better the operating status of the cloud platform.
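The counting and difference step of S1022 might look like this. The function and argument names are hypothetical, and the inputs are assumed to be dictionaries keyed by attribute name.

```python
def count_and_differences(first_targets, second_targets, limits):
    """Return (number of first targets, gap of each second target to its limit).

    A larger gap means the attribute is further from its upper limit threshold.
    """
    count = len(first_targets)
    differences = {name: limits[name] - value
                   for name, value in second_targets.items()}
    return count, differences
```

The differences still carry mixed units at this point (percentage points for utilizations, a raw count for connections).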

[0047] S1023, determine the first state value based on the difference of each second target attribute and its corresponding first weight.

[0048] In this embodiment, different attributes affect the operation of the cloud platform to different degrees. For example, CPU utilization has the greatest impact on the operational quality of the cloud platform, followed by memory utilization, then the number of connections, and cloud disk utilization has the least impact. Staff therefore assign first weights to these attributes according to their degree of impact: 0.4 for CPU utilization, 0.3 for memory utilization, 0.2 for the number of connections, and 0.1 for cloud disk utilization. Once set, the first weights are stored in the electronic device. After the electronic device identifies the second target attributes, it normalizes the difference of each second target attribute to eliminate the influence of different units, and then applies the corresponding first weight to obtain the state value of each second target attribute. The electronic device sums the state values of all second target attributes to obtain the first state value, which represents the current normal operating status of the cloud platform. Because the first state value combines each attribute's weight with its distance from the corresponding upper limit threshold, the quantitative analysis of the cloud platform is more accurate.
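The weighted sum of S1023 can be illustrated as below, using the first weights given in the text. The normalization method is not specified in the source; dividing each difference by its upper limit threshold is an assumption made for this sketch, as are the function name and dictionary keys.

```python
# First weights as given in the text.
FIRST_WEIGHTS = {"cpu": 0.4, "memory": 0.3, "connections": 0.2, "cloud_disk": 0.1}


def first_state_value(differences, limits, weights=FIRST_WEIGHTS):
    """Weighted sum of the normalized gaps of the second target attributes."""
    total = 0.0
    for name, diff in differences.items():
        normalized = diff / limits[name]  # assumed normalization: gap / limit
        total += weights[name] * normalized
    return total
```

Under this normalization each term lies between 0 and its weight, so an attribute far below its limit contributes close to its full weight and one just under the limit contributes almost nothing.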

[0049] S1024, determine the operating status value of the cloud platform based on the number of first target attributes and the first state value.

[0050] In the embodiments of this application, each first target attribute, having reached its corresponding upper limit threshold, has a negative impact on the operation of the cloud platform. Therefore, the electronic device obtains the operating status value of the cloud platform by analyzing the number of first target attributes together with the first state value.

[0051] In one possible implementation of this embodiment, step S1024, determining the operating status value of the cloud platform based on the number of first target attributes and the first state value, specifically includes Step 1, Step 2, and Step 3.

Step 1: Determine the ratio of the number of first target attributes to the total number of all attributes.

[0052] In this embodiment of the application, the electronic device obtains the ratio of the number of the first target attribute by dividing the number of all attributes by the number of all attributes. For example, if the total number of attributes is four and the number of the first target attribute is one, the electronic device determines a ratio of 1 / 4. A larger ratio indicates that the proportion of the first target attribute reaching the corresponding upper limit threshold is larger among all attributes, and the greater the negative impact on the operation of the cloud platform.

[0053] Step 2: Determine the second state value of all first target attributes based on the quantity ratio and the first weight of all first target attributes.

[0054] In this embodiment of the application, each first target attribute also corresponds to a first weight. The first weights of all first target attributes are a key factor in the negative impact that attributes reaching the upper limit threshold have on the operation of the cloud platform, and the quantity ratio of the first target attributes is another. Therefore, by combining the quantity ratio with the first weights of all first target attributes, the electronic device can accurately determine the second state value, which represents the negative impact of all first target attributes on the operation of the cloud platform.

[0055] Step 3: Determine the difference between the first state value and the second state value to obtain the cloud platform's operating state value.

[0056] In the embodiments of this application, after the electronic device determines the first state value for the normal attributes and the second state value for the abnormal attributes that have reached the upper limit threshold, it obtains the current operating status value of the cloud platform by subtracting the second state value from the first state value. That is, the operating status value representing the current actual state of the cloud platform is the first state value, which represents current normal operation, minus the second state value, which represents the negative impact on the cloud platform.

[0057] In one possible implementation of this embodiment, step 2, determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes, specifically includes steps S1 (not shown in the figure) and S2 (not shown in the figure), wherein:

S1, determine the sum of the first weights of all first target attributes.

[0058] In this embodiment of the application, the electronic device first sums the first weights of all first target attributes to obtain the first weight sum. The first weight sum represents the combined negative impact of all first target attributes on the operation of the cloud platform. The larger the first weight sum, the greater the negative impact of these attributes on the cloud platform.

[0059] S2, substitute the first weight sum and the quantity ratio into the first preset relational function to obtain the second state value.

[0060] In summary, for the embodiments of this application, the first weight sum and the quantity ratio of the first target attributes are key factors in the degree of negative impact on the operation of the cloud platform. Therefore, the staff can preset a first preset relational function for calculating the second state value from the first weight sum and the quantity ratio. Calculating the second state value with the first preset relational function brings it to the same order of magnitude as the first state value. For example, the first preset relational function can be a linear equation in two variables, y = 3m + 2n, where y is the second state value, m is the first weight sum, n is the quantity ratio, 3 is the calculation coefficient of the first weight sum, and 2 is the calculation coefficient of the quantity ratio. The electronic device obtains the second state value by substituting the first weight sum and the quantity ratio into this first preset relational function.
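Steps 1 to 3, together with S1 and S2, can be sketched in a few lines. The sketch assumes the example first weights and the example function y = 3m + 2n given in this description; the attribute names and the sample inputs are hypothetical.

```python
# Example first weights from the description; attribute names are hypothetical.
FIRST_WEIGHTS = {"cpu": 0.4, "memory": 0.3, "connections": 0.2, "disk": 0.1}


def second_state_value(exceeded, total_attributes=4):
    """Apply the example function y = 3m + 2n to the first target attributes."""
    m = sum(FIRST_WEIGHTS[name] for name in exceeded)  # S1: first weight sum
    n = len(exceeded) / total_attributes               # step 1: quantity ratio
    return 3 * m + 2 * n                               # S2: example function


def operating_state_value(first_state, exceeded):
    """Step 3: first state value minus second state value."""
    return first_state - second_state_value(exceeded)


# Hypothetical case: only memory utilization reached its threshold, and the
# first state value of the remaining attributes was computed as 0.35.
print(round(second_state_value(["memory"]), 4))           # 3*0.3 + 2*0.25
print(round(operating_state_value(0.35, ["memory"]), 4))
```

With no first target attributes, m and n are both zero, so the operating status value equals the first state value.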

[0061] In one possible implementation of this embodiment, step S104, determining the risk value of the cloud platform based on the operating status value and the response time of each completed service, specifically includes steps S1041, S1042, S1043, S1044, and S1045 (none shown in the figure), wherein:

S1041, based on the response times, determine the first business volume whose response time reaches the preset time threshold, and determine the maximum response time in the first business volume.

[0062] In this embodiment, the preset time threshold can be 1.5 seconds. A response time that reaches 1.5 seconds is excessively long and could lead to subsequent lag, unresponsiveness, or crashes on the cloud platform. Therefore, from the services completed within the preset time period, the electronic device selects those whose response time reaches the preset time threshold and counts them to obtain the first business volume. A larger first business volume indicates a slower cloud platform processing speed and a greater likelihood of subsequent risks. The electronic device then determines the maximum response time among the services in the first business volume. A longer maximum response time likewise indicates a slower cloud platform and a greater likelihood of subsequent risks.

[0063] S1042, determine the first ratio of the first business volume to the total business volume within the preset time period.

[0064] In this embodiment of the application, the electronic device divides the first business volume by the total number of services completed within the preset time period to obtain the first ratio. The first ratio represents the proportion of services with excessively long response times among all services within the preset time period. The larger the first ratio, the more services are processed too slowly, the higher the resource consumption, and the greater the possibility of subsequent risks.
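Steps S1041 and S1042 can be illustrated with a short sketch. It assumes the example threshold of 1.5 seconds; the function name and the sample response times are hypothetical.

```python
THRESHOLD_S = 1.5  # example preset time threshold, in seconds


def slow_service_stats(response_times):
    """Return (first business volume, maximum response time, first ratio)."""
    slow = [t for t in response_times if t >= THRESHOLD_S]
    if not slow:
        return 0, None, 0.0  # no service reached the threshold
    return len(slow), max(slow), len(slow) / len(response_times)


# Hypothetical response times (seconds) of services completed in one period.
window = [0.4, 1.6, 2.3, 0.9, 1.5, 3.1, 0.7, 1.1]
print(slow_service_stats(window))
```

Here four of the eight services reach the threshold, so the first business volume is 4, the maximum response time is 3.1 seconds, and the first ratio is 0.5.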

[0065] S1043, Generate a histogram based on the response time of the first business volume and the preset grouping interval, and determine the target interval with the most first business volume in the histogram.

[0066] In this embodiment of the application, the preset grouping intervals are multiple intervals divided according to response time. The electronic device generates a histogram based on the response times of the services in the first business volume and the preset grouping intervals. The histogram records how these services are distributed across the response time intervals. The electronic device then determines, from the histogram, the target interval that contains the largest share of the first business volume, that is, the interval in which the response times are most concentrated.

[0067] S1044, determine the median response time based on the upper and lower limits of the target interval, and determine the time difference between the median response time and the preset time threshold.

[0068] In this embodiment, after the electronic device determines the target interval, it sums the upper and lower limits of the response time within the target interval and divides by 2 to obtain the median response time of the target interval. Using the median response time more accurately represents the overall response time level of the target interval. After determining the median response time, the electronic device subtracts a preset time threshold from the median response time to obtain the time difference. The time difference represents the gap between the general response time level of the first service volume and the preset time threshold. The larger the time difference, the more the response time of these services exceeds the preset time threshold, and thus the greater the possibility of subsequent risks to the cloud platform.
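Steps S1043 and S1044 can be sketched as below. The grouping intervals (half-second bins, closed on the left) and the sample data are hypothetical; the application only states that the intervals are preset. Response times outside the listed intervals are ignored in this sketch, and ties between intervals resolve to the one encountered first.

```python
from collections import Counter

THRESHOLD_S = 1.5                      # example preset time threshold
BIN_EDGES = [1.5, 2.0, 2.5, 3.0, 3.5]  # hypothetical preset grouping intervals


def target_interval(slow_times):
    """Return the (lower, upper) interval holding the most slow services."""
    counts = Counter()
    for t in slow_times:
        for lower, upper in zip(BIN_EDGES, BIN_EDGES[1:]):
            if lower <= t < upper:
                counts[(lower, upper)] += 1
                break
    if not counts:
        raise ValueError("no response time falls inside the grouping intervals")
    return max(counts, key=counts.get)


def time_difference(slow_times):
    """Midpoint of the target interval minus the preset time threshold."""
    lower, upper = target_interval(slow_times)
    median_response_time = (lower + upper) / 2
    return median_response_time - THRESHOLD_S


slow_times = [1.6, 2.3, 1.5, 3.1, 2.1, 2.4]
print(target_interval(slow_times), time_difference(slow_times))
```

For this sample the interval from 2.0 to 2.5 seconds holds three services, so the median response time is 2.25 seconds and the time difference is 0.75 seconds.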

[0069] S1045, determine the risk value of the cloud platform based on the operating status value, maximum response time, first ratio, and time difference.

[0070] In summary, for the embodiments of this application, the cloud platform's operating status value, maximum response time, first ratio, and time difference are all key factors affecting the likelihood of subsequent risks to the cloud platform. Therefore, electronic devices can accurately determine the risk value of the cloud platform by comprehensively analyzing the above four factors.

[0071] In one possible implementation of this embodiment, step S1045, determining the risk value of the cloud platform based on the operating status value, maximum response time, first ratio, and time difference, specifically includes steps Sa (not shown in the figure) and Sb (not shown in the figure), wherein:

Sa, determine the anomaly level value of the cloud platform based on the maximum response time, the first ratio, the time difference, and their respective second weights.

[0072] In this embodiment, the maximum response time, the first ratio, and the time difference are all key factors in the abnormal operation of the cloud platform, and each affects it to a different degree. Therefore, the staff assigns a corresponding second weight to each of the maximum response time, the first ratio, and the time difference, and stores these weights in the electronic device. The electronic device normalizes the maximum response time, the first ratio, and the time difference to obtain their respective normalized values, thereby eliminating the influence of different units. The electronic device then uses the corresponding second weights to compute the weighted sum of these normalized values, which is the anomaly level value of the cloud platform.

[0073] Sb, substitute the operating status value and the anomaly level value into the second preset relational function to obtain the risk value of the cloud platform.
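The weighted sum in step Sa might look like the following sketch. The second weights, the normalization scales, and the clamping to the range 0 to 1 are all hypothetical choices made for illustration; the application leaves these to the staff.

```python
# Hypothetical second weights and normalization scales (not from the source).
SECOND_WEIGHTS = {"max_response": 0.5, "first_ratio": 0.3, "time_diff": 0.2}
SCALES = {"max_response": 10.0, "first_ratio": 1.0, "time_diff": 5.0}


def anomaly_level(max_response, first_ratio, time_diff):
    """Weighted sum of the three factors after scaling each into [0, 1]."""
    raw = {
        "max_response": max_response,
        "first_ratio": first_ratio,
        "time_diff": time_diff,
    }
    total = 0.0
    for name, value in raw.items():
        normalized = min(max(value / SCALES[name], 0.0), 1.0)  # unitless
        total += SECOND_WEIGHTS[name] * normalized
    return total


print(round(anomaly_level(3.1, 0.5, 0.75), 4))
```

Dividing by a fixed scale is only one way to remove units; min-max scaling over historical data would serve the same purpose.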

[0074] In this embodiment, the anomaly level value reflects the impact of the cloud platform's actual operation on business processing response speed. Therefore, both the operating status value and the anomaly level value are key factors in the likelihood of future risks to the cloud platform. The staff can set a second preset relational function to calculate the risk value from the operating status value and the anomaly level value. This second preset relational function can also be a linear equation in two variables. Specifically, the second preset relational function could be c = 0.6a + 3b, where c is the risk value, a is the operating status value, b is the anomaly level value, 0.6 is the calculation parameter for the operating status value, and 3 is the calculation parameter for the anomaly level value. The electronic device substitutes the determined operating status value and anomaly level value into this second preset relational function to obtain the risk value of the cloud platform. Because the risk value combines the operating status value derived from the cloud platform's attributes with the anomaly level value derived from business processing response speed, it characterizes the likelihood of future risks to the cloud platform more accurately.
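As a worked illustration of step Sb, the example function c = 0.6a + 3b can be evaluated directly. The coefficients are the example values from this paragraph; the inputs are hypothetical.

```python
def risk_value(operating_status, anomaly_level):
    """Example second preset relational function: c = 0.6a + 3b."""
    return 0.6 * operating_status + 3 * anomaly_level


# Hypothetical inputs: a = -1.05 (operating status), b = 0.335 (anomaly level).
print(round(risk_value(-1.05, 0.335), 4))
```

The coefficients and signs follow the example exactly as stated in the description; a deployment would choose its own function.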

[0075] In one possible implementation of this embodiment, step S105 (not shown in the figure) is included after step S104, wherein:

S105, if the risk value reaches the preset risk threshold, output a prompt message.

[0076] In this embodiment, a preset risk threshold serves as the dividing line for excessively high risk values. The electronic device compares the determined risk value with the preset risk threshold. If the preset risk threshold is reached, it indicates that the cloud platform is highly likely to experience subsequent risks, requiring timely handling and preparation. The electronic device can output a prompt message to the staff's terminal devices, such as mobile phones and computers, along with the text message "Cloud platform risk value is too high, please handle it promptly." Alternatively, it can send the text message "Cloud platform risk value is too high, please handle it promptly" to the administrator's upper-level management interface. By outputting prompt messages, relevant personnel can be promptly informed of the cloud platform's risk level, facilitating subsequent handling.
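Step S105 reduces to a threshold comparison followed by a notification. In the sketch below the threshold value and the `notify` callback are hypothetical; the message text is the one quoted in this paragraph.

```python
RISK_THRESHOLD = 1.0  # hypothetical preset risk threshold
PROMPT = "Cloud platform risk value is too high, please handle it promptly."


def check_risk(risk, notify=print):
    """Send the prompt message when the risk value reaches the threshold."""
    if risk >= RISK_THRESHOLD:
        notify(PROMPT)  # e.g. push to a staff terminal or management interface
        return True
    return False


sent = []
check_risk(1.2, notify=sent.append)  # reaches the threshold, message recorded
check_risk(0.4, notify=sent.append)  # below the threshold, nothing sent
print(sent)
```

Passing the delivery channel as a callback keeps the comparison independent of how the message reaches the staff.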

[0077] The above embodiments describe a cloud platform risk monitoring method from the perspective of process flow. The following embodiments describe a cloud platform risk monitoring system 20 from the perspective of virtual modules or virtual units. For details, please refer to the following embodiments.

[0078] This application provides a cloud platform risk monitoring system 20. As shown in Figure 2, the cloud platform risk monitoring system 20 may specifically include: an attribute acquisition module 201, used to acquire multiple attributes of the cloud platform, including cloud disk utilization, cloud server CPU utilization, memory utilization, and number of connections; a status determination module 202, used to determine the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections; a time acquisition module 203, used to acquire the response time of each completed service within a preset time period; and a risk determination module 204, used to determine the risk value of the cloud platform based on the operating status value and the response time of each completed service.

[0079] This application discloses a cloud platform risk monitoring system 20. The attribute acquisition module 201 acquires multiple attributes of the cloud platform, such as cloud disk utilization and cloud server CPU utilization, to capture the specific parameters and operating status of the cloud platform and thus support subsequent risk analysis and judgment. These attributes also allow the operating status of the cloud platform to be quantified: the status determination module 202 determines the current operating status value of the cloud platform based on them. The time acquisition module 203 acquires the response time of each service completed over a past period. Response time represents the speed and quality with which the cloud platform processes each service; slower speed and lower quality indicate a higher probability of subsequent anomalies and crashes. The risk determination module 204 analyzes the business response times together with the current operating status value of the cloud platform, enabling a more accurate determination of the likelihood of subsequent anomalies and crashes. Compared with analyzing only the magnitude of the various cloud platform attributes, also taking into account the actual response times observed when the cloud platform processes services provides a more accurate assessment of the cloud platform's subsequent operational risks.
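The division of labor among modules 201 to 204 can be pictured as four pluggable functions wired into one pass. The class name, field names, and stub functions below are hypothetical and stand in for real data sources and for the calculations described in the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass
class RiskMonitoringSystem:
    acquire_attributes: Callable[[], Mapping[str, float]]      # module 201
    determine_status: Callable[[Mapping[str, float]], float]   # module 202
    acquire_response_times: Callable[[], Sequence[float]]      # module 203
    determine_risk: Callable[[float, Sequence[float]], float]  # module 204

    def run_once(self) -> float:
        """Run one monitoring pass and return the risk value."""
        attributes = self.acquire_attributes()
        status = self.determine_status(attributes)
        times = self.acquire_response_times()
        return self.determine_risk(status, times)


# Stub modules, for illustration only.
system = RiskMonitoringSystem(
    acquire_attributes=lambda: {"cpu": 0.5},
    determine_status=lambda attrs: 1.0,
    acquire_response_times=lambda: [1.0, 2.0],
    determine_risk=lambda status, times: status + max(times),
)
print(system.run_once())
```

Keeping each module behind a plain callable makes it straightforward to replace a stub with a real collector or scoring function.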

[0080] In one possible implementation of this embodiment, when determining the operating status value of the cloud platform based on cloud disk utilization, CPU utilization, memory utilization, and number of connections, the status determination module 202 is specifically used for: selecting, from the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the first target attributes that have reached the corresponding upper limit threshold and the second target attributes that have not reached the corresponding upper limit threshold; determining the number of first target attributes, and determining the difference between each second target attribute and its corresponding upper limit threshold; determining the first state value based on the difference of each second target attribute and its corresponding first weight; and determining the operating status value of the cloud platform based on the number of first target attributes and the first state value.

[0081] In one possible implementation of this embodiment, when determining the operating status value of the cloud platform based on the number of first target attributes and the first state value, the status determination module 202 is specifically used for: determining the ratio of the number of first target attributes to the total number of all attributes; determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes; and determining the difference between the first state value and the second state value to obtain the operating status value of the cloud platform.

[0082] In one possible implementation of this embodiment, when determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes, the status determination module 202 is specifically used for: determining the sum of the first weights of all first target attributes; and substituting the first weight sum and the quantity ratio into the first preset relational function to obtain the second state value.

[0083] In one possible implementation of this embodiment, when determining the risk value of the cloud platform based on the operating status value and the response time of each completed service, the risk determination module 204 is specifically used for: determining, based on the response times, the first business volume whose response time reaches the preset time threshold, and determining the maximum response time in the first business volume; determining the first ratio of the first business volume to the total business volume within the preset time period; generating a histogram based on the response times of the first business volume and the preset grouping intervals, and determining the target interval with the most first business volume in the histogram; determining the median response time based on the upper and lower limits of the target interval, and determining the time difference between the median response time and the preset time threshold; and determining the risk value of the cloud platform based on the operating status value, maximum response time, first ratio, and time difference.

[0084] In one possible implementation of this embodiment, when determining the risk value of the cloud platform based on the operating status value, maximum response time, first ratio, and time difference, the risk determination module 204 is specifically used for: determining the anomaly level value of the cloud platform based on the maximum response time, the first ratio, the time difference, and their respective second weights; and substituting the operating status value and the anomaly level value into the second preset relational function to obtain the risk value of the cloud platform.

[0085] In one possible implementation of this embodiment, the cloud platform risk monitoring system 20 further includes an output module, used to output a prompt message when the risk value reaches a preset risk threshold.

[0086] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working process of the cloud platform risk monitoring system 20 described above corresponds to the process in the aforementioned method embodiments and is not repeated here.

[0087] This application provides an electronic device. As shown in Figure 3, the electronic device 30 includes a processor 301 and a memory 303. The processor 301 and the memory 303 are connected, for example, via a bus 302. Optionally, the electronic device 30 may also include a transceiver 304. It should be noted that in practical applications, the transceiver 304 is not limited to one type, and the structure of this electronic device 30 does not constitute a limitation on the embodiments of this application.

[0088] Processor 301 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or another programmable logic device, transistor logic device, hardware component, or any combination thereof. It can implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. Processor 301 may also be a combination that implements computational functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

[0089] Bus 302 may include a pathway for transmitting information between the aforementioned components. Bus 302 may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, etc. Bus 302 can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown in Figure 3 as a single thick line, but this does not mean that there is only one bus or one type of bus.

[0090] The memory 303 may be a ROM (Read Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.

[0091] The memory 303 is used to store application code that executes the solution of this application, and its execution is controlled by the processor 301. The processor 301 is used to execute the application code stored in the memory 303 to implement the content shown in the foregoing method embodiments.

[0092] Electronic devices include, but are not limited to: mobile terminals such as mobile phones, laptops, digital radio receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (such as in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. Servers can also be included. The electronic device shown in Figure 3 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0093] This application provides a computer-readable storage medium storing a computer program that, when run on a computer, enables the computer to execute the corresponding content in the aforementioned method embodiments. Compared with related technologies, this application obtains multiple attributes of the cloud platform, such as cloud disk utilization and cloud server CPU utilization, which capture the specific parameters and operating status of the cloud platform and thereby support subsequent risk analysis and judgment. These attributes allow the operating status of the cloud platform to be quantified, so that its current operating status value can be determined accurately. The response time of each service completed over a past period is also obtained. Response time represents the speed and quality with which the cloud platform processes each service; slower speed and lower quality indicate a higher probability of subsequent anomalies and crashes. By analyzing the business response times together with the current operating status value of the cloud platform, the risk of subsequent anomalies and crashes can be assessed more accurately. Compared with analyzing only the magnitude of the various cloud platform attributes, also taking into account the actual response times observed during business processing provides a more accurate way to monitor and judge the subsequent operational risks of the cloud platform.

[0094] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.

[0095] The above description is only a partial embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.

Claims

1. A cloud platform risk monitoring method, characterized in that it comprises: acquiring multiple attributes of the cloud platform, including cloud disk utilization, cloud server CPU utilization, memory utilization, and number of connections; determining the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections; acquiring the response time of each completed service within a preset time period; and determining the risk value of the cloud platform based on the operating status value and the response time of each completed service.

2. The cloud platform risk monitoring method according to claim 1, characterized in that determining the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections comprises: selecting, from the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the first target attributes that reach the corresponding upper limit threshold and the second target attributes that do not reach the corresponding upper limit threshold; determining the number of first target attributes, and determining the difference between each second target attribute and its corresponding upper limit threshold; determining the first state value based on the difference of each second target attribute and its corresponding first weight; and determining the operating status value of the cloud platform based on the number of first target attributes and the first state value.

3. The cloud platform risk monitoring method according to claim 2, characterized in that determining the operating status value of the cloud platform based on the number of first target attributes and the first state value comprises: determining the ratio of the number of first target attributes to the total number of all attributes; determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes; and determining the difference between the first state value and the second state value to obtain the operating status value of the cloud platform.

4. The cloud platform risk monitoring method according to claim 3, characterized in that determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes comprises: determining the sum of the first weights of all first target attributes; and substituting the first weight sum and the quantity ratio into the first preset relational function to obtain the second state value.

5. The cloud platform risk monitoring method according to claim 1, characterized in that determining the risk value of the cloud platform based on the operating status value and the response time of each completed service comprises: determining, based on the response times, a first business volume whose response time reaches a preset time threshold, and determining the maximum response time in the first business volume; determining a first ratio of the first business volume to the total business volume within the preset time period; generating a histogram based on the response times of the first business volume and the preset grouping intervals, and determining the target interval with the most first business volume in the histogram; determining the median response time based on the upper and lower limits of the target interval, and determining the time difference between the median response time and the preset time threshold; and determining the risk value of the cloud platform based on the operating status value, maximum response time, first ratio, and time difference.

6. The cloud platform risk monitoring method according to claim 5, characterized in that determining the risk value of the cloud platform based on the operating status value, maximum response time, first ratio, and time difference comprises: determining the anomaly level value of the cloud platform based on the maximum response time, the first ratio, the time difference, and their respective second weights; and substituting the operating status value and the anomaly level value into the second preset relational function to obtain the risk value of the cloud platform.

7. The cloud platform risk monitoring method according to claim 1, characterized in that the method further comprises: outputting a prompt message if the risk value reaches the preset risk threshold.

8. A cloud platform risk monitoring system, characterized in that it comprises: an attribute acquisition module, used to acquire multiple attributes of the cloud platform, including cloud disk utilization, cloud server CPU utilization, memory utilization, and number of connections; a status determination module, used to determine the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections; a time acquisition module, used to acquire the response time of each completed service within a preset time period; and a risk assessment module, used to determine the risk value of the cloud platform based on the operating status value and the response time of each completed service.

9. The cloud platform risk monitoring system according to claim 8, characterized in that, when determining the operating status value of the cloud platform based on the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the status determination module is specifically used for: selecting, from the cloud disk utilization, CPU utilization, memory utilization, and number of connections, the first target attributes that reach the corresponding upper limit threshold and the second target attributes that do not reach the corresponding upper limit threshold; determining the number of first target attributes, and determining the difference between each second target attribute and its corresponding upper limit threshold; determining the first state value based on the difference of each second target attribute and its corresponding first weight; and determining the operating status value of the cloud platform based on the number of first target attributes and the first state value.

10. The cloud platform risk monitoring system according to claim 9, characterized in that, when determining the operating status value of the cloud platform based on the number of first target attributes and the first state value, the status determination module is specifically used for: determining the ratio of the number of first target attributes to the total number of all attributes; determining the second state value of all first target attributes based on the quantity ratio and the first weights of all first target attributes; and determining the difference between the first state value and the second state value to obtain the operating status value of the cloud platform.