Intelligent operation and maintenance-based software system performance bottleneck analysis and detection method

By employing intelligent operation and maintenance methods combined with data preprocessing and model fusion, the method removes the reliance of existing performance bottleneck detection on operator experience. This enables efficient and robust performance bottleneck analysis that is applicable to multi-dimensional data.

CN116225935B · Active · Publication Date: 2026-06-26 · DALIAN UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
DALIAN UNIV OF TECH
Filing Date
2023-03-01
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies rely on the experience of operations and maintenance personnel in software system performance bottleneck analysis, resulting in low accuracy of detection results and inapplicability to multi-dimensional data, making it impossible to effectively identify performance bottlenecks under different pressures and environments.

Method used

By adopting an intelligent operation and maintenance approach, multi-dimensional performance data is obtained through system stress testing and then preprocessed and normalized. Bottleneck time points are labeled by binary clustering, and the performance data is reconstructed and compared using a fusion of CNN, GRU, and VAE models. A new loss function calculation method is designed to detect performance bottlenecks.

Benefits of technology

It achieves efficient and robust detection of performance bottlenecks, is applicable to performance analysis under different pressures and environments, and improves detection accuracy and applicability.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application belongs to the field of software system performance bottleneck analysis and detection, and relates to a software system performance bottleneck detection method based on intelligent operation and maintenance. The performance bottleneck detection method is as follows: first, performance data is acquired; second, performance data items are extracted and the data is preprocessed; third, binary (two-group) bottleneck clustering is performed on the test data to label the bottleneck time points; fourth, the model is trained on the training data to obtain the model architecture parameters and complete the representation mapping of the input data; fifth, the test data is processed by the model to complete its representation mapping against the training data; and sixth, the performance bottleneck detection result for the test data is obtained. The present application places low requirements on the performance indicator data itself, has a wider scope of application, can effectively help complete system performance bottleneck analysis and interpretation by means of artificial intelligence, and has good applicability and robustness.

Description

Technical Field

[0001] This invention belongs to the field of software system performance bottleneck analysis and detection, and relates to a method for software system performance bottleneck analysis and detection based on intelligent operation and maintenance.

Background Technology

[0002] Modern software systems and applications, whether online shopping, social media platforms, or industrial software, involve large volumes of real-time data interaction and are characterized by high timeliness, high visibility, and wide applicability. Their business operations involve multiple tasks, scenarios, and roles within a complete workflow. System performance, and the performance analysis and optimization performed on the system, are directly related to changes in the system environment and to bottlenecks caused by the parameters and configurations of various system indicators [1]. During operation, the system generates a large amount of indicator data reflecting its real-time status.

[0003] To support increased business volume, maintain larger scale, and meet the demands of big data processing, it is essential to ensure system performance during operation. Performance issues must also be considered before releasing new products or versions. Overloaded performance bottlenecks can lead to system delays or downtime, causing significant economic losses and prolonged delays in various time-sensitive services. Pre-deployment risk analysis and monitoring of potential performance indicators during operation determine the system's service quality and application limits, highlighting the importance of system performance bottleneck detection. Methods for analyzing and detecting system performance bottlenecks have received increasing attention and research.

[0004] System performance bottleneck analysis and detection falls under the field of operations and maintenance (O&M). The field and its practical application in enterprises have evolved through stages such as manual O&M and script-based O&M, using threshold-based or mathematical methods and simulation [2]. There is currently a gradual transition to automated O&M. However, much of today's automated O&M, particularly for performance bottleneck analysis and detection, is automated only in its methods and form. The general approach first establishes approximate threshold ranges for certain prominent or directly significant indicators based on the experience of testers and system operators. Bottlenecks are then identified by checking whether these thresholds are exceeded, or by accumulating scores according to the range into which each indicator falls, which is a traditional categorization method. Threshold- or range-based judgments are primarily geared toward monitoring during operation. For pre-deployment or test-based bottleneck detection, this method only categorizes and judges; it does not analyze system bottlenecks and environmental conditions. Furthermore, threshold determination is highly environment-specific and relies heavily on the extensive domain experience of testers. Some more advanced algorithms use prediction, forecasting future performance patterns from the historical data of specific indicators. However, this ignores the relationships and changes between indicators and relies excessively on past states, which introduces significant uncertainty in judging the actual system state and results in low accuracy. Moreover, for multi-dimensional data, such predictions do not apply uniformly to every dimension.

[0005] The thresholding method does have a theoretical and practical basis, because performance bottlenecks are often accompanied by anomalies in key performance indicators, so the patterns of multivariate time series anomaly detection [3] can serve as a reference. In practice, however, a system often reaches a bottleneck without its indicator data showing any abnormality. System performance bottlenecks are determined by the system's own environment; they are better understood as the result of comparisons under different pressures or environments, reflecting the system's carrying capacity [2][4]. Therefore, building on the overall anomaly detection process, this invention incorporates comparisons between different pressures and environments to perform performance bottleneck analysis and detection.

[0006] Using artificial intelligence to solve big-data O&M problems has also received increasing attention and research. Intelligent operations and maintenance (AIOps) combines machine learning with big data, learning partitioning rules from the data to replace human-defined rules such as thresholds or other features [5].

[0007] Therefore, the problem this invention addresses is how to establish a relatively structured operational model and process, and how to use intelligent operation and maintenance (AIOps) related methods to analyze and locate software system performance bottlenecks, thereby changing the current situation where performance bottlenecks are solved entirely by relying on the individual abilities, experience, and analysis of engineers and requiring complex manual operations.

[0008] [1] Shen Hua, He Yanxiang, Zhang Mingwu. Performance bottleneck localization scheme based on structural features of service composition model [J]. Computer Science, 2015, 42(09):107-117. DOI:10.11896/j.issn1002-137X.2015.9.022.

[0009] [2] Zhang Feipeng, Chen Lin, Zhang Jingjing. Measurement and performance bottleneck analysis method for large-scale complex networks [J]. Computer Science and Exploration, 2017, 11(02):262-270. DOI:10.3778/j.issn.1673-9418.1512056.

[0010] [3] Zhang Renbin, Zuo Yicong, Zhou Zelin, et al. Anomaly detection of multivariate time series data based on multimodal generative adversarial network [J/OL]. Computer Science: 1-12 [2023-02-25]. DOI:10.11896/jsjkx.220400221.

[0011] [4] Yu Qingyang, Bai Xiaoying, Li Mingjie, et al. Performance modeling and anomaly localization of large-scale microservice systems based on call chain control flow analysis [J]. Journal of Software, 2022, 33(05):1849-1864.

[0012] DOI:10.13328/j.cnki.jos.006209.

[0013] [5] Wu Zhenyu, Shi Chang. Agile AIOps framework and operation and maintenance data quality assessment method [J]. Journal of Beijing University of Posts and Telecommunications, 2021, 44(06):96-102+133. DOI:10.13190/j.jbupt.2021-045.

Summary of the Invention

[0014] This invention addresses the problem of performance bottleneck analysis and detection in software systems by providing a method based on intelligent operation and maintenance (O&M). This method overcomes the current reliance on the experience of O&M personnel for bottleneck handling in big data system platforms. It can be used to analyze bottlenecks by comparing performance data under different pressures and environments during stress testing in performance detection. It can also directly process and analyze monitored data, demonstrating both applicability and robustness.

[0015] To achieve the above objectives, the technical solution of the present invention is as follows:

[0016] Software System Performance Bottleneck Analysis and Detection Method Based on Intelligent Operation and Maintenance

[0017] The first step is to acquire performance data.

[0018] System stress tests are conducted under varying pressures and environments using system stress testing tools. Different pressure values are set, roughly divided into three ranges: low, medium, and high pressure. In industry practice, the stress test duration is typically kept between 30 minutes and 1 hour. The system environments can include the operating environment, the testing environment, or a specific environment to be tested. Multiple sets of multi-dimensional performance data are obtained during the stress tests. Alternatively, a connection can be established to a server, or performance data monitored during system operation can be used directly.

[0019] The second step is to extract performance data items and complete data preprocessing.

[0020] 2.1 Data filtering and extraction: For performance metrics data obtained from stress testing or system monitoring, extract the metric dimensions and items. The extracted metric dimensions cover performance metrics data that characterize the status of the various system modules, including CPU, MEM (memory), OS kernel, disk, VM (virtual machine), and NET (network packets).

[0021] 2.2 Perform data normalization. Normalize the extracted item data to minimize the incomparability caused by differing units across data dimensions.

[0022] The third step is to perform bottleneck clustering on the test data to obtain bottleneck time point labels.

[0023] 3.1 Perform binary bottleneck (anomaly) clustering

[0024] 3.2 Perform truth labeling. Based on the bottleneck clustering results, use cluster result 1 as a high bottleneck and 0 as a low bottleneck to mark the high and low bottleneck (anomaly) time points. Label the performance data under different environments with dimensional truth labels to complete the comparison of the relationship between bottleneck points and anomalies.

[0025] To overcome the challenges of complex and difficult-to-obtain truth labels for multi-dimensional data representing performance bottlenecks, as well as difficulties in model validation, this invention proposes a binary clustering method for partitioning high and low bottleneck truth labels.

[0026] The binary clustering method for separating high and low bottleneck ground-truth labels is as follows:

[0027] K-means clustering is used to perform binary bottleneck clustering on the multi-dimensional performance time-series data obtained after normalization, dividing the results into high and low bottleneck time points. The distance calculation formula used in the clustering is given below, with the number of clusters specified as $c = 2$, where $x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$ represents the performance data of a specific metric across $m$ time points and $u_j = \{u_{j1}, u_{j2}, \ldots, u_{jm}\}$ is the mean vector of the cluster center, which is continuously updated during the calculation. The distance $dis_{kms}$ is refined to specific dimensions and specific time points.

[0028]

[0029] The fourth step is to train the model on the training data to obtain the model architecture parameters and complete the representation mapping of the input data.

[0030] This invention proposes a reconstruction-comparison method to address the problem of comparing multiple sets of multidimensional performance time-series data under different pressures and environments, so that data from different patterns can be characterized. Building on anomaly detection, it incorporates the comparison of data across environments, completes the mapping from bottleneck points to anomaly points, and uses the model architecture shown in Figure 2, which fuses a one-dimensional Convolutional Neural Network (CNN), a Gated Recurrent Unit (GRU) model, and a Variational Auto-Encoder (VAE) model. Data features within a sliding window are extracted by stacked one-dimensional CNN and GRU layers and memorized over time. Two fully connected (Dense) layers are added to represent the mean and variance, respectively, and the distribution defined by the mean and variance is mapped to latent variables in the hidden layer. Feature extraction is then completed through stacked fully connected Dense layers, with a dropout layer added to prevent overfitting.
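The sliding-window input mentioned above can be sketched as follows. This is a minimal NumPy sketch, assuming the normalized performance data is a (time points × metrics) array; the function name `sliding_windows` and the window length and stride values are hypothetical, since the text does not specify them.

```python
import numpy as np


def sliding_windows(data: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Cut a (time, metrics) array into overlapping windows.

    Returns an array of shape (num_windows, window, metrics). The window
    length and stride are hypothetical hyperparameters, not patent values.
    """
    if data.ndim != 2:
        raise ValueError("expected a (time, metrics) array")
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    n_time, n_metrics = data.shape
    if n_time < window:
        # Too few time points for even one window.
        return np.empty((0, window, n_metrics), dtype=data.dtype)
    starts = range(0, n_time - window + 1, stride)
    return np.stack([data[s:s + window] for s in starts])


# Example: 10 time points of 3 metrics, windows of 4 steps with stride 2.
series = np.arange(30, dtype=float).reshape(10, 3)
windows = sliding_windows(series, window=4, stride=2)
print(windows.shape)  # (4, 4, 3)
```

Each window would then be one input sample for the stacked CNN and GRU layers; a stride of 1 gives the densest coverage at the cost of more samples.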

[0031] Performance data obtained under low-stress conditions during stress testing, or monitored in the production operating environment, is treated as normal-state data and used as training data $x$. This data is input into the network for the first stage of training, which reconstructs the training data. That is, corresponding to $x$, the architectural parameters of each model layer (CNN, etc.) and the intermediate latent variable $z$ of the VAE are obtained simultaneously. The parameters represent the mapping from the input data to itself.

[0032] To address the significant differences in performance data patterns and distributions under varying pressures and environments, this invention uses a novel method to calculate the loss of the VAE model, replacing the KL divergence component of the original formula. The specific calculation formula is given below. In it, $D_{Was}$ uses the Wasserstein method, in place of the KL method, to calculate the distance between distributions and to better characterize the difference between them; the remaining term represents the reconstruction error and completes the calculation of each indicator's effect on its dimension and the processing of outlier scores. Here $p(x|z)$ is the decoder distribution, assumed to be normal; $q(z|x)$ is the conditional (approximate posterior) distribution; $p(z)$ is the known prior distribution; and $z$ is generated by passing the input through the CNN, GRU, Dense, and other layers.

[0033]
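A minimal NumPy sketch of a VAE loss in which a Wasserstein term replaces the KL term is shown below. The exact formula of [0033] is not reproduced in this text, so the sketch makes its own assumptions: a diagonal Gaussian $q(z|x)$, a standard normal prior $p(z) = N(0, I)$, a Gaussian decoder $p(x|z)$, and the squared 2-Wasserstein distance, which has a closed form for diagonal Gaussians. All function names and the weight `beta` are hypothetical.

```python
import numpy as np


def w2_squared_to_standard_normal(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Squared 2-Wasserstein distance from N(mu, diag(exp(log_var))) to N(0, I).

    For diagonal Gaussians the closed form is ||mu||^2 + ||sigma - 1||^2,
    summed over the latent dimensions (last axis).
    """
    sigma = np.exp(0.5 * log_var)
    return np.sum(mu ** 2, axis=-1) + np.sum((sigma - 1.0) ** 2, axis=-1)


def gaussian_nll(x: np.ndarray, x_mean: np.ndarray, x_log_var: np.ndarray) -> np.ndarray:
    """Negative log-likelihood of x under N(x_mean, diag(exp(x_log_var))), summed over features."""
    return 0.5 * np.sum(
        np.log(2.0 * np.pi) + x_log_var + (x - x_mean) ** 2 / np.exp(x_log_var),
        axis=-1,
    )


def wasserstein_vae_loss(x, x_mean, x_log_var, z_mu, z_log_var, beta: float = 1.0) -> float:
    """Batch-mean loss: reconstruction NLL + beta * W2^2(q(z|x), p(z)).

    The W2 term takes the place of the usual KL(q(z|x) || p(z)) term;
    beta is a hypothetical weighting factor.
    """
    recon = gaussian_nll(x, x_mean, x_log_var)
    reg = w2_squared_to_standard_normal(z_mu, z_log_var)
    return float(np.mean(recon + beta * reg))
```

Unlike the KL divergence, the Wasserstein distance stays finite and smooth even when two distributions barely overlap, which is one plausible motivation for the substitution when data patterns differ strongly across pressures and environments.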

[0034] The fifth step is to process the test data and complete the mapping between the test data and the training data representation.

[0035] The performance data obtained under the same pressure but different environments, or under the same environment but different pressures, are used as test data to characterize the performance data of the actual system to be tested and analyzed. After processing the training data to obtain the model parameters, the test data is input into the network to complete the second stage of test data reconstruction.

[0036] Step 6: Obtain the performance bottleneck detection results for the corresponding test data.

[0037] After the test data has been processed by the trained model (i.e., after the second stage of test data reconstruction), the performance bottleneck detection results for the test data are obtained, characterizing the system state in which the test data was collected. These results are then compared with those of other models or processing methods.

[0038] The beneficial effects of this invention are:

[0039] This invention takes the inherent abnormal characteristics of performance data into account and combines them with a comparison of system bottleneck states to complete the entire process of performance bottleneck analysis and detection. It places low requirements on performance indicator data, applies to a wider range of problems, and can effectively use artificial intelligence methods to assist in the analysis, detection, and interpretation of software system performance bottlenecks, exhibiting good applicability and robustness.

Attached Figure Description

[0040] Figure 1: Flowchart of the software system performance bottleneck analysis and detection process based on intelligent operation and maintenance;

[0041] Figure 2: Schematic diagram of the neural network model architecture for performance bottleneck detection;

[0042] Figure 3: Schematic diagram of the binary bottleneck clustering result for random data;

[0043] Figure 4: Schematic diagram of the binary bottleneck clustering result for periodic data;

[0044] Figure 5: Schematic diagram of the bottleneck point partitioning from binary bottleneck clustering, with the indicator dimensions projected into two-dimensional space;

[0045] Figure 6: Schematic diagram of the anomaly scores of the indicator dimensions over all time points;

[0046] Figure 7: Ranking of the negative scores and anomaly levels of the bottleneck indicator dimensions.

Detailed Implementation

[0047] To make the technical solution of the present invention clearer, the present invention will be further described below with reference to the accompanying drawings. The present invention is implemented in specific steps:

[0048] A software system performance bottleneck detection and analysis method based on intelligent operation and maintenance, whose overall process is shown in Figure 1, comprises the following steps:

[0049] The first step is to acquire performance data.

[0050] System stress tests are conducted under varying pressures and environments using system stress testing tools. Different pressure values are set, roughly divided into three ranges: low, medium, and high pressure. In industry practice, the stress test duration is typically kept between 30 minutes and 1 hour. The system environments can include the operating environment, the testing environment, or a specific environment to be tested. Multiple sets of multi-dimensional performance data are obtained during the stress tests. Alternatively, a connection can be established to a server, or performance data monitored during system operation can be used directly.

[0051] The second step is to preprocess the data and extract the features of the indicator items.

[0052] (1) Perform data filtering and extraction

[0053] For performance metrics data obtained from stress testing or system monitoring, extract the metric dimensions and items. The extracted metric dimensions cover performance metrics data that can characterize the status of the system's various modules, such as CPU, MEM (memory), OS kernel, disk, VM (virtual machine), and NET (network packets). Some of the extracted performance data items are shown in Table 1.

[0054] Table 1. Extracted Performance Indicator Items (Partial)

[0055]

[0056] (2) Perform data normalization

[0057] Different indicators represent different states and use inconsistent units, which makes data with different units difficult to compare. Therefore, the extracted item data undergoes max-min normalization, performed separately within each indicator dimension, to minimize the incomparability caused by unit differences across dimensions. Here $x_i$ is the data of the $i$-th indicator dimension, the formula yields the normalized data for that dimension, $x_{i\max}$ is the maximum value in the sample data for this dimension, and $x_{i\min}$ is the minimum value in the sample data for this indicator dimension.

[0058]
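The max-min normalization described in [0057] can be sketched in NumPy as follows, assuming the data is laid out as a (time points × indicator dimensions) array. The function name and the handling of constant columns (mapped to 0 to avoid division by zero) are assumptions, not details from the text.

```python
import numpy as np


def min_max_normalize(data: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Max-min normalize each indicator column of a (time, metrics) array to [0, 1].

    Implements (x_i - x_imin) / (x_imax - x_imin) column by column. Constant
    columns would divide by zero, so they are mapped to 0 (an assumption).
    """
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    span = col_max - col_min
    safe_span = np.where(span > eps, span, 1.0)
    return (data - col_min) / safe_span


# Example: CPU utilization in percent and memory in MB, on very different scales.
raw = np.array([[10.0, 2048.0],
                [30.0, 4096.0],
                [50.0, 3072.0]])
print(min_max_normalize(raw))
```

After normalization both columns lie in [0, 1], so a change in memory usage no longer dominates a change in CPU utilization simply because of its unit.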

[0059] The third step is to perform binary bottleneck clustering on the test data to obtain the bottleneck time point labels.

[0060] (1) Perform binary bottleneck (abnormal) clustering.

[0061] To overcome the challenges of complex, hard-to-obtain truth labels for multi-dimensional performance data and the resulting difficulty of model validation, this invention proposes a binary clustering method for separating high and low bottleneck truth labels. Specifically, K-means clustering performs binary bottleneck clustering on the multi-dimensional performance time-series data obtained after normalization, and the resulting division into high and low bottleneck time points is used as the truth labels. The number of clusters is specified as $c = 2$, where $x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$ represents the performance data of a specific metric across $m$ time points and $u_j = \{u_{j1}, u_{j2}, \ldots, u_{jm}\}$ is the mean vector of the cluster center, which is continuously updated during the calculation. The distance $dis_{kms}$ is refined to specific time points in specific dimensions.

[0062]
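The binary ($c = 2$) bottleneck clustering can be sketched as a per-indicator K-means over time points. This is one possible reading, not the patented formula: it assumes each normalized indicator column is clustered on its own, uses the squared difference $(x_{it} - u_{jt})^2$ as the distance in place of the unreproduced $dis_{kms}$ formula, and relabels clusters so that 1 always denotes the cluster with the higher center (the high bottleneck). The function names are hypothetical.

```python
import numpy as np


def two_means_1d(values: np.ndarray, n_iter: int = 100) -> np.ndarray:
    """Split one indicator's time series into low (0) and high (1) clusters, c = 2.

    Distance is the squared difference between a time point's value and a
    cluster center (an assumption standing in for dis_kms).
    """
    values = np.asarray(values, dtype=float)
    centers = np.array([values.min(), values.max()])  # deterministic initialization
    labels = np.zeros(values.shape[0], dtype=int)
    for _ in range(n_iter):
        dist = (values[:, None] - centers[None, :]) ** 2  # shape (m, 2)
        new_labels = dist.argmin(axis=1)
        for j in range(2):
            if np.any(new_labels == j):
                centers[j] = values[new_labels == j].mean()  # update mean vector u_j
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    # Label 1 must be the cluster with the higher center ("high bottleneck").
    if centers[0] > centers[1]:
        labels = 1 - labels
    return labels


def bottleneck_labels(data: np.ndarray) -> np.ndarray:
    """Apply two_means_1d to every column of a normalized (time, metrics) array."""
    return np.column_stack([two_means_1d(data[:, i]) for i in range(data.shape[1])])


# Hypothetical normalized CPU and memory series over 6 time points.
cpu = np.array([0.1, 0.12, 0.11, 0.9, 0.95, 0.1])
mem = np.array([0.9, 0.85, 0.88, 0.1, 0.05, 0.9])
labels = bottleneck_labels(np.column_stack([cpu, mem]))
print(labels[:, 0])  # [0 0 0 1 1 0]
```

The resulting (time × metric) label matrix plays the role of the dimensional truth labels in [0065]; the deterministic min/max initialization keeps the example reproducible, whereas library K-means implementations typically use randomized initialization.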

[0063] Figure 3 shows the bottleneck clustering result for random data: the high-bottleneck cluster effectively covers the time points where values change abruptly and where the values themselves are relatively large. Figure 4 shows the result for periodic data: the high-bottleneck cluster effectively covers the periodic time points and the time points with relatively large periodic values. After binary bottleneck clustering, the indicator dimensions are projected onto a two-dimensional space; Figure 5 shows the resulting bottleneck point partitioning in that space.

[0064] (2) Mark the truth value.

[0065] Based on the bottleneck clustering results, cluster result 1 is taken as high bottleneck and 0 as low bottleneck to mark the high and low bottleneck (anomaly) time points, which serve as ground truth for verifying the model's detections at the bottleneck points. Dimensional ground-truth labels are then applied to the performance data under the different defined environments.

[0066] The fourth step is to train the data.

[0067] This invention proposes a reconstruction-comparison method to address the problem of comparing multiple sets of multidimensional performance time-series data under different pressures and environments. The model architecture is designed to extract temporal features from both the windowed data and the time series. As shown in Figure 2, it fuses a one-dimensional CNN, a GRU model, and a variational autoencoder (VAE), with a dropout layer added to prevent overfitting.
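The mean and variance Dense heads, the mapping to latent variables, and the dropout layer described in [0030] can be sketched in NumPy as follows. The sketch assumes the stacked CNN and GRU layers have already produced a feature matrix `h` (they are not implemented here), that the variance head outputs a log-variance, and that latent variables are drawn with the standard VAE reparameterization trick; the sizes, weights, names, and the position of dropout in the example are hypothetical.

```python
import numpy as np


def encode_and_sample(h, w_mu, b_mu, w_logvar, b_logvar, rng):
    """Map features h (batch, d_h) to latent samples z (batch, d_z).

    Two separate Dense heads give the mean and the log-variance; z is drawn
    with the reparameterization trick z = mu + sigma * eps, eps ~ N(0, I).
    """
    mu = h @ w_mu + b_mu
    log_var = h @ w_logvar + b_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    return z, mu, log_var


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


# Hypothetical sizes: 8 windows, 16 features from the CNN/GRU stack, 4 latent dims.
rng = np.random.default_rng(0)
h = rng.standard_normal((8, 16))
w_mu, w_logvar = 0.1 * rng.standard_normal((16, 4)), 0.1 * rng.standard_normal((16, 4))
b_mu, b_logvar = np.zeros(4), np.zeros(4)
# Dropout placement here is assumed for illustration only.
z, mu, log_var = encode_and_sample(dropout(h, 0.2, rng), w_mu, b_mu, w_logvar, b_logvar, rng)
print(z.shape)  # (8, 4)
```

Sampling through `mu + sigma * eps` keeps the randomness in `eps`, so in a real framework gradients can flow back through `mu` and `log_var` into the CNN and GRU layers.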

[0068] The VAE performs reconstruction by mapping the input data to itself, which makes it possible to compare performance data under different stresses and environments. This processing is unsupervised, further reducing the need for truth labels.

[0069] Data representing the normal state under low stress is used as training data $x$ and input into the network for the first stage of training, which reconstructs the training data. That is, corresponding to $x$, the architectural parameters of each model layer (CNN, etc.) and the intermediate latent variable $z$ of the VAE are obtained simultaneously. The parameters represent the mapping from the input data to itself, and the reconstruction error serves as the dimension anomaly score. The dimension anomaly scores are shown in Figure 6.

[0070] To address the significant differences in performance data patterns and distributions under varying pressures and environments, this invention uses a novel method to calculate the loss of the VAE model, replacing the KL divergence component of the original formula. The specific calculation formula is given below, where $D_{Was}$ computes the distance between distributions using the Wasserstein method instead of the KL method. $p(x|z)$ is the decoder distribution, assumed to be normal; $q(z|x)$ is the conditional (approximate posterior) distribution; $p(z)$ is the known prior distribution; and the intermediate latent variable $z$ is generated by passing the input through the CNN, GRU, Dense, and other layers. The remaining term characterizes the reconstruction error: to calculate each indicator's effect on its dimension, the dimension anomaly score is computed with the reconstruction probability method, and the cross-entropy calculation is processed accordingly. The anomaly score ranking of the indicator dimensions is shown in Figure 7. Given how the reconstruction term is defined, the larger the absolute value of a negative anomaly score, the more abnormal the dimension; in Figure 7, the dimensions are ordered from top to bottom by anomaly severity.

[0071]
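The per-dimension anomaly score and its ranking can be sketched with the reconstruction probability idea: average the Gaussian log-likelihood of each observed dimension over several latent samples, then sort ascending so the most negative (most abnormal) dimension comes first, as in Figure 7. The Gaussian decoder, the number of latent samples, and the metric names are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def dimension_log_prob(x, recon_means, recon_log_vars):
    """Per-dimension reconstruction log-probability, averaged over L latent samples.

    x: (d,) observed values; recon_means / recon_log_vars: (L, d) decoder
    outputs, one row per latent sample z_l ~ q(z|x). Returns a (d,) array;
    a more negative score means a worse reconstruction.
    """
    var = np.exp(recon_log_vars)
    log_p = -0.5 * (np.log(2.0 * np.pi) + recon_log_vars + (x[None, :] - recon_means) ** 2 / var)
    return log_p.mean(axis=0)


def rank_dimensions(scores, names):
    """Order dimension names from most to least anomalous (most negative score first)."""
    order = np.argsort(scores)
    return [names[i] for i in order]


# Hypothetical observation and two decoder samples for three metrics.
x = np.array([0.2, 0.9, 0.5])
recon_means = np.array([[0.2, 0.30, 0.50],
                        [0.2, 0.35, 0.45]])
recon_log_vars = np.full((2, 3), np.log(0.01))
scores = dimension_log_prob(x, recon_means, recon_log_vars)
ranked = rank_dimensions(scores, ["cpu_user", "mem_used", "disk_io"])
print(ranked)  # ['mem_used', 'disk_io', 'cpu_user']
```

Here `mem_used` is reconstructed far from its observed value, so its score is strongly negative and it heads the ranking.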

[0072] The fifth step is to process the test data.

[0073] The performance data obtained under the same pressure but different environments, or under the same environment but different pressures, is used as test data to characterize the performance of the actual system to be tested and analyzed. After the model parameters have been obtained from the training data, the test data is input into the network for the second stage of model processing: reconstruction of the test data.

[0074] Simultaneously, the parameters of the overall network model structure are adjusted for performance index data of different lengths and dimensions.
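To illustrate the two-stage reconstruction-comparison flow (fit on normal training data, then reconstruct the test data and compare errors), the sketch below deliberately swaps in a simple linear reconstructor, principal component analysis (PCA) computed with NumPy's SVD, for the CNN-GRU-VAE network. It is not the invention's model; it only shows how parameters learned from low-stress data are reused to reconstruct test data, with a larger reconstruction error indicating behavior the normal-state model does not capture. The class name and data are hypothetical.

```python
import numpy as np


class PCAReconstructor:
    """Linear stand-in for the CNN-GRU-VAE: fit on normal data, reconstruct new data."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, train: np.ndarray) -> "PCAReconstructor":
        # Stage 1: learn parameters (mean and principal axes) from normal-state data.
        self.mean_ = train.mean(axis=0)
        _, _, vt = np.linalg.svd(train - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruct(self, data: np.ndarray) -> np.ndarray:
        # Stage 2: project onto the learned axes and map back to the input space.
        centered = data - self.mean_
        return centered @ self.components_.T @ self.components_ + self.mean_

    def error(self, data: np.ndarray) -> np.ndarray:
        """Squared reconstruction error per time point (row)."""
        return np.sum((data - self.reconstruct(data)) ** 2, axis=1)


# Hypothetical low-stress training data: two metrics that rise together.
t = np.linspace(0.0, 1.0, 50)
model = PCAReconstructor(n_components=1).fit(np.column_stack([t, 2.0 * t]))
# Test points: one follows the normal relationship, one breaks it.
errors = model.error(np.array([[0.5, 1.0], [0.5, 0.0]]))
print(errors)
```

The second test point violates the relationship learned from the training data and receives a clearly larger error, which is the comparison principle the two-stage processing relies on.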

[0075] Step 6: Obtain the performance bottleneck detection results for the corresponding test data.

[0076] (1) After the test data has been processed, i.e., after the second stage of test data reconstruction, the performance bottleneck detection results for the test data are obtained by comparison with the training data processing, representing the system state in which the test data was collected. The detection results include: the division of points into true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) under the global F1-score; the detection accuracy for bottleneck time points under the corresponding binary bottleneck clustering; the dimensional truth hit rate at the corresponding bottleneck time points; other machine learning metrics such as AUROC; and the dimension anomaly score ranking. The anomaly score ranking is shown in Figure 7. Following the reconstruction formula, the more abnormal a dimension, the larger the absolute value of its negative score; the top-to-bottom order in Figure 7 ranks the indicator dimensions by the magnitude of their negative scores, i.e., by the severity of anomalies in each indicator dimension.
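The evaluation quantities listed above can be computed as follows. The TP/TN/FP/FN counts, F1 score, and AUROC follow their standard definitions; the `dimension_hit_rate` function uses a hypothetical definition (the fraction of ground-truth bottleneck dimensions that appear in the top-k of the anomaly ranking), because the text does not define the hit rate precisely. All names and data are illustrative.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary bottleneck labels (1 = high bottleneck)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs both positive and negative labels")
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def dimension_hit_rate(true_dims, ranked_dims, k):
    """Fraction of ground-truth bottleneck dimensions found in the top-k ranking (hypothetical)."""
    truth = set(true_dims)
    if not truth:
        return 0.0
    return len(truth & set(ranked_dims[:k])) / len(truth)


# Hypothetical labels from binary clustering versus model output.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]
scores = [0.1, 0.6, 0.8, 0.7, 0.2, 0.4]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
print(round(f1_score(y_true, y_pred), 4))  # 0.6667
```

AUROC is computed from the continuous anomaly scores rather than the thresholded predictions, so it evaluates the ranking quality independently of any particular decision threshold.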

[0077] (2) Using the model architecture and operational procedures for performance bottleneck analysis and detection proposed in this invention, the detection results were compared with those of the domain baseline methods for multidimensional time series anomaly detection, such as the UAE (Univariate Fully-Connected AutoEncoder) model, the VAE-LSTM model, and the LSTM-ED model, all using the processing procedures proposed in this invention. As can be seen from the comparison results of different methods in Table 2, the scheme proposed in this invention shows significant improvements over other methods in F1 score, AUROC machine learning classification evaluation metrics, and the hit rate of the metric dimensions. It can effectively analyze, detect, and interpret software system performance bottlenecks using intelligent operation and maintenance methods.

[0078] Table 2 Comparison of results from different methods

[0079]

Claims

1. A method for analyzing and detecting performance bottlenecks in software systems based on intelligent operation and maintenance, characterized in that the steps are as follows: The first step is to acquire performance data. System stress tests are conducted under different pressures and environments to obtain multiple sets of multi-indicator performance data; alternatively, a connection can be established to a server, or the actual performance data monitored during system operation can be used directly. The second step is to extract performance data items and complete data preprocessing: 2.1 Perform data filtering and extraction. For performance indicator data obtained from stress testing or system monitoring, extract the indicator dimension items. The extracted indicator dimensions cover performance indicator data that can characterize the status of each module of the system, including CPU, MEM (memory), OS kernel, disk, VM (virtual machine), and NET (network). 2.2 Perform data normalization on the extracted item data. The third step is to perform bottleneck clustering on the test data to obtain bottleneck time point labels: 3.1 Perform binary bottleneck clustering. 3.2 Perform truth labeling. Based on the bottleneck clustering results, use clustering result 1 as high bottleneck and 0 as low bottleneck to mark the high and low bottleneck time points; perform dimensional truth labeling on the performance data under the different set environments.
The binary clustering method for separating high and low bottleneck ground-truth labels is as follows: K-means clustering is used to perform binary bottleneck clustering on the multi-dimensional performance time-series data obtained after normalization, dividing the results into high and low bottleneck time points; the distance calculation formula in the clustering is as follows: the number of clusters is specified as $c = 2$, where $x_i$ contains the performance data of a certain indicator at $m$ time points, and $u_j$ is the mean vector corresponding to the cluster center, which is continuously updated during the calculation; the distance formula is refined to specific dimensions and specific time points. The fourth step is to train the model on the training data to obtain the model architecture parameters and complete the representation mapping of the input data. A reconstruction-comparison method is adopted to address the problem of comparing multiple sets of multidimensional performance time-series data under different pressures and environments, with a model architecture designed as follows: the architecture integrates a one-dimensional convolutional neural network (CNN), a gated recurrent unit (GRU) model, and a variational autoencoder (VAE) model. Data features within a sliding window are extracted and memorized over time by stacked one-dimensional CNN and GRU layers. Two fully connected layers, namely Dense layers, are added to represent the mean and variance, respectively, and the distribution obtained from the mean and variance is mapped to latent variables in the hidden layer. Feature extraction and output are then completed through stacked fully connected Dense layers, with a dropout (random deactivation) layer added to prevent overfitting. Data representing a normal state under low stress is used as training data $x$ and input into the network for the first stage of training, completing the reconstruction of the training data.
That is, each training input $x$ is mapped to its own reconstruction $\hat{x} \approx x$. At the same time, the architectural parameters of each layer of the network and the intermediate latent variables of the VAE are obtained; these parameters represent the mapping from the input data to itself.
The fifth step is to train on the test data and complete the mapping between the test data and the training data representation. Performance data obtained under the same pressure but in different environments, or in the same environment but under different pressures, is used as test data, representing the actual system performance data to be tested and analyzed. After the model parameters have been obtained from the training data, the test data is input into the network to complete the second stage: reconstruction of the test data.
Step 6: Obtain the performance bottleneck detection results for the corresponding test data. Once the second stage of test data reconstruction is complete, the performance bottleneck detection results of the test data are obtained; they characterize the state of the system in which the test data was collected.
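The preprocessing and labeling of steps 2.2 to 3.2, plus the sliding windows fed into the network in step 4, can be sketched in plain Python. The claims only say "normalize" and "K-means with c = 2", so the following are assumptions rather than the patented procedure: min-max scaling per indicator, initializing the two cluster centers at the lowest- and highest-load time points, and treating the cluster whose mean vector has the larger total load as the high bottleneck (label 1). All function names are hypothetical.

```python
import math
from typing import List, Sequence

Row = List[float]  # one time point: the value of every indicator (CPU, MEM, disk, ...)


def min_max_normalize(rows: Sequence[Row]) -> List[Row]:
    """Step 2.2 (assumed min-max scaling): map each indicator column into [0, 1]."""
    if not rows:
        return []
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # A constant column carries no information, so it is mapped to 0.0.
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
        for r in rows
    ]


def dist(x: Sequence[float], mu: Sequence[float]) -> float:
    """Euclidean distance over the indicator dimensions of one time point."""
    return math.sqrt(sum((xd - md) ** 2 for xd, md in zip(x, mu)))


def binary_bottleneck_labels(points: Sequence[Row], max_iter: int = 100) -> List[int]:
    """Steps 3.1-3.2: K-means with c = 2; returns 1 (high) or 0 (low) per time point."""
    if len(points) < 2:
        raise ValueError("need at least two time points")
    loads = [sum(p) for p in points]
    # Assumed initialization: the time points with the lowest and highest total load.
    centers = [list(points[loads.index(min(loads))]), list(points[loads.index(max(loads))])]
    for _ in range(max_iter):
        # Assignment step: each time point joins its nearest mean vector.
        assign = [0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1 for p in points]
        # Update step: recompute each mean vector from its current members.
        new_centers = []
        for j in (0, 1):
            members = [p for p, a in zip(points, assign) if a == j]
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:  # stop once no mean vector changes
            break
        centers = new_centers
    # Assumed rule: the cluster whose mean vector has the larger total load is "high".
    high = 0 if sum(centers[0]) > sum(centers[1]) else 1
    return [1 if a == high else 0 for a in assign]


def sliding_windows(series: Sequence[Row], width: int, stride: int = 1) -> List[List[Row]]:
    """Step 4 input: overlapping windows of `width` consecutive time points."""
    return [list(series[s:s + width]) for s in range(0, len(series) - width + 1, stride)]


# Hypothetical samples: columns are CPU % and memory in MB.
raw = [[20.0, 1024.0], [25.0, 1100.0], [90.0, 3900.0], [85.0, 4000.0], [22.0, 1050.0]]
norm = min_max_normalize(raw)
print(binary_bottleneck_labels(norm))  # [0, 0, 1, 1, 0]
```

Normalizing before clustering matters here: without it, the memory column (in MB) would dominate the Euclidean distance and the CPU column would barely influence which cluster a time point joins.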

2. The method for analyzing and detecting performance bottlenecks in software systems based on intelligent operation and maintenance as described in claim 1, characterized in that, in the fourth step, the present invention employs a new method to calculate the loss of the VAE model, replacing the KL divergence term in the original formula. The specific calculation formula is as follows:
$$\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + W\big(q_\phi(z \mid x),\, p(z)\big), \qquad \mathcal{L}_{\mathrm{rec}} = -\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]$$
where $W(\cdot,\cdot)$ uses the Wasserstein method instead of the KL method to calculate the distance between distributions, so as to better characterize the discrepancy between them; $\mathcal{L}_{\mathrm{rec}}$ represents the reconstruction error, which accounts for the influence of indicator units (dimensions) and is used to compute outlier scores; $q_\phi(z \mid x)$ is the posterior over the latent variable $z$, assumed to be a normal distribution; $p_\theta(x \mid z)$ is the corresponding conditional distribution; $p(z)$ is a known prior distribution; and the generated data $\hat{x}$ is obtained after passing through the CNN, GRU, and Dense layers.
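Claim 2 does not fix the order of the Wasserstein distance, the prior, the exact form of the reconstruction error, or how outlier scores become detection results. The sketch below therefore assumes a diagonal-Gaussian posterior $N(\mu, \operatorname{diag}(\sigma^2))$, a standard-normal prior $N(0, I)$, the squared 2-Wasserstein distance (which has a closed form for diagonal Gaussians), mean squared error as $\mathcal{L}_{\mathrm{rec}}$, a hypothetical weight `beta`, and a hypothetical quantile threshold for flagging bottleneck time points. It only illustrates the arithmetic; all names are illustrative.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Map the Dense-layer mean/variance to a latent sample z = mu + sigma * eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Assumed L_rec: mean squared error between input x and generated data x_hat."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def w2_squared_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Squared 2-Wasserstein distance between N(mu, diag(sigma^2)) and the prior N(0, I).

    For diagonal Gaussians: W2^2 = ||mu_1 - mu_2||^2 + ||sigma_1 - sigma_2||^2,
    here with mu_2 = 0 and sigma_2 = 1.
    """
    return sum(m ** 2 + (math.exp(0.5 * lv) - 1.0) ** 2 for m, lv in zip(mu, log_var))


def loss(x: Sequence[float], x_hat: Sequence[float], mu: Sequence[float],
         log_var: Sequence[float], beta: float = 1.0) -> float:
    """Hypothetical form of the claim-2 loss: L_rec plus a weighted Wasserstein term."""
    return reconstruction_error(x, x_hat) + beta * w2_squared_to_standard_normal(mu, log_var)


def flag_bottlenecks(scores: Sequence[float], train_scores: Sequence[float],
                     q: float = 0.99) -> List[int]:
    """Hypothetical rule: flag test time points whose outlier score exceeds the
    q-quantile (nearest rank) of the scores observed on normal training data."""
    ordered = sorted(train_scores)
    threshold = ordered[min(len(ordered) - 1, int(q * len(ordered)))]
    return [1 if s > threshold else 0 for s in scores]


print(loss([1.0, 2.0], [1.0, 4.0], mu=[1.0], log_var=[0.0]))  # 3.0
```

Unlike the KL term, the closed-form $W_2^2$ stays finite and smooth even when a posterior variance collapses toward zero, which is one common motivation for the swap. In a trainable CNN-GRU-VAE these terms would be written with the deep-learning framework's tensor operations so that gradients flow through them.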

3. The software system performance bottleneck analysis and detection method based on intelligent operation and maintenance as described in claim 1, characterized in that, in step 3.1, a binary bottleneck clustering method is adopted for the multi-dimensional performance data, dividing the time points into two categories, high- and low-bottleneck points, and the time points are labeled with high- and low-bottleneck ground-truth values according to the division result. The test data is the object processed in this operation. The values of each indicator dimension at each time point serve as the points to be clustered in the distance calculation, and the number of clusters is specified as c = 2. $x_t = (x_{t,1}, \dots, x_{t,D})$ denotes the performance data at time point $t$, containing the value $x_{t,d}$ of each indicator $d$; $\mu_j$ denotes the mean vector corresponding to the center of cluster $C_j$, which is continuously updated during the calculation until none of the mean vectors changes any more. The distance formula is refined to specific time points in specific dimensions:
$$\operatorname{dist}(x_t, \mu_j) = \sqrt{\sum_{d=1}^{D} \left(x_{t,d} - \mu_{j,d}\right)^2}, \qquad j \in \{1, 2\}$$

4. The software system performance bottleneck analysis and detection method based on intelligent operation and maintenance as described in claim 2, characterized in that, in step 3.1, a binary bottleneck clustering method is adopted for the multi-dimensional performance data, dividing the time points into two categories, high- and low-bottleneck points, and the time points are labeled with high- and low-bottleneck ground-truth values according to the division result. The test data is the object processed in this operation. The values of each indicator dimension at each time point serve as the points to be clustered in the distance calculation, and the number of clusters is specified as c = 2. $x_t = (x_{t,1}, \dots, x_{t,D})$ denotes the performance data at time point $t$, containing the value $x_{t,d}$ of each indicator $d$; $\mu_j$ denotes the mean vector corresponding to the center of cluster $C_j$, which is continuously updated during the calculation until none of the mean vectors changes any more. The distance formula is refined to specific time points in specific dimensions:
$$\operatorname{dist}(x_t, \mu_j) = \sqrt{\sum_{d=1}^{D} \left(x_{t,d} - \mu_{j,d}\right)^2}, \qquad j \in \{1, 2\}$$

5. The method for analyzing and detecting performance bottlenecks in software systems based on intelligent operation and maintenance as described in claim 1 or 4, characterized in that the first step is as follows: system stress tests are conducted with stress testing tools under different pressures and environments, with the pressure divided into low, medium, and high ranges. In industry practice, a stress test typically lasts 30 minutes to 1 hour. The system environments include, for example, the operating (production) environment, the testing environment, or a predefined environment to be tested. Multiple sets of multi-dimensional performance data are acquired during the stress test. Alternatively, a server can be connected, or performance data monitored during system operation can be used directly. Performance data obtained under low pressure or monitored in the production environment represents the normal state and serves as training data; data obtained from the environment under test is used as test data.

6. The method for analyzing and detecting performance bottlenecks in software systems based on intelligent operation and maintenance as described in claim 2, characterized in that the first step is as follows: system stress tests are conducted with stress testing tools under different pressures and environments, with the pressure divided into low, medium, and high ranges. In industry practice, a stress test typically lasts 30 minutes to 1 hour. The system environments include, for example, the operating (production) environment, the testing environment, or a predefined environment to be tested. Multiple sets of multi-dimensional performance data are acquired during the stress test. Alternatively, a server can be connected, or performance data monitored during system operation can be used directly. Performance data obtained under low pressure or monitored in the production environment represents the normal state and serves as training data; data obtained from the environment under test is used as test data.

7. The method for analyzing and detecting performance bottlenecks in software systems based on intelligent operation and maintenance as described in claim 3, characterized in that the first step is as follows: system stress tests are conducted with stress testing tools under different pressures and environments, with the pressure divided into low, medium, and high ranges. In industry practice, a stress test typically lasts 30 minutes to 1 hour. The system environments include, for example, the operating (production) environment, the testing environment, or a predefined environment to be tested. Multiple sets of multi-dimensional performance data are acquired during the stress test. Alternatively, a server can be connected, or performance data monitored during system operation can be used directly. Performance data obtained under low pressure or monitored in the production environment represents the normal state and serves as training data; data obtained from the environment under test is used as test data.