A computer-aided diagnosis method based on deep learning

By using an improved TimeGPT model, combined with operating condition context coding, multi-index joint baseline modeling, and decoupled prediction of system state components, the inconsistency and false alarm/missed alarm problems in the existing computer system operating status diagnosis are solved. This enables accurate identification of computer system operating anomalies and risk level classification, improving the stability and accuracy of diagnosis.

CN122309209A · Pending · Publication Date: 2026-06-30 · SHANGHAI MAITONG INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHANGHAI MAITONG INFORMATION TECHNOLOGY CO LTD
Filing Date: 2026-03-19
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing computer system operation status diagnosis methods are unable to effectively characterize the intrinsic relationship between short-term fluctuations, medium-term changes and long-term performance evolution when processing multi-source operation status data. This results in diagnostic results that are highly sensitive to load changes and environmental switching, with prominent false alarms and false negatives, and lack of consistency and accuracy, making it difficult to identify systemic risks.

Method used

An improved TimeGPT model is adopted, which integrates operating condition context coding, multi-indicator joint baseline modeling and system state component decoupling prediction mechanism to construct a system health operation baseline that matches the current operating condition, conducts recoverability analysis and risk irreversibility determination, and achieves accurate identification and risk level classification of computer system operation anomalies.

Benefits of technology

It improves the stability and accuracy of diagnostic results, reduces the false alarm rate, can identify potential systemic risks in advance, and provides reliable support for operation and maintenance decisions, significantly enhancing the practicality and intelligence of computer-aided diagnosis.

Abstract

This invention discloses a computer-aided diagnostic method based on deep learning, comprising the following steps: collecting multi-source operational status data of a computer system and performing time alignment to form a system operational status data sequence; constructing a multi-scale system status sequence set; inputting the multi-scale system status sequence set into an improved TimeGPT model to generate a baseline status sequence for healthy system operation; calculating the state deviation of the system operational status data sequence and extracting the recoverability features of the deviation; performing cumulative evolution analysis across operational cycles on low-recoverability state deviations to construct anomaly evidence vectors; determining the irreversibility of risk based on the anomaly evidence vectors, generating auxiliary diagnostic results and determining the risk level; and outputting system operation early warning information based on the auxiliary diagnostic results. This invention can achieve accurate identification and risk classification of computer system operational anomalies, providing reliable support for operation and maintenance decisions and possessing significant engineering application value.

Description

Technical Field

[0001] This invention relates to the field of computer system operation status monitoring and intelligent diagnosis technology, and in particular to a computer-aided diagnosis method based on deep learning.

Background Technology

[0002] With the rapid development of cloud computing, big data, and artificial intelligence technologies, the scale and complexity of computer systems in information processing, network services, and business support are constantly increasing, leading to widespread attention to computer-aided diagnostic technologies for computer system operation status. Current technologies for analyzing the health status of computer systems largely rely on rule-based threshold determination, single performance indicator monitoring, or anomaly detection methods based on traditional statistical models. Some solutions are beginning to introduce deep learning models to predict and analyze operational status. However, the following problems still commonly exist in practical applications: On the one hand, the multi-source operational status data generated during computer system operation is characterized by strong noise, strong fluctuations, and multiple time scales. Existing methods often focus on modeling a single time scale or a single indicator, making it difficult to effectively characterize the intrinsic relationship between short-term fluctuations, medium-term changes, and long-term performance evolution. This results in diagnostic results being highly sensitive to changes in operating load and environmental switching, with significant false alarms and false negatives. On the other hand, existing deep learning-based prediction models mostly adopt a unified historical sequence modeling approach, without explicit modeling and conditional constraints on operating conditions. This leads to a lack of consistency in baseline prediction results generated under different loads or operating environments, making it difficult to serve as a stable and reliable health reference benchmark. 
Furthermore, existing technologies typically predict each operational status indicator independently or perform simple superposition analysis, ignoring the coupling relationship and abnormal evolution process between multiple indicators. This makes it difficult to identify irreversible abnormal states that gradually evolve from local deviations into systemic risks, thus limiting the application effectiveness of computer-aided diagnosis in risk warning and operation and maintenance decision support.

[0003] Therefore, how to provide a computer-aided diagnostic method based on deep learning is a problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0004] One objective of this invention is to propose a computer-aided diagnostic method based on deep learning. This invention introduces an improved TimeGPT model that integrates operating condition context encoding, multi-index joint baseline modeling, and system state component decoupling prediction mechanism to construct a system health operating baseline that matches the current operating conditions. It performs recoverability analysis, cross-operating cycle cumulative evolution analysis, and risk irreversibility determination on deviations from the system operating state, thereby achieving accurate identification and risk level classification of computer system operating anomalies. It has the advantages of high diagnostic result stability, strong anomaly identification accuracy, low false alarm rate, and strong support for operation and maintenance decision-making.

[0005] A computer-aided diagnostic method based on deep learning according to an embodiment of the present invention includes the following steps:
Step 1: Collect the multi-source operating status data generated while the computer system runs, then timestamp and align the data to form a system operating status data sequence;
Step 2: Perform multi-time-scale processing on the system operating status data sequence to form a multi-scale system state sequence set;
Step 3: Input the multi-scale system state sequence set into the improved TimeGPT model. By introducing an operating condition context encoding mechanism, a multi-indicator joint baseline modeling mechanism, and a system state component decoupling prediction mechanism, the improved TimeGPT model generates a healthy operating baseline state sequence for the computer system that matches the current operating conditions;
Step 4: Align and compare the system operating status data sequence with the healthy operating baseline state sequence, calculate the deviation magnitude, and extract the recoverability features of the state deviation based on the regression behavior of the deviation magnitude within a continuous time window;
Step 5: Perform cumulative evolution analysis across operating cycles on state deviations with low recoverability, and construct an anomaly evidence vector based on the cumulative intensity and persistence of the state deviations in the time dimension;
Step 6: Based on the anomaly evidence vector, perform a risk irreversibility determination, that is, determine whether the system's abnormal operation exceeds the self-regulation capability of the computer system; when multi-dimensional state deviations accumulate synchronously and cannot be recovered, generate an auxiliary diagnostic result for the computer's operating status and determine the corresponding risk level;
Step 7: Based on the auxiliary diagnostic result, issue early warnings on the computer system's operating status and provide support for operation and maintenance decisions.

[0006] Optionally, the multi-source operating status data is a set of status data used to characterize the operating status of the computer system, including processor operating status data, memory and storage operating status data, network communication operating status data, and system load operating status data.

[0007] Optionally, step two specifically includes: The system operating status data sequence is arranged sequentially according to the preset sampling period, with the data points collected within a single sampling period serving as the basic time unit. Multiple consecutive basic time units are spliced together according to the first preset time window length to construct the short-term system state time series. Multiple consecutive short-term system state time series are aggregated according to the second preset time window length to construct the medium-term system state time series. Multiple consecutive medium-term system state time series are summarized according to the third preset time window length to construct the long-term system state time series. The short-term, medium-term, and long-term system state time series are then aligned in chronological order, and the system state data at the different time scales are combined under the same time index to form a multi-scale system state time series set containing multi-time-scale features.

[0008] Optionally, the improved TimeGPT model includes an input embedding module, a Transformer encoding module, a Transformer decoding module, and an output prediction module:

The input embedding module produces an embedded representation of the multi-scale system state time series set and introduces the multi-indicator joint baseline modeling mechanism: the operating status indicators at corresponding time positions in the short-term, medium-term, and long-term system state time series are concatenated according to a unified time index, mapped to a preset feature dimension through a linear mapping layer, and added to the position encoding vector to obtain the embedding sequence.

The Transformer encoding module performs context feature encoding on the embedding sequence and outputs a context representation. It comprises multiple Transformer encoding layers, each consisting of a multi-head self-attention sublayer and a feedforward network sublayer, both with residual connections and normalization. An operating condition context encoding mechanism is also introduced, specifically: the fields related to the computer system's operating environment and load conditions are selected from the system operating status data sequence to form an operating condition time series, which is input into an independently configured fully connected linear mapping layer for dimension mapping; the GELU nonlinear activation function is applied to each element of the mapping result, and normalization is then performed to obtain the operating condition encoding vector. In the multi-head self-attention sublayer of each Transformer encoding layer, the operating condition encoding vector is linearly mapped to generate an operating condition modulation vector, which is multiplied element-wise, along the feature dimension, with the attention output computed by multi-head self-attention. In the feedforward network sublayer of each Transformer encoding layer, the operating condition encoding vector is linearly mapped to generate an operating condition gating vector, which is multiplied element-wise, along the feature dimension, with the intermediate representation of the feedforward network.

The Transformer decoding module includes a masked multi-head self-attention sublayer, an encoder-decoder cross-attention sublayer, and a feedforward network sublayer. It takes the context representation as context information, selectively attends to the encoded representation through the cross-attention mechanism, and applies a linear transformation, the GELU nonlinear activation function, and a second linear transformation in the feedforward network sublayer to nonlinearly map the decoded representation into a baseline temporal implicit representation.

The output prediction module maps the baseline temporal implicit representation to concrete healthy operating baseline predictions and introduces the system state component decoupling prediction mechanism, specifically: the baseline temporal implicit representation is input into multiple parallel prediction mapping channels, each corresponding to one type of system operating status indicator. In each channel, the baseline temporal implicit representation is linearly transformed by an independently configured fully connected linear mapping layer, and the GELU nonlinear activation function is applied to each element of the result to obtain the intermediate prediction representation of the corresponding operating status indicator. Scaling or exponential constraint processing is then applied to each intermediate prediction representation. The prediction results of the individual indicators are reassembled in the original order of the operating status indicators to generate multi-indicator healthy operating baseline prediction values, which, along the time dimension, constitute the healthy operating baseline state sequence of the computer system.

[0009] Optionally, step four specifically includes: The system operating status data sequence is mapped one-to-one onto the healthy operating baseline state sequence according to a unified time index, and the deviation of each operating status indicator from the healthy operating baseline at each time point is calculated to form a state deviation sequence. The state deviation sequence is segmented according to a preset continuous time window, and the state deviation magnitudes within each window are arranged in chronological order. For each time window, the trend of the state deviation magnitude is analyzed to determine whether it continuously decreases, remains stable, or continuously increases over time. When the state deviation magnitude decreases within a continuous time window or remains stable within a preset deviation range, the corresponding state deviation is judged to be recoverable. When the state deviation magnitude continues to increase within a continuous time window or shows no downward trend, the corresponding state deviation is judged to have low recoverability. Based on these judgments, a recoverability feature label is generated for each operating status indicator.

[0010] Optionally, step five specifically includes: State deviation data with low recoverability characteristics are selected from the state deviation sequence and classified according to the category of operating status indicators to form a set of low recoverable state deviations. Using the operating cycle as the unit of time analysis, the set of low recoverability state deviations is segmented according to continuous operating cycles, and the frequency, duration and magnitude of the corresponding state deviations in each operating cycle are statistically analyzed. For the same operating status indicator, the corresponding status deviation statistics are accumulated over multiple consecutive operating cycles to form a cumulative deviation sequence; Based on the cumulative deviation sequence, the cumulative deviation features of different operating status indicators are normalized, and the cumulative deviation features corresponding to each operating status indicator are concatenated in the order of the indicators to construct an anomaly evidence vector containing cumulative deviation information of multiple operating status indicators.
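
As a minimal illustrative sketch of step five (not the claimed implementation), the following Python code computes per-cycle frequency, duration, and magnitude statistics for low-recoverability deviations, accumulates them over consecutive operating cycles, normalizes them, and concatenates them into an anomaly evidence vector. The sample layout `(magnitude, is_low_recoverability)`, the definition of frequency as the number of separate flagged runs, the normalization by the per-feature maximum across indicators, and all function names are assumptions made for illustration; the specification does not fix them.

```python
def cycle_stats(samples):
    """Statistics for one indicator in one operating cycle.

    samples: list of (magnitude, is_low_recoverability) tuples in time order.
    Returns [frequency, duration, magnitude] of the flagged deviations.
    """
    frequency, duration, magnitude = 0, 0, 0.0
    previous_flagged = False
    for value, flagged in samples:
        if flagged:
            duration += 1                 # number of flagged samples
            magnitude += value            # summed deviation magnitude
            if not previous_flagged:
                frequency += 1            # a new run of flagged samples starts
        previous_flagged = flagged
    return [frequency, duration, magnitude]


def cumulative_sequence(per_cycle_stats):
    """Running totals of the statistics over consecutive operating cycles."""
    totals, sequence = [0, 0, 0.0], []
    for stats in per_cycle_stats:
        totals = [t + s for t, s in zip(totals, stats)]
        sequence.append(totals)
    return sequence


def evidence_vector(cycles_by_indicator):
    """Concatenate normalized cumulative features in indicator order.

    Assumes every indicator has at least one operating cycle.
    """
    finals = {
        name: cumulative_sequence([cycle_stats(c) for c in cycles])[-1]
        for name, cycles in cycles_by_indicator.items()
    }
    # Assumed normalization: divide each feature by its maximum across indicators.
    maxima = [max(f[k] for f in finals.values()) or 1 for k in range(3)]
    vector = []
    for name in cycles_by_indicator:
        vector.extend(finals[name][k] / maxima[k] for k in range(3))
    return vector


# Hypothetical data: two indicators observed over two operating cycles.
cycles_by_indicator = {
    "cpu_util": [
        [(5.0, True), (6.0, True), (1.0, False), (4.0, True)],
        [(7.0, True), (8.0, True)],
    ],
    "mem_util": [
        [(1.0, False), (2.0, True)],
        [(3.0, True), (0.5, False)],
    ],
}
vector = evidence_vector(cycles_by_indicator)
```

Here the vector holds three features per indicator, so its length grows with the number of monitored indicators; a different feature set or normalization could be substituted without changing the overall structure.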

[0011] Optionally, step six specifically includes: Based on the anomaly evidence vector, the cumulative deviation features of each operating status indicator are analyzed dimension by dimension to determine whether each indicator shows continuous accumulation without decline over consecutive operating cycles. When the cumulative deviation features of at least two operating status indicators in the anomaly evidence vector simultaneously satisfy the continuous accumulation condition within the same operating cycle interval, the system operating status is determined to exhibit multi-dimensional synchronous accumulation of state deviations. Once this synchronous accumulation has been determined, the changes of the anomaly evidence vector across operating cycles are tracked to determine whether the cumulative deviation features are maintained or grow further within the preset observation period. When the anomaly evidence vector shows no overall downward trend within the preset observation period and the multi-dimensional synchronous accumulation persists, the corresponding abnormality is determined to have exceeded the self-regulation capability of the computer system, yielding a risk irreversibility determination result. Based on that result, the risk level of the system's operating status is classified, and the risk level and its corresponding system operating status identifier are output as the auxiliary diagnostic result for the computer's operating status.
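
The decision logic of step six can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: each indicator is represented by one evidence value per operating cycle, "continuous accumulation" is read as "never declines and ends higher than it started", the overall trend is judged by comparing the first and last summed evidence in the observation period, and the level names `normal`, `controllable`, and `high_irreversible` are hypothetical labels.

```python
def accumulating(series):
    """True if the series never declines and ends higher than it started."""
    return all(b >= a for a, b in zip(series, series[1:])) and series[-1] > series[0]


def no_decline(series):
    """True if the series never declines from one cycle to the next."""
    return all(b >= a for a, b in zip(series, series[1:]))


def assess(evidence, interval, observation):
    """Classify risk from per-cycle evidence values.

    evidence: dict indicator -> list of per-cycle evidence values, oldest first,
              with at least interval + observation entries.
    interval: number of cycles checked for synchronous accumulation.
    observation: number of following cycles tracked for persistence.
    """
    flagged = [n for n, s in evidence.items() if accumulating(s[:interval])]
    if len(flagged) < 2:
        # Fewer than two indicators accumulate together: no synchronous accumulation.
        return {"level": "normal", "indicators": flagged}

    window = range(interval, interval + observation)
    totals = [sum(s[i] for s in evidence.values()) for i in window]
    no_overall_drop = totals[-1] >= totals[0]
    persisting = [n for n in flagged
                  if no_decline(evidence[n][interval:interval + observation])]

    irreversible = no_overall_drop and len(persisting) >= 2
    level = "high_irreversible" if irreversible else "controllable"
    return {"level": level, "indicators": flagged}


# Hypothetical evidence: CPU and memory deviations keep growing, network does not.
worsening = {
    "cpu_util": [0.1, 0.2, 0.4, 0.5, 0.6, 0.8],
    "mem_util": [0.2, 0.2, 0.3, 0.4, 0.4, 0.5],
    "net_mbps": [0.3, 0.1, 0.2, 0.1, 0.2, 0.1],
}
# Same start, but the deviations shrink again during the observation period.
recovering = {
    "cpu_util": [0.1, 0.2, 0.4, 0.3, 0.2, 0.1],
    "mem_util": [0.2, 0.2, 0.3, 0.2, 0.1, 0.1],
    "net_mbps": [0.3, 0.1, 0.2, 0.1, 0.2, 0.1],
}
worsening_result = assess(worsening, interval=3, observation=3)
recovering_result = assess(recovering, interval=3, observation=3)
```

In this sketch the first data set is classified as `high_irreversible` and the second as `controllable`, because in the second case the summed evidence falls during the observation period.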

[0012] Optionally, step seven specifically includes: Based on the auxiliary diagnostic result and the risk level, the diagnostic information corresponding to the computer system's operating status is associated with a timestamp to form a structured diagnostic result data record. The auxiliary diagnostic results are then handled in a graded manner according to risk level: when the diagnostic result corresponds to a controllable risk state, it is output as an operating status prompt; when it corresponds to a high-risk irreversible state, it is output as an early warning. The operating status prompts or early warnings are pushed to the operation and maintenance management system or the visualization interface and displayed together with the corresponding operating status indicators and anomaly evidence vectors. After the warning information is output, the system operating status data, the healthy operating baseline state sequence, and the anomaly evidence vector at the time the warning was triggered are recorded to form a historical diagnostic record.
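
A possible shape for the structured diagnostic record and the graded output of step seven is sketched below. The field names, the level names, and the use of JSON as the push format are assumptions for illustration only; the specification does not prescribe a record schema or transport.

```python
import json
from datetime import datetime, timezone


def build_record(level, indicators, evidence, timestamp):
    """Associate a diagnostic result with its timestamp in one structured record."""
    output_type = "early_warning" if level == "high_irreversible" else "status_prompt"
    return {
        "timestamp": timestamp.isoformat(),
        "risk_level": level,
        "output_type": output_type,
        "indicators": indicators,
        "evidence_vector": evidence,
    }


def emit(record, observed, baseline, history):
    """Serialize the record for pushing; archive warnings with their context."""
    if record["output_type"] == "early_warning":
        # Keep the state at warning time as a historical diagnostic record.
        history.append({"record": record, "observed": observed, "baseline": baseline})
    return json.dumps(record)


history = []
record = build_record(
    level="high_irreversible",
    indicators=["cpu_util", "mem_util"],
    evidence=[1.0, 1.0, 1.0, 0.67, 0.4, 0.17],
    timestamp=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
payload = emit(record, observed=[92.0, 81.0], baseline=[55.0, 60.0], history=history)
```

The returned `payload` string stands in for whatever message an operation and maintenance system or dashboard would consume.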

[0013] The beneficial effects of this invention are: This invention constructs a deep learning-based computer-aided diagnostic method, effectively overcoming the problems of existing technologies that rely on single indicators, fixed thresholds, or lack of operational condition differentiation in computer system operation status analysis. By performing time alignment and multi-timescale processing on multi-source operation status data, it can simultaneously characterize short-term fluctuations, medium-term changes, and long-term performance evolution characteristics during system operation, improving the overall perception capability of complex operation states. An operation condition context encoding mechanism based on the original TimeGPT model is introduced, allowing operation condition information to explicitly participate in the time-series modeling process during the model encoding stage, generating a healthy operation baseline state sequence that matches the current operating environment, thereby significantly reducing the risk of false alarms caused by load changes or environment switching. Through a multi-indicator joint baseline modeling mechanism, collaborative modeling of multi-source operation status indicators in a unified model is achieved, avoiding baseline drift caused by independent prediction of single indicators and improving the stability and consistency of the healthy baseline. Simultaneously, the system state component decoupling prediction mechanism reduces mutual interference between different indicators, improving the accuracy and interpretability of baseline prediction results. 
By further combining the recoverability analysis of state deviations, the abnormal evolution modeling across operating cycles, and the determination of risk irreversibility, this invention can identify potential persistent risks and output early warning information in advance, providing reliable support for operation and maintenance decisions and significantly improving the practicality and intelligence level of computer-aided diagnosis.

Attached Figure Description

[0014] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
Figure 1 is a flowchart of the computer-aided diagnostic method based on deep learning proposed in this invention;
Figure 2 is a schematic diagram of the computer-aided diagnostic method based on deep learning proposed in this invention;
Figure 3 is a framework diagram of the improved TimeGPT model in the computer-aided diagnostic method based on deep learning proposed in this invention.

Detailed Implementation

[0015] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.

[0016] Referring to Figures 1-3, a computer-aided diagnostic method based on deep learning includes the following steps:
Step 1: Collect the multi-source operating status data generated while the computer system runs, then timestamp and align the data to form a system operating status data sequence;
Step 2: Perform multi-time-scale processing on the system operating status data sequence to form a multi-scale system state sequence set;
Step 3: Input the multi-scale system state sequence set into the improved TimeGPT model. By introducing an operating condition context encoding mechanism, a multi-indicator joint baseline modeling mechanism, and a system state component decoupling prediction mechanism, the improved TimeGPT model generates a healthy operating baseline state sequence for the computer system that matches the current operating conditions;
Step 4: Align and compare the system operating status data sequence with the healthy operating baseline state sequence, calculate the deviation magnitude, and extract the recoverability features of the state deviation based on the regression behavior of the deviation magnitude within a continuous time window;
Step 5: Perform cumulative evolution analysis across operating cycles on state deviations with low recoverability, and construct an anomaly evidence vector based on the cumulative intensity and persistence of the state deviations in the time dimension;
Step 6: Based on the anomaly evidence vector, perform a risk irreversibility determination, that is, determine whether the system's abnormal operation exceeds the self-regulation capability of the computer system; when multi-dimensional state deviations accumulate synchronously and cannot be recovered, generate an auxiliary diagnostic result for the computer's operating status and determine the corresponding risk level;
Step 7: Based on the auxiliary diagnostic result, issue early warnings on the computer system's operating status and provide support for operation and maintenance decisions.

[0017] In this embodiment, the multi-source operating status data is a set of status data used to characterize the operating status of the computer system, including processor operating status data, memory and storage operating status data, network communication operating status data, and system load operating status data.
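
The timestamp alignment in step one could, for example, be realized by bucketing samples from each source onto a shared time grid. The sketch below is one such approach under assumptions not stated in the specification: a fixed grid period, averaging of samples that fall in the same bucket, dropping of buckets in which any source is missing, and hypothetical source names.

```python
from collections import defaultdict


def align_by_timestamp(sources, period):
    """Bucket samples from several sources onto a shared grid.

    sources: dict source name -> list of (timestamp_seconds, value).
    period: grid spacing in seconds.
    Returns a time-ordered list of (bucket_start, {source: mean value}),
    keeping only buckets in which every source reported at least once.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for name, samples in sources.items():
        for timestamp, value in samples:
            bucket_start = int(timestamp // period) * period
            buckets[bucket_start][name].append(value)

    aligned = []
    for bucket_start in sorted(buckets):
        per_source = buckets[bucket_start]
        if len(per_source) == len(sources):      # skip incomplete rows
            row = {n: sum(v) / len(v) for n, v in per_source.items()}
            aligned.append((bucket_start, row))
    return aligned


# Hypothetical raw samples with slightly different collection times per source.
sources = {
    "cpu_util": [(0.2, 40.0), (3.0, 44.0), (5.1, 42.0), (10.4, 90.0)],
    "mem_util": [(0.9, 60.0), (5.8, 61.0), (11.0, 63.0)],
    "net_mbps": [(1.5, 12.0), (6.0, 14.0), (12.2, 13.0)],
}
aligned = align_by_timestamp(sources, period=5)
```

Each element of `aligned` is one row of the system operating status data sequence, with all indicators sharing the same time index.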

[0018] In this embodiment, step two specifically includes: The system operating status data sequence is arranged sequentially according to the preset sampling period, with the data points collected within a single sampling period serving as the basic time unit. Multiple consecutive basic time units are spliced together according to the first preset time window length to construct the short-term system state time series, which consists of the original values of each operating status indicator within those consecutive time units. Multiple consecutive short-term system state time series are aggregated according to the second preset time window length to construct the medium-term system state time series; the aggregation includes calculating the average value of each operating status indicator within the corresponding time window and computing statistics on its change. Multiple consecutive medium-term system state time series are summarized according to the third preset time window length to construct the long-term system state time series; the summary includes computing trend change statistics and the cumulative change of each operating status indicator within the corresponding time window. The short-term, medium-term, and long-term system state time series are then aligned in chronological order, and the system state data at the different time scales are combined under the same time index to form a multi-scale system state time series set containing multi-time-scale features.

[0019] This implementation method processes the system operation status data sequence at multiple time scales to construct short-term, medium-term, and long-term system status time series, which can simultaneously reflect instantaneous fluctuations, phased changes, and long-term performance evolution trends. This avoids misjudgments caused by relying on only a single time scale, improves the ability to characterize complex operational status changes, and enhances the stability and accuracy of computer-aided diagnostic results.
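
To make the three-level construction of step two concrete, the following sketch builds short-term, medium-term, and long-term series for a single indicator and joins them on the short-term index. It is an illustration under assumptions: non-overlapping windows, the first-to-last difference as the "change" statistic, the count of rising steps in the window means as the "trend" statistic, and all names and window lengths are hypothetical.

```python
def chunks(seq, size):
    """Split seq into consecutive non-overlapping chunks, dropping any remainder."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def build_multiscale(samples, w1, w2, w3):
    """Build aligned short/medium/long representations for one indicator.

    samples: raw values, one per basic time unit.
    w1: basic units per short-term window.
    w2: short-term windows per medium-term entry.
    w3: medium-term entries per long-term entry.
    """
    # Short-term: raw values spliced into windows.
    short = chunks(samples, w1)

    # Medium-term: mean and change over each group of short-term windows.
    medium = []
    for group in chunks(short, w2):
        flat = [v for window in group for v in window]
        medium.append({"mean": sum(flat) / len(flat), "change": flat[-1] - flat[0]})

    # Long-term: trend statistic and cumulative change over medium-term entries.
    long_term = []
    for group in chunks(medium, w3):
        means = [m["mean"] for m in group]
        rising = sum(1 for a, b in zip(means, means[1:]) if b > a)
        long_term.append({
            "rising_steps": rising,
            "cumulative_change": sum(m["change"] for m in group),
        })

    # Align all three scales on the short-term window index.
    aligned = []
    for i, window in enumerate(short):
        mi = i // w2
        li = mi // w3
        if mi < len(medium) and li < len(long_term):
            aligned.append({"index": i, "short": window,
                            "medium": medium[mi], "long": long_term[li]})
    return aligned


samples = [float(x) for x in range(1, 17)]      # 16 hypothetical readings
multiscale = build_multiscale(samples, w1=2, w2=2, w3=2)
```

Every entry of `multiscale` carries the raw short-term window together with the medium-term and long-term summaries that cover the same time position, which is the form the embedding stage expects when it concatenates the scales by time index.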

[0020] In this embodiment, the improved TimeGPT model includes an input embedding module, a Transformer encoding module, a Transformer decoding module, and an output prediction module:

The input embedding module produces an embedded representation of the multi-scale system state time series set and introduces the multi-indicator joint baseline modeling mechanism: the operating status indicators at corresponding time positions in the short-term, medium-term, and long-term system state time series are concatenated according to a unified time index, mapped to a preset feature dimension through a linear mapping layer, and added to the position encoding vector to obtain the embedding sequence.

The Transformer encoding module performs context feature encoding on the embedding sequence and outputs a context representation. It comprises multiple Transformer encoding layers, each consisting of a multi-head self-attention sublayer and a feedforward network sublayer, both with residual connections and normalization. An operating condition context encoding mechanism is also introduced, specifically: the fields related to the computer system's operating environment and load conditions are selected from the system operating status data sequence to form an operating condition time series, which is input into an independently configured fully connected linear mapping layer for dimension mapping; the GELU nonlinear activation function is applied to each element of the mapping result, and normalization is then performed to obtain the operating condition encoding vector. In the multi-head self-attention sublayer of each Transformer encoding layer, the operating condition encoding vector is linearly mapped to generate an operating condition modulation vector, which is multiplied element-wise, along the feature dimension, with the attention output computed by multi-head self-attention, so that the attention response intensity for different time steps and different operating status indicators is conditionally modulated. In the feedforward network sublayer of each Transformer encoding layer, the operating condition encoding vector is linearly mapped to generate an operating condition gating vector, which is multiplied element-wise, along the feature dimension, with the intermediate representation of the feedforward network, so that the context representation output by the Transformer encoding layer adapts as the operating conditions change. By introducing the operating condition context encoding mechanism together with the multi-indicator joint baseline modeling mechanism, the multi-source operating status indicators form a joint temporal representation under operating condition constraints during the Transformer encoding stage, which completes the multi-indicator joint baseline modeling in that stage.

The Transformer decoding module includes a masked multi-head self-attention sublayer, an encoder-decoder cross-attention sublayer, and a feedforward network sublayer. It takes the context representation as context information, selectively attends to the encoded representation through the cross-attention mechanism, and applies a linear transformation, the GELU nonlinear activation function, and a second linear transformation in the feedforward network sublayer to nonlinearly map the decoded representation into a baseline temporal implicit representation.

The output prediction module maps the baseline temporal implicit representation to concrete healthy operating baseline predictions and introduces the system state component decoupling prediction mechanism, specifically: the baseline temporal implicit representation is input into multiple parallel prediction mapping channels, each corresponding to one type of system operating status indicator. In each channel, the baseline temporal implicit representation is linearly transformed by an independently configured fully connected linear mapping layer, and the GELU nonlinear activation function is applied to each element of the result to obtain the intermediate prediction representation of the corresponding operating status indicator. Scaling or exponential constraint processing is then applied to each intermediate prediction representation to adjust the numerical scale of the predictions for the different operating status indicators and to reduce mutual coupling in the multi-indicator prediction process. The prediction results of the individual indicators are reassembled in the original order of the operating status indicators to generate multi-indicator healthy operating baseline prediction values, which, along the time dimension, constitute the healthy operating baseline state sequence of the computer system.
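
The operating condition context encoding described above (linear mapping, element-wise GELU, normalization, then a further linear mapping whose output multiplies the sublayer features) can be illustrated with a small pure-Python sketch. All weights, dimensions, and field meanings below are hypothetical; a real model would learn the weights and would apply the same pattern inside every encoder layer, once to the attention output (modulation vector) and once to the feedforward intermediate representation (gating vector).

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    """Compute W x + b for one vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance normalization over the feature dimension."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def encode_condition(fields, weight, bias):
    """Linear mapping, then element-wise GELU, then normalization."""
    return normalize([gelu(v) for v in linear(fields, weight, bias)])


def modulate(features, condition, weight, bias):
    """Map the condition vector linearly, then multiply element-wise with features."""
    scale = linear(condition, weight, bias)
    return [f * s for f, s in zip(features, scale)]


# Hypothetical operating-condition fields: normalized load level and an environment flag.
fields = [0.8, 0.3]
W_cond = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]     # maps 2 fields to 3 features
b_cond = [0.0, 0.0, 0.0]
condition = encode_condition(fields, W_cond, b_cond)

# Identity weights with bias 1 give scale = 1 + condition (illustrative choice only).
W_mod = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_mod = [1.0, 1.0, 1.0]
attention_output = [1.0, -2.0, 0.5]                 # stand-in for one time step
modulated = modulate(attention_output, condition, W_mod, b_mod)
```

Because the scale depends only on the operating condition fields, the same attention output is amplified or damped differently under different loads, which is the conditioning effect the specification attributes to the mechanism.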

[0021] The model of this embodiment retains the Transformer encoder-decoder structure of the original TimeGPT model and adds an operating condition context encoding mechanism, a multi-indicator joint baseline modeling mechanism, and a system state component decoupling prediction mechanism to model the healthy operating baseline of a computer system. With TimeGPT as the base model, long-term dependencies and multi-scale evolutionary features in computer operating states can be characterized, which suits time-series modeling of multi-source operating status data. Compared with existing computer-aided diagnostic methods based on fixed thresholds, single-indicator prediction, or simple residual judgment, this embodiment explicitly encodes operating condition information and injects it into the Transformer encoding module, conditionally modulating the self-attention calculation and the feature transformation. The generated healthy operating baseline therefore adjusts with changes in system load and operating environment, which reduces the risk of false alarms caused by changes in operating conditions. In addition, the multi-indicator joint baseline modeling mechanism models multiple operating status indicators collaboratively within a unified model, improving the consistency and stability of baseline predictions, and the system state component decoupling prediction mechanism reduces mutual interference between indicators, improving the accuracy and interpretability of the healthy operating baseline and providing a more reliable foundation for subsequent deviation analysis and auxiliary diagnosis.
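The conditional modulation of the encoder can be sketched in a few lines. This is an illustrative reading under stated assumptions, not the model itself: `linear` and `modulate` are hypothetical helper names, the weights `W_mod` and `b_mod` are made-up values standing in for learned parameters, and the attention output is a toy matrix of three time steps by two features.

```python
def linear(vec, weight, bias):
    """Dense layer; `weight` is a list of rows, one row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def modulate(features, cond_vec, weight, bias):
    """Scale each feature column of `features` (time x d) by a vector
    obtained from the operating condition encoding vector."""
    m = linear(cond_vec, weight, bias)   # modulation (or gating) vector
    return [[x * g for x, g in zip(row, m)] for row in features]

# Hypothetical operating condition encoding vector and attention output.
cond = [1.0, 0.5]
attn_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_mod = [[0.5, 0.0], [0.0, 2.0]]   # stand-in for learned weights
b_mod = [0.0, 0.0]

modulated = modulate(attn_out, cond, W_mod, b_mod)
print(modulated)
```

The same `modulate` call, with a separate weight matrix, would serve as the gate on the intermediate representation of the feedforward sublayer, since both operations are an element-wise product along the feature dimension.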

[0022] In this embodiment, step four specifically includes the following. The system operating status data sequence is mapped one-to-one to the healthy operating baseline state sequence according to a unified time index, and the deviation of each operating status indicator from the healthy operating baseline at each time point is calculated to form a state deviation sequence. The state deviation sequence is segmented by a preset continuous time window, and the state deviation magnitudes within each window are arranged in chronological order. For each window, the trend of the state deviation magnitude is analyzed to determine whether it continuously decreases, remains stable, or continuously increases over time. When the state deviation magnitude decreases within a continuous time window or remains stable within a preset deviation range, the corresponding state deviation is judged to be recoverable. When the state deviation magnitude continues to increase within a continuous time window or does not show a downward trend, the corresponding state deviation is judged to have low recoverability. Based on these judgments, a recoverability feature label is generated for each operating status indicator.
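The window-level judgment can be sketched as follows. The text does not fix how a trend is measured, so this sketch assumes a least-squares slope over the window; the names `slope` and `label_window`, the `tolerance` parameter (standing in for the preset deviation range), and the sample values are hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def label_window(observed, baseline, tolerance, eps=1e-9):
    """Label one time window of one indicator by its deviation trend."""
    mags = [abs(o - b) for o, b in zip(observed, baseline)]
    if max(mags) <= tolerance:      # stable within the preset deviation range
        return "recoverable"
    if slope(mags) < -eps:          # deviation magnitude is shrinking
        return "recoverable"
    return "low_recoverability"     # growing, or no downward trend

baseline = [50.0, 50.0, 50.0, 50.0]
print(label_window([60.0, 58.0, 55.0, 52.0], baseline, tolerance=2.0))
print(label_window([52.0, 55.0, 58.0, 61.0], baseline, tolerance=2.0))
```

A slope test tolerates a single noisy sample inside an otherwise shrinking deviation; a strict monotonicity check would be a stricter alternative.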

[0023] This embodiment aligns and compares the system's operating status with the healthy operating baseline and extracts recoverability features from the deviation trend within a continuous time window. It can therefore distinguish transient fluctuations from persistent anomalies and avoids misjudging short-term disturbances as system failures. This improves the robustness of the state deviation analysis and provides a more reliable basis for subsequent anomaly evolution analysis.

[0024] In this embodiment, step five specifically includes the following. State deviation data with low recoverability are selected from the state deviation sequence and grouped by operating status indicator category to form a set of low-recoverability state deviations. With the operating cycle as the unit of time analysis, this set is segmented by consecutive operating cycles, and the frequency, duration, and magnitude of the state deviations in each operating cycle are computed. For each operating status indicator, the state deviation statistics are accumulated over multiple consecutive operating cycles to form a cumulative deviation sequence that characterizes how the deviation of that indicator evolves. Based on the cumulative deviation sequence, the cumulative deviation features of the different indicators are normalized so that they fall within a uniform numerical range. The cumulative deviation features of the indicators are then concatenated in indicator order to construct an anomaly evidence vector containing the cumulative deviation information of multiple operating status indicators.
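One possible construction of the anomaly evidence vector is sketched below. Several details are assumptions, since the text leaves them open: per-cycle statistics are given as `(frequency, duration, magnitude)` tuples, accumulation is a running sum, magnitudes are relative deviations, and normalization divides by hypothetical per-feature caps and clips at 1. The names `cumulative`, `evidence_vector`, and `caps` are illustrative.

```python
from itertools import accumulate

def cumulative(stats_per_cycle):
    """Running sums of (frequency, duration, magnitude) over consecutive cycles."""
    columns = zip(*stats_per_cycle)   # one column per statistic
    return list(zip(*(accumulate(col) for col in columns)))

def evidence_vector(per_indicator, caps):
    """Concatenate normalized final cumulative features in indicator order."""
    vec = []
    for stats in per_indicator.values():   # dicts keep insertion order
        totals = cumulative(stats)[-1]
        vec.extend(min(t / c, 1.0) for t, c in zip(totals, caps))
    return vec

# Hypothetical low-recoverability statistics for two operating cycles.
per_indicator = {
    "cpu": [(2, 30, 0.10), (3, 45, 0.20)],
    "mem": [(1, 10, 0.05), (1, 20, 0.05)],
}
caps = (10, 100, 1.0)   # assumed normalization caps per feature

vec = evidence_vector(per_indicator, caps)
print(vec)
```

With two indicators and three features each, the resulting vector has six entries; the first three belong to `cpu` and the last three to `mem`.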

[0025] This embodiment constructs anomaly evidence vectors by analyzing the cumulative evolution of low-recoverability state deviations across operating cycles. This quantitatively characterizes the risk that an anomaly evolves from a local deviation into a persistent one and avoids diagnostic judgments based on a single anomalous event. It helps identify potential systemic risks in advance and improves the ability of computer-aided diagnosis to recognize long-term abnormal trends.

[0026] In this embodiment, step six specifically includes the following. Based on the anomaly evidence vector, the cumulative deviation feature of each operating status indicator is analyzed dimension by dimension to determine whether it accumulates continuously, without decline, over consecutive operating cycles. When the cumulative deviation features of at least two operating status indicators in the anomaly evidence vector satisfy this continuous accumulation condition within the same operating cycle interval, the system operating status is determined to exhibit synchronous accumulation of multidimensional state deviations. Once this synchronous accumulation is established, the change of the anomaly evidence vector across operating cycles is tracked to determine whether the cumulative deviation features are maintained or grow further within a preset observation period. When the anomaly evidence vector shows no overall downward trend within the preset observation period and the synchronous accumulation of multidimensional state deviations persists, the corresponding system operation anomaly is determined to have exceeded the self-regulation capability of the computer system, which yields the risk irreversibility determination result. Based on this result, the system operating status is classified into risk levels, including a controllable risk state and a high-risk irreversible state. The risk level and the corresponding system operating status identifier are output as the auxiliary diagnostic result for the computer's operating status.
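The irreversibility determination can be sketched as a small rule. The sketch simplifies the text in ways that should be read as assumptions: each indicator contributes one scalar cumulative feature per operating cycle, "continuous accumulation without decline" is taken to mean a non-decreasing series that ends higher than it starts, and "no overall downward trend" is taken to mean the summed evidence at the end of the observation period is not below its starting value. The names `accumulating` and `classify_risk` are hypothetical.

```python
def accumulating(series):
    """True if the series never declines and ends higher than it starts."""
    never_declines = all(b >= a for a, b in zip(series, series[1:]))
    return never_declines and series[-1] > series[0]

def classify_risk(history, min_indicators=2):
    """history: one evidence vector per operating cycle in the observation
    period, each holding one cumulative feature per indicator."""
    per_indicator = list(zip(*history))
    synced = sum(accumulating(s) for s in per_indicator) >= min_indicators
    totals = [sum(vec) for vec in history]
    no_overall_decline = totals[-1] >= totals[0]
    if synced and no_overall_decline:
        return "high_risk_irreversible"
    return "controllable_risk"

print(classify_risk([[0.1, 0.2, 0.0], [0.2, 0.3, 0.0], [0.4, 0.3, 0.0]]))
print(classify_risk([[0.1, 0.2], [0.2, 0.1], [0.3, 0.1]]))
```

In the first call two indicators accumulate together and the total never falls, so the result is the high-risk irreversible state; in the second only one indicator accumulates, so the state remains controllable.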

[0027] This embodiment uses the anomaly evidence vector to analyze the synchronous accumulation of multidimensional state deviations and to determine risk irreversibility. It thereby identifies operational anomalies that exceed the system's self-regulation capability and grades the degree of risk, so that maintenance personnel are no longer left without a basis for judging the severity of an anomaly. This improves the objectivity of risk assessment and the credibility of diagnostic results.

[0028] In this embodiment, step seven specifically includes the following. Based on the auxiliary diagnostic result and risk level, the diagnostic information for the computer system's operating status is associated with a timestamp to form a structured diagnostic result record. The auxiliary diagnostic results are then handled by risk level: a result corresponding to the controllable risk state is output as an operating status prompt, and a result corresponding to the high-risk irreversible state is output as an early warning. Operating status prompts and early warnings are pushed to the operation and maintenance management system or a visualization interface and displayed together with the corresponding operating status indicators and anomaly evidence vectors, so that operation and maintenance personnel can analyze how the system operating status has changed. After an early warning is output, the system operating status data, the healthy operating baseline state sequence, and the anomaly evidence vector at the warning trigger time are recorded to form a historical diagnostic record.
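A structured record with graded output might look like the sketch below. The record layout, the field names, the `emit` function, the in-memory `history` list, and the JSON payload are all assumptions for illustration; the text does not specify a schema or a transport to the operation and maintenance system.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DiagnosticRecord:
    timestamp: str           # ISO 8601 time the diagnosis refers to
    risk_level: str          # "controllable_risk" or "high_risk_irreversible"
    indicators: dict         # observed indicator values at that time
    baseline: dict           # healthy operating baseline values at that time
    evidence_vector: list    # anomaly evidence vector
    output_type: str = ""    # filled in by emit()

history = []   # historical diagnostic records, kept for early warnings only

def emit(record):
    """Grade the record by risk level and return the payload to push."""
    is_warning = record.risk_level == "high_risk_irreversible"
    record.output_type = "early_warning" if is_warning else "status_prompt"
    payload = asdict(record)
    if is_warning:
        history.append(payload)   # retain context of the warning trigger time
    return json.dumps(payload)

msg = emit(DiagnosticRecord(
    timestamp="2026-01-01T00:00:00Z",
    risk_level="high_risk_irreversible",
    indicators={"cpu_util": 97.0},
    baseline={"cpu_util": 62.0},
    evidence_vector=[0.8, 0.6],
))
print(msg)
```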

[0029] This embodiment achieves timely early warning and visualization of the computer system's operating status by recording auxiliary diagnostic results and risk levels in a structured form and outputting them by grade. Combined with the historical diagnostic records, it also supplies supporting information for operation and maintenance decisions, which helps personnel respond to system operation risks more efficiently and on a sounder basis, and enhances the practical value of computer-aided diagnosis.

[0030] Example 1: To verify the feasibility of this invention in practice, it was applied to a scenario of intelligent monitoring and assisted diagnosis of the computer system operation status in a large cloud computing data center. This data center has approximately 300 physical servers, each configured with a 32-core processor, 128GB of memory, and solid-state storage. It supports various services including online computing, databases, and network services. The system operates under high load and dynamic scheduling for extended periods, with frequent changes in operating conditions, making it a typical engineering example.

[0031] In practical application, computer system operating status data are collected with a uniform sampling period of 60 seconds. Data from different sources are timestamped and time-aligned to form the system operating status data sequence. The collected indicators include average processor utilization, memory usage, disk I/O latency, network bandwidth utilization, network packet loss rate, and an overall system load index. The value range of each indicator is consistent with the actual operating environment, so that the data accurately reflect the system operating status.
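Time alignment on the 60-second sampling grid can be sketched as follows. The alignment rule is an assumption, since the text states only that data are timestamped and aligned: each sample is floored to the start of its sampling period, the last sample in a period wins, and periods missing any source are dropped. The function name `align` and the sample data are hypothetical.

```python
def align(sources, period=60):
    """sources: {name: [(epoch_seconds, value), ...]}
    Returns {period_start: {name: value}} for periods covered by every source."""
    rows = {}
    for name, samples in sources.items():
        for ts, value in samples:
            bucket = ts - ts % period                    # floor to period start
            rows.setdefault(bucket, {})[name] = value    # last sample wins
    return {b: r for b, r in sorted(rows.items()) if len(r) == len(sources)}

sources = {
    "cpu": [(0, 40.0), (61, 42.0)],
    "mem": [(5, 70.0), (65, 71.0), (125, 72.0)],
}
aligned = align(sources)
print(aligned)
```

Dropping incomplete periods keeps every row usable for joint modeling; interpolating the missing values would be an alternative design choice.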

[0032] In the data preprocessing stage, the system operating status data sequence is processed at multiple time scales. Three time scale windows of 10 minutes, 1 hour, and 24 hours are set to construct short-term, medium-term, and long-term system state time series respectively, so that the instantaneous fluctuations, phased changes, and long-term performance evolution of the operating status are characterized together, forming a multi-scale system state sequence set.
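With 60-second samples, the three windows correspond to 10 samples, 6 short windows, and 24 medium windows. The sketch below builds the three scales hierarchically under two assumptions that the text does not state: windows are non-overlapping, and each window is summarized by its mean. The names `window_means` and `multi_scale` are hypothetical.

```python
def window_means(values, size):
    """Mean of each complete, non-overlapping window of `size` values."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]

def multi_scale(samples):
    """Build short-, medium-, and long-term series from 60-second samples."""
    short = window_means(samples, 10)     # 10 x 60 s = 10 minutes
    medium = window_means(short, 6)       # 6 x 10 min = 1 hour
    long_term = window_means(medium, 24)  # 24 x 1 h = 24 hours
    return {"short": short, "medium": medium, "long": long_term}

# One day of synthetic samples for a single indicator (1440 minutes).
samples = [float(i % 10) for i in range(1440)]
scales = multi_scale(samples)
print(len(scales["short"]), len(scales["medium"]), len(scales["long"]))
```

For one day of data this yields 144 short-term, 24 medium-term, and 1 long-term value for the indicator.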

[0033] Subsequently, the multi-scale system state sequence set is input into the improved TimeGPT model for healthy operating baseline modeling. Building on the original Transformer encoder-decoder architecture, the model introduces an operating condition context encoding mechanism so that system load and resource occupancy information take part in the encoding process as contextual conditions. A multi-indicator joint baseline modeling mechanism collaboratively models the multi-source operating status indicators for processor, memory, storage, and network, and a system state component decoupling prediction mechanism generates a healthy operating baseline prediction for each operating status indicator.

[0034] After the healthy operating baseline state sequence is obtained, the system operating status data sequence is aligned and compared with it. The deviation magnitude is calculated and its trend within a continuous time window is analyzed to distinguish recoverable deviations from low-recoverability deviations. Low-recoverability deviations then undergo cumulative evolution analysis across operating cycles to construct anomaly evidence vectors. Based on these vectors, risk irreversibility is determined, and auxiliary diagnostic results and risk levels are generated for operating status early warning and maintenance decision support.

[0035] To comprehensively evaluate the practical effectiveness of the method of this invention, it was compared and analyzed with fixed threshold alarm methods and single-index time series prediction methods under the same dataset and operating environment. Data center operating data from 30 consecutive days was selected as the test sample, including typical operating scenarios such as multiple load surges, resource contention, and network fluctuations.

[0036] Table 1. Comparison of different methods in computer system operational status diagnosis

[0037] As shown in Table 1, the fixed threshold alarm method, due to its use of static thresholds for judgment, cannot adapt to dynamic changes in system operating conditions. It frequently triggers false alarms during load switching and short-term fluctuations, and its ability to identify slowly evolving anomalies is weak, resulting in a high false negative rate. The single-index prediction method reduces the false alarm rate to some extent, but because it only models a single operating status index and ignores the coupling relationship between multiple indices, it still suffers from identification lag when anomalies manifest as coordinated deviations across multiple indices.

[0038] In comparison, the method of this invention significantly reduces both the false alarm rate and the false negative rate. With the operating condition context encoding mechanism, the model can distinguish normal load changes from abnormal state changes, so the healthy operating baseline adjusts with operating conditions and misjudgments caused by changes in operating conditions are reduced at the source. In addition, the multi-indicator joint baseline modeling mechanism models multiple operating status indicators collaboratively within a unified model, which allows earlier identification when an anomaly appears as deviations in several indicators at once.

[0039] Regarding early risk identification, the average early warning lead time of the method of this invention reaches 4.2 hours, which is significantly better than the comparative methods. Further analysis showed that in multiple system resource contention scenarios, the method of this invention can issue an early risk warning when processor utilization, memory usage, and disk I/O wait time all show low-recoverability deviations that continue to accumulate, whereas the comparative methods often trigger alarms only after system performance has already degraded significantly.

[0040] The diagnostic stability score reflects the consistency of diagnostic results from different methods over a continuous operating cycle. The method of this invention, through anomaly evolution analysis across operating cycles, avoids the problem of repeated alarms triggered by a single anomaly, making the diagnostic results more stable over time and significantly improving the diagnostic stability score.

[0041] This embodiment demonstrates that the computer-aided diagnostic method based on deep learning has significant technical advantages in complex operating environments. By using operating condition context encoding, multi-indicator joint baseline modeling, and decoupled prediction of system state components, the healthy operating baseline becomes more stable and accurate. Through recoverability analysis and anomaly evolution modeling, it achieves proactive identification of system operational risks. Through risk irreversibility determination and a tiered early warning mechanism, it provides maintenance personnel with more reliable and instructive auxiliary diagnostic results, exhibiting good engineering feasibility and practical application value.

[0042] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A computer-aided diagnostic method based on deep learning, characterized in that it includes the following steps:
Step 1: Collect multi-source operating status data generated during operation of the computer system, and timestamp and time-align the data to form a system operating status data sequence;
Step 2: Perform multi-time-scale processing on the system operating status data sequence to form a multi-scale system state sequence set;
Step 3: Input the multi-scale system state sequence set into the improved TimeGPT model, which generates a healthy operating baseline state sequence of the computer system that matches the current operating conditions by introducing an operating condition context encoding mechanism, a multi-indicator joint baseline modeling mechanism, and a system state component decoupling prediction mechanism;
Step 4: Align and compare the system operating status data sequence with the healthy operating baseline state sequence, calculate the deviation magnitude, and extract the recoverability features of the state deviation based on how the deviation magnitude reverts toward the baseline within a continuous time window;
Step 5: Perform cumulative evolution analysis across operating cycles on state deviations with low recoverability, and construct an anomaly evidence vector based on the cumulative intensity and persistence of the state deviations in the time dimension;
Step 6: Based on the anomaly evidence vector, perform a risk irreversibility determination to judge whether the system operation anomaly exceeds the self-regulation capability of the computer system and, where multidimensional state deviations accumulate synchronously and cannot recover, generate an auxiliary diagnostic result for the computer's operating status and determine the corresponding risk level;
Step 7: Based on the auxiliary diagnostic result, issue early warnings on the computer system's operating status and provide support for operation and maintenance decisions.

2. The computer-aided diagnostic method based on deep learning according to claim 1, characterized in that, The multi-source operational status data is a set of status data used to characterize the operational status of a computer system, including processor operational status data, memory and storage operational status data, network communication operational status data, and system load operational status data.

3. The computer-aided diagnostic method based on deep learning according to claim 1, characterized in that step two specifically includes: the system operating status data sequence is arranged sequentially according to the preset sampling period, with the data points collected within a single sampling period as the basic time unit; multiple consecutive basic time units are spliced together according to a first preset time window length to construct a short-term system state time series; multiple consecutive short-term system state time series are aggregated according to a second preset time window length to construct a medium-term system state time series; multiple consecutive medium-term system state time series are aggregated according to a third preset time window length to construct a long-term system state time series; and the short-term, medium-term, and long-term system state time series are aligned in chronological order, with the system state data at the different time scales combined under the same time index to form the multi-scale system state sequence set containing multi-time-scale features.

4. The computer-aided diagnostic method based on deep learning according to claim 1, characterized in that the improved TimeGPT model includes an input embedding module, a Transformer encoding module, a Transformer decoding module, and an output prediction module. The input embedding module performs embedding representation processing on the multi-scale system state sequence set and introduces the multi-indicator joint baseline modeling mechanism: the operating status indicators at corresponding time positions in the short-term, medium-term, and long-term system state time series are concatenated according to a unified time index, mapped to a preset feature dimension through a linear mapping layer, and added to the position encoding vector to obtain the embedding sequence. The Transformer encoding module performs context feature encoding on the embedding sequence and outputs a context representation. It includes multiple Transformer encoding layers, each consisting of a multi-head self-attention sublayer and a feedforward network sublayer, both with residual connections and normalization. The operating condition context encoding mechanism is also introduced, specifically: operating condition fields related to the operating environment and load of the computer system are selected from the system operating status data sequence to form an operating condition time series, which is input into an independently configured fully connected linear mapping layer for dimension mapping; the GELU nonlinear activation function is applied element-wise to the mapping result, followed by normalization, to obtain the operating condition encoding vector. In the multi-head self-attention sublayer of each Transformer encoding layer, the operating condition encoding vector is linearly mapped to generate an operating condition modulation vector, which is multiplied element-wise, along the feature dimension, with the attention output computed by multi-head self-attention. In the feedforward sublayer of each Transformer encoding layer, the operating condition encoding vector is linearly mapped to generate an operating condition gating vector, which is multiplied element-wise, along the feature dimension, with the intermediate representation of the feedforward network. The Transformer decoding module includes a masked multi-head self-attention sublayer, an encoder-decoder cross-attention sublayer, and a feedforward network sublayer. It takes the context representation as context information, selectively attends to the encoded representation through the cross-attention mechanism, and applies a linear transformation, a GELU nonlinear activation, and a second linear transformation in the feedforward network sublayer to map the decoded representation nonlinearly into a baseline temporal implicit representation. The output prediction module maps the baseline temporal implicit representation to concrete healthy operating baseline predictions and introduces the system state component decoupling prediction mechanism, specifically: the baseline temporal implicit representation is input into multiple parallel prediction mapping channels, each corresponding to one type of system operating status indicator; in each prediction mapping channel, the baseline temporal implicit representation is linearly transformed by an independently configured fully connected linear mapping layer; the GELU nonlinear activation function is applied element-wise to the linear transformation result to obtain the intermediate prediction representation of the corresponding operating status indicator; scaling or exponential constraint processing is performed on each intermediate prediction representation; the prediction results of the operating status indicators are reassembled in the original indicator order to generate multi-indicator healthy operating baseline prediction values; and the multi-indicator healthy operating baseline prediction values constitute, in the time dimension, the healthy operating baseline state sequence of the computer system.

5. The computer-aided diagnostic method based on deep learning according to claim 1, characterized in that step four specifically includes: the system operating status data sequence is mapped one-to-one to the healthy operating baseline state sequence according to a unified time index, and the deviation of each operating status indicator from the healthy operating baseline at each time point is calculated to form a state deviation sequence; the state deviation sequence is segmented by a preset continuous time window, and the state deviation magnitudes within each time window are arranged in chronological order; for each time window, the trend of the state deviation magnitude within the window is analyzed to determine whether it continuously decreases, remains stable, or continuously increases over time; when the state deviation magnitude decreases within a continuous time window or remains stable within a preset deviation range, the corresponding state deviation is judged to be recoverable; when the state deviation magnitude continues to increase within a continuous time window or does not show a downward trend, the corresponding state deviation is judged to have low recoverability; and based on the judgment results, a recoverability feature label is generated for each operating status indicator.

6. The computer-aided diagnostic method based on deep learning according to claim 1, characterized in that step five specifically includes: state deviation data with low recoverability are selected from the state deviation sequence and grouped by operating status indicator category to form a set of low-recoverability state deviations; with the operating cycle as the unit of time analysis, the set of low-recoverability state deviations is segmented by consecutive operating cycles, and the frequency, duration, and magnitude of the state deviations in each operating cycle are computed; for each operating status indicator, the state deviation statistics are accumulated over multiple consecutive operating cycles to form a cumulative deviation sequence; and based on the cumulative deviation sequence, the cumulative deviation features of the different operating status indicators are normalized and concatenated in indicator order to construct an anomaly evidence vector containing the cumulative deviation information of multiple operating status indicators.

7. The computer-aided diagnostic method based on deep learning according to claim 1, characterized in that step six specifically includes: based on the anomaly evidence vector, the cumulative deviation feature of each operating status indicator is analyzed dimension by dimension to determine whether it accumulates continuously, without decline, over consecutive operating cycles; when the cumulative deviation features of at least two operating status indicators in the anomaly evidence vector simultaneously satisfy the continuous accumulation condition within the same operating cycle interval, the system operating status is determined to exhibit synchronous accumulation of multidimensional state deviations; once this synchronous accumulation is established, the change of the anomaly evidence vector across operating cycles is tracked to determine whether the cumulative deviation features are maintained or grow further within a preset observation period; when the anomaly evidence vector shows no overall downward trend within the preset observation period and the synchronous accumulation of multidimensional state deviations persists, the corresponding system operation anomaly is determined to have exceeded the self-regulation capability of the computer system, which yields the risk irreversibility determination result; based on the risk irreversibility determination result, the system operating status is classified into risk levels; and the risk level and the corresponding system operating status identifier are output as the auxiliary diagnostic result for the computer's operating status.

8. The computer-aided diagnostic method based on deep learning according to claim 1, characterized in that step seven specifically includes: based on the auxiliary diagnostic result and risk level, the diagnostic information for the computer system's operating status is associated with a timestamp to form a structured diagnostic result record; the auxiliary diagnostic results are handled by risk level, such that a result corresponding to a controllable risk state is output as an operating status prompt and a result corresponding to a high-risk irreversible state is output as an early warning; the operating status prompts or early warnings are pushed to the operation and maintenance management system or a visualization interface and displayed in association with the corresponding operating status indicators and anomaly evidence vectors; and after an early warning is output, the system operating status data, the healthy operating baseline state sequence, and the anomaly evidence vector at the warning trigger time are recorded to form a historical diagnostic record.