Database diagnosing apparatus, database diagnosing method, and computer-readable storage medium

By using database diagnostic equipment and methods, database performance issues are analyzed automatically, and problem location and optimization suggestions are provided. This solves the problem of time-consuming manual diagnosis in existing technologies and achieves efficient fault repair.

CN122240368APending Publication Date: 2026-06-19青岛聚看云科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
青岛聚看云科技有限公司
Filing Date
2026-02-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In current technologies, troubleshooting and repairing database performance issues relies on the professional skills of DBAs, which is time-consuming and labor-intensive, making it difficult to meet high availability requirements and lacking intelligent performance insights and risk warning capabilities.

Method used

A database diagnostic device and method are provided, which automatically performs diagnostic analysis based on data indicators and problem localization strategies by acquiring query information and operational information, and generates diagnostic results, including problem localization and optimization suggestions.

Benefits of technology

It enables automated diagnosis and root cause analysis of database performance issues, shortening fault repair time and improving fault handling efficiency.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122240368A_ABST
    Figure CN122240368A_ABST
Patent Text Reader

Abstract

This disclosure relates to the interdisciplinary field of database operation and maintenance and artificial intelligence in the field of information technology, and particularly to a database diagnostic device, a database diagnostic method, and a computer-readable storage medium. The method includes: acquiring query information indicating that the database has performance problems during a target time period; acquiring operational information of the database during a specified time period; wherein the operational information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period; determining at least one data indicator and at least one problem localization strategy corresponding to each data indicator based on the operational information; and performing diagnostic analysis based on at least one problem localization strategy, the query information, and the operational information to generate a diagnostic result.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This disclosure relates to the field of database operation and maintenance and the intersection of artificial intelligence in the field of information technology, and in particular to a database diagnostic device, a database diagnostic method and a computer-readable storage medium. Background Technology

[0002] In modern, complex computer systems, databases, as core data storage units, carry a large amount of critical business logic. As business scales and access concurrency increases, databases are prone to problems such as slow response times, CPU spikes, deadlocks, and slow SQL queries. Currently, troubleshooting and repairing these issues heavily relies on the professional skills and practical experience of DBAs, requiring manual server logins, execution of multiple diagnostic commands, and analysis of massive amounts of monitoring metrics and logs. This entire process is time-consuming and labor-intensive, resulting in long average fault recovery times and failing to meet high availability requirements.

[0003] Existing monitoring tools (such as Prometheus and Zabbix) and database-native tools (such as Show Proceeds and Experiment) provide isolated, superficial metrics, lacking intelligent systems capable of understanding causal relationships between metrics and conducting in-depth correlation analysis. Meanwhile, performance tuning experience is largely tacit knowledge, difficult to standardize and solidify, leading to uneven skill levels within teams and high training costs for new employees. Current methods are mostly "firefighting" approaches, lacking proactive performance insights and risk warnings based on historical data and operational patterns.

[0004] Therefore, there is an urgent need to build an intelligent database performance diagnostic system to shorten the fault diagnosis cycle, improve the overall fault handling efficiency, and solve the key technical problem of long database performance problem repair time. Summary of the Invention

[0005] To address the aforementioned technical problems, this disclosure provides a database diagnostic device, a database diagnostic method, and a computer-readable storage medium.

[0006] In a first aspect, this disclosure provides a database diagnostic device, comprising: a communicator configured to: acquire query information indicating that the database has performance problems during a target time period; and a controller configured to: acquire operational information of the database during a specified time period; wherein the operational information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period; based on the operational information, determining at least one data indicator and at least one problem localization strategy corresponding to each data indicator; performing diagnostic analysis based on at least one problem localization strategy, query information, and operational information to generate a diagnostic result; wherein the diagnostic result indicates either problem localization or optimization suggestions.

[0007] Secondly, this disclosure provides a database diagnostic method, comprising: obtaining query information indicating that the database has performance problems during a target time period; obtaining operational information of the database during a specified time period; wherein the operational information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period; determining at least one data indicator and at least one problem localization strategy corresponding to each data indicator based on the operational information; performing diagnostic analysis based on at least one problem localization strategy, query information, and operational information to generate diagnostic results; wherein the diagnostic results indicate either problem localization or optimization suggestions.

[0008] Thirdly, this disclosure provides a computer-readable storage medium, comprising: storing a computer program on the computer-readable storage medium, the computer program being executed by a controller using a database diagnostic method as provided in any of the second aspects.

[0009] Fourthly, this disclosure provides a computer program product that, when run on a computer, causes the computer to perform a database diagnostic method as provided in any of the second aspects.

[0010] It should be noted that the aforementioned computer instructions may be stored, in whole or in part, on the first computer-readable storage medium. The first computer-readable storage medium may be encapsulated together with the controller of the database diagnostic device, or it may be encapsulated separately from the controller of the database diagnostic device; this disclosure does not impose any limitations on this.

[0011] The descriptions of the second, third, and fourth aspects in this disclosure can be referenced to the detailed description of the first aspect; and the beneficial effects of the descriptions of the second, third, and fourth aspects can be referenced to the analysis of the beneficial effects of the first aspect, which will not be repeated here.

[0012] In this disclosure, the names of the aforementioned database diagnostic devices do not limit the devices or functional modules themselves. In actual implementation, these devices or functional modules may appear under other names. As long as the functions of each device or functional module are similar to those of this disclosure, they fall within the scope of this disclosure and its equivalents.

[0013] These or other aspects of this disclosure will become more readily apparent in the following description.

[0014] The technical solution provided in this disclosure has the following advantages compared with the prior art: The database diagnostic device provided in this disclosure allows users to send query information indicating a performance problem in the database during a target time period via a client when the user determines that the database is experiencing such a problem. The database diagnostic device's communicator then receives this query information. Subsequently, the controller acquires operational information of the database within a specified time period that includes the target time period. This operational information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period. Based on this operational information, the controller determines at least one data metric and at least one corresponding problem localization strategy. Finally, the controller performs diagnostic analysis based on at least one problem localization strategy, the query information, and the operational information to generate a diagnostic result. In this way, users can determine the location of performance issues and / or optimization suggestions for the database within a target time period based on the diagnostic results. Users can then address these performance issues promptly, reducing the lengthy database performance problem repair time. Furthermore, when diagnosing performance issues within a target time period, users only need to input query information indicating the presence of performance problems; the data diagnostic device can automatically output diagnostic results. This achieves automated diagnosis, root cause analysis, and intelligent decision-making for database performance issues. Therefore, it solves the problem of how to build an intelligent database performance diagnostic system in existing technologies to shorten the fault diagnosis cycle, improve overall fault handling efficiency, and address the issue of long database performance problem repair times. Attached Figure Description

[0015] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.

[0016] To more clearly illustrate the technical solutions in the embodiments of this disclosure or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0017] Figure 1 One of the schematic flowcharts of the database diagnostic method provided in the embodiments of this application; Figure 2 A system architecture diagram of the database diagnostic method provided in the embodiments of this application; Figure 3 A second schematic flowchart illustrating the database diagnostic method provided in this application embodiment; Figure 4The third flowchart illustrating the database diagnostic method provided in this application embodiment; Figure 5 The fourth flowchart illustrating the database diagnostic method provided in this application embodiment; Figure 6 This is a schematic diagram of the structure of the database diagnostic device provided in the embodiments of this application; Figure 7 This is a schematic diagram of a chip system provided in an embodiment of this application. Detailed Implementation

[0018] To better understand the above-mentioned objectives, features, and advantages of this disclosure, the solutions disclosed herein will be further described below. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other.

[0019] Numerous specific details are set forth in the following description in order to provide a full understanding of this disclosure, but this disclosure may also be implemented in other ways different from those described herein; obviously, the embodiments in the specification are only some, and not all, of the embodiments of this disclosure.

[0020] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0021] In this embodiment of the disclosure, AAS refers to the average number of active sessions over a period of time during which the user database instance is running.

[0022] In this embodiment of the disclosure, JSON stands for JavaScript Object Notation, which is designed based on a subset of ECMAScript. It is an open standard file and data exchange format that is easy for humans to read and write, and also easy for machines to parse and generate.

[0023] The time alignment in this embodiment is called Time Alignment, which is the process of unifying the timestamps of multiple time series datasets to the same frequency and reference point.

[0024] Missing compensation in this embodiment is a technique for filling in missing values ​​or invalid readings in the original data. The techniques include one or more of the following: interpolation methods (linear interpolation, spline interpolation), forward / backward filling (copying the most recent valid value), mean substitution (replacing missing terms with the overall mean), and prediction model estimation (Autoregressive Integrated Moving Average (ARIMA) / Long Short-Term Memory (LSTM) prediction of missing segments).

[0025] In this embodiment of the disclosure, denoising is used, which is the operation of removing interference components from the signal to restore the true state.

[0026] The anomaly detection in this embodiment is called Anomaly Detection, which automatically identifies data points that deviate significantly from the expected pattern and exceed the normal range.

[0027] In this embodiment, the abnormal range is called the Anomaly Range, which is a permissible range of fluctuations defined by the upper and lower limits of safety based on historical statistical data. Once the real-time monitoring value exceeds this boundary, an alarm notification mechanism is triggered.

[0028] The baseline in this embodiment is a stable horizontal reference line that reflects a period of time. It is usually represented as a weighted average trajectory or a smooth curve after low-pass filtering, serving as a standard measure of short-term fluctuations.

[0029] The average value in this embodiment is the Mean Value, which is the expected estimate obtained by summing all valid measurement results over the entire time period and dividing by the total number, reflecting the overall central tendency position.

[0030] The maximum value in this embodiment is the Max Value, which is the highest instantaneous reading that has occurred within the scope of the investigation and can be used to evaluate the ultimate load capacity of stress testing.

[0031] Root Cause Analysis (RCA) in this embodiment is a process of tracing the essence of a problem, aiming to identify the root cause of abnormal phenomena rather than staying at the surface symptoms. It delves deeper layer by layer through structured methods (such as the 5 Whys analysis, fishbone diagrams, etc.) to answer "why the problem occurred".

[0032] The diagnostic path in this embodiment is a logical roadmap that guides troubleshooting, answering the question "How to find the problem step by step?". It decomposes complex systems into inspectable modules and designs the inspection order according to priority, such as: network → service process → database connection → log anomalies.

[0033] The solutions in this disclosure are specific measures tailored to the root cause, answering the question "How to completely solve the problem and prevent recurrence?". These include short-term fixes (such as restarting services or expanding resources) and long-term improvements (such as optimizing code or improving monitoring).

[0034] The GPT-4 (Generative Pre-trained Transformer-4) in this embodiment is a fourth-generation generative pre-trained model, which is one of the most advanced and widely used closed-source large models.

[0035] Claude in this embodiment is a conversational AI model.

[0036] In this disclosure, LLaMA (Large Language Model Meta AI) is an open-source large language model series, including LLaMA-2 and LLaMA-3, with parameter sizes ranging from 7B to 70B.

[0037] The database diagnostic device provided in this disclosure can be a server. When the server executes the database diagnostic method provided in this disclosure, it can be the server's processor.

[0038] In the following embodiments, the server described above is used as the execution subject for the database diagnostic method provided in the embodiments of this disclosure, to illustrate the method of the embodiments of this application.

[0039] This application provides a database diagnostic method, such as... Figure 1 As shown, the database diagnostic method may include S11-S14.

[0040] S11. Obtain query information to indicate that the database has performance problems during the target time period.

[0041] In some examples, the database diagnostic method provided in this disclosure is applied to, for example... Figure 2The system architecture shown includes client 1 and server 2. Server 2 includes an interaction layer 2-1, an intelligent coordinator 2-2, an intelligent agent layer 2-3, and a tool execution layer 2-4. The intelligent agent layer 2-3 includes: an execution intelligent agent 2-3-1, a reflexive intelligent agent 2-3-2, and a summarizing intelligent agent 2-3-3. The tool execution layer 2-4 includes a first tool 2-4-1 for collecting average active sessions, a second tool 2-4-2 for collecting database information, a third tool 2-4-3 for collecting slow structured query language information, a fourth tool 2-4-4 for collecting execution plan information, and a fifth tool 2-4-5 for collecting abnormal structured query language information.

[0042] When a user encounters a very slow data response while using data, and a query confirms that the central processing unit (CPU) utilization is consistently above 90%, the user can input "Application response is very slow during the target time period, and the database server CPU utilization is consistently above 90%" on client 1's interface. Client 1 then generates query information indicating a performance issue with the database during that time period based on the user's input. Client 1 then sends this query information to server 2. When server 2's interaction layer 2-1 receives the query information from client 1, it retrieves the database's operational information for the specified time period. Based on this operational information, interaction layer 2-1 determines at least one data metric and at least one corresponding problem localization strategy. Finally, interaction layer 2-1 transmits the at least one problem localization strategy, the query information, and the operational information to the intelligent coordinator 2-2. Both the intelligent coordinator 2-2 and the intelligent agent layer 2-3 perform diagnostic analysis based on at least one problem localization strategy, query information, and operational information to generate diagnostic results. For example, if the input data lacks target data, the intelligent coordinator 2-2 acquires the target data through at least one acquisition tool in the tool execution layer 2-4. Based on the input and target data, both the intelligent coordinator 2-2 and the intelligent agent layer 2-3 determine the diagnostic results. For example, the intelligent coordinator 2-2 performs semantic understanding on the input and target data, converting them into structured diagnostic intents. Based on the diagnostic intents converted by the intelligent coordinator 2-2, the intelligent coordinator 2-2 generates an initial diagnostic plan. The executing intelligent agent 2-3-1 executes each diagnostic task in the initial diagnostic plan and organizes the returned time-series data table, submitting it to the intelligent coordinator 2-2. The reflective intelligent agent 2-3-2 generates preliminary detection results based on all data and data context generated by the executing intelligent agent 2-3-1 during the execution of diagnostic tasks. The reflexive agent 2-3-2 reflects on the preliminary detection results. If it determines that the preliminary detection results lack specified information, it calls the execution agent 2-3-1 to supplement the specified information. Then, based on the preliminary detection results and the specified information, the reflexive agent 2-3-2 generates updated detection results. Next, the reflexive agent 2-3-2 sends the updated detection results to the intelligent coordinator 2-2. Based on the updated detection results output by the reflexive agent 2-3-2, the intelligent coordinator 2-2 determines that a new diagnostic task exists. If the intelligent coordinator 2-2 determines that a new diagnostic task exists, it adds the new diagnostic task to the initial diagnostic plan, resulting in an updated initial diagnostic plan. The execution agent 2-3-1 executes each diagnostic task in the updated initial diagnostic plan, organizes the returned time-series data table, and submits it to the intelligent coordinator 2-2.The reflexive agent 2-3-2 generates preliminary detection results based on all data and data context generated by the executing agent 2-3-1 during the execution of diagnostic tasks. The reflexive agent 2-3-2 reflects on the preliminary detection results and, if it determines that the preliminary results lack specified information, calls the executing agent 2-3-1 to supplement that specified information. Then, the reflexive agent 2-3-2 generates updated detection results based on the preliminary detection results and the specified information. The reflexive agent 2-3-2 then sends the updated detection results to the intelligent coordinator 2-2. Based on the updated detection results output by the reflexive agent 2-3-2, the intelligent coordinator 2-2, if it determines that a new diagnostic task exists, adds the new diagnostic task to the initial diagnostic plan, resulting in an updated initial diagnostic plan. This process repeats until the reflexive agent 2-3-2 determines that the iteration conditions are met, at which point the loop stops. The summary agent 2-3-3 summarizes all the data and data context generated by the executing agent 2-3-1 and the reflective agent 2-3-2 in one or more loops to identify the problem phenomenon, analysis process, problem location, optimization suggestions, and relevant evidence. Then, based on the problem phenomenon, analysis process, problem location, optimization suggestions, and relevant evidence, a diagnostic result is generated.

[0043] In some examples, based on whether the diagnostic results meet expectations according to human feedback, if the database still has problems after the user performs the corresponding operations according to the optimization suggestions in the diagnostic results, the user can supplement the information again based on the query information in the interface of client 1; then, server 2 executes the data diagnostic method provided in this embodiment of the present disclosure again based on the query information containing the supplemented information to obtain the updated diagnostic results, thereby improving the accuracy of the diagnostic results.

[0044] S12. Obtain the database's operational information for a specified time period; wherein the operational information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period.

[0045] In some examples, storage resource data includes at least one of CPU, input / output (I / O), queries per second (QPS), transactions per second (TPS), connections, latency, memory (MEM), and disk size.

[0046] In some examples, the log data includes at least one of the WARN / ERROR messages.

[0047] In some examples, performance data includes metrics such as slow Structured Query Language (SQL), lock wait, and AAS.

[0048] S13. Based on the operational information, determine at least one data indicator and at least one problem localization strategy corresponding to each data indicator.

[0049] In some examples, when determining at least one data metric and at least one problem localization strategy corresponding to each data metric based on operational information, the operational information can be cleaned to obtain cleaned data; when there are data anomalies in the cleaned data, at least one data metric corresponding to the cleaned data can be obtained; and based on at least one data metric, at least one problem localization strategy corresponding to each data metric can be determined.

[0050] In some examples, data cleaning is performed on the runtime information to obtain cleaned data. This includes performing cleaning operations on the runtime information to obtain cleaned data. The cleaning operations may include one or more of the following: time alignment, missing data compensation, noise reduction, and anomaly detection.

[0051] In some examples, the cleaned data is standardized JSON data containing baseline, average, maximum, and outlier ranges.

[0052] In some examples, when determining at least one data metric based on operational information, the operational information can be cleaned to obtain cleaned data. Then, the value of the specific data metric is calculated based on the cleaned data.

[0053] In some examples, data metrics can be pre-specified, such as average response time, error rate percentage, QPS, TPS, resource utilization (CPU, memory, disk, etc.) or one or more of these metrics; thus, after obtaining the cleaned data, the specific data metric values ​​can be calculated based on the cleaned data.

[0054] In some examples, when determining at least one data metric and at least one problem localization strategy corresponding to each data metric based on operational information, the operational information can be input into an analysis model for analysis to determine at least one data metric and at least one problem localization strategy corresponding to each data metric. The training process of the analysis model includes historical operational information, and the first labeled result includes at least one data metric corresponding to each historical operational information and at least one problem localization strategy corresponding to each data metric.

[0055] Obtain the first training sample data and the first labeling result of the first training sample data; wherein, the first training sample data includes: The first training sample data is input into the first neural network model for learning, and the first prediction result of the first neural network model on the first training sample data is obtained.

[0056] Based on the first prediction result and the first labeling result, the network parameters of the first neural network model are adjusted until the first neural network model converges, thus obtaining the analysis model.

[0057] In some examples, when determining at least one problem localization strategy corresponding to each data indicator based on at least one data indicator, the data indicators can be mapped to knowledge base cases, and the corresponding problem localization strategies can be automatically extracted as inference context. For example, for each data indicator, feature extraction is performed to obtain an actual feature vector. Then, the similarity between the actual feature vector and the typical indicator pattern vector of each knowledge base case is calculated. Mathematical metrics such as cosine similarity, Euclidean distance, or edit distance are used to calculate the distance or similarity coefficient between the actual feature vector and the typical indicator pattern vector. Knowledge base cases with a matching degree greater than a preset threshold are selected based on similarity scores, achieving intelligent matching and retrieval of abnormal patterns and historical faults. Subsequently, root cause analysis, diagnostic paths, and solutions based on knowledge base cases with a matching degree greater than the preset threshold are used to form the inference context. This mechanism can quickly identify known problems corresponding to abnormal patterns, provide structured localization strategies and processing suggestions, significantly improve the efficiency and accuracy of fault diagnosis, and realize a shift from a passive response to a proactive prevention-oriented operation and maintenance model.

[0058] In some examples, the typical indicator pattern vector of a knowledge base case refers to the structured abstraction of multiple data indicators strongly related to the problem in a knowledge base case, transforming them into a set of numerical feature vectors. This set of vectors contains the key indicator features of the knowledge base case when a failure occurs, such as "CPU utilization > 80%" and "memory usage > 85%", which are normalized and encoded to form a fixed-dimensional numerical sequence.

[0059] In some examples, problem localization strategies include at least root cause analysis, diagnostic pathways, and solutions.

[0060] S14. Perform diagnostic analysis based on at least one problem localization strategy, query information, and operational information to generate diagnostic results; wherein the diagnostic results indicate either problem localization or optimization suggestions.

[0061] In some examples, diagnostic analysis is performed based on at least one problem localization strategy, query information, and operational information. When generating diagnostic results, the following steps are taken: if the input data lacks target data, the target data is obtained through at least one data acquisition tool; wherein the input data includes at least one problem localization strategy, query information, and operational information; and the diagnostic results are determined based on the input data and the target data.

[0062] In some examples, when determining the diagnostic result based on input data and target data, semantic understanding can be performed on the input data and target data to convert them into structured diagnostic intents. Based on the diagnostic intents, an initial diagnostic plan is generated. Based on the initial diagnostic plan, a reflective iterative loop is executed until the iteration conditions are met, at which point a diagnostic result is generated. The iteration conditions include any one of the following: the confidence level of the diagnostic result is greater than a confidence threshold, the number of loop iterations is equal to a preset number, and the loop duration is greater than a duration threshold.

[0063] In some examples, the initial diagnostic plan includes one or more of the following: problem symptoms, analysis process, problem localization, optimization recommendations, and relevant evidence.

[0064] In some examples, the data diagnostic method provided in this disclosure uses historical data to train a large language model. The historical data includes at least one set of historical diagnostic plans, historical input data, and historical target data. Thus, the input data and target data can be input into the large language model for semantic understanding to generate an initial diagnostic plan. Then, the executing agent executes each diagnostic task in the initial diagnostic plan, organizes the returned time-series data table, and submits it to the intelligent coordinator 2-2. The reflecting agent 2-3-2 generates preliminary detection results based on all data and data context generated by the executing agent 2-3-1 during the execution of diagnostic tasks. The reflecting agent 2-3-2 reflects on the preliminary detection results and, if it determines that the preliminary detection results lack specified information, calls the executing agent 2-3-1 to supplement the specified information. Then, the reflecting agent 2-3-2 generates updated detection results based on the preliminary detection results and the specified information. Finally, the reflecting agent 2-3-2 sends the updated detection results to the intelligent coordinator 2-2. Based on the updated detection results output by the reflective agent 2-3-2, when a new diagnostic task is identified, the intelligent coordinator 2-2 adds the new diagnostic task to the initial diagnostic plan, resulting in an updated initial diagnostic plan. This process repeats until the reflective agent 2-3-2 determines that the iteration conditions are met, at which point the loop stops. The summarizing agent 2-3-3 summarizes all the data and data context generated by the executing agent 2-3-1 and the reflective agent 2-3-2 in one or more loops, identifying the problem phenomenon, analysis process, problem location, optimization suggestions, and relevant evidence. Finally, based on the problem phenomenon, analysis process, problem location, optimization suggestions, and relevant evidence, a diagnostic result is generated.

[0065] In some examples, the Large Language Model (LLM) can be any of GPT-4, Claude, LLaMA, etc.

[0066] For example, a user inputs through the interface of client 1: "During the target time period, the application response is very slow, and the database server CPU usage is consistently above 90%." Then, based on the user's input, client 1 generates query information indicating a performance problem in the database during the target time period. Client 1 then sends this query information to server 2. When interaction layer 2-1 of server 2 receives the query information from client 1, it obtains the database's operational information for the specified time period. Based on this operational information, interaction layer 2-1 determines at least one data metric and at least one problem localization strategy corresponding to each data metric. Then, interaction layer 2-1 transmits the at least one problem localization strategy, the query information, and the operational information to intelligent coordinator 2-2. If the input data (including at least one data metric and at least one problem localization strategy corresponding to each data metric) lacks target data, intelligent coordinator 2-2 obtains the target data through at least one acquisition tool in tool execution layer 2-4. The intelligent coordinator 2-2 utilizes the understanding capabilities of LLM to transform both input and target data into structured diagnostic intents, and generates an initial, executable diagnostic plan based on these intents. For example, the initial diagnostic plan includes three diagnostic tasks: Task_1, Task_2, and Task_3. Task 1: Get the CPU, memory, and IO usage over the past 15 minutes; Task 2: Obtain the current status of all database sessions and identify active and blocked sessions; Task_3: Retrieve the most recently generated slow query logs.

[0067] Then, we enter the "execution-reflection iterative cycle," which is a dynamic, multi-round exploration process.

[0068] Step A (Execution): Execution agent 2-3-1 receives Task_1, calls get_cpu_info(time1, time2), organizes the returned time series data table, and submits it to intelligent coordinator 2-2. Similarly, execution agent 2-3-1 completes Task_2 and Task_3 in sequence.

[0069] Step B (Reflection): The reflective agent 2-3-2 is activated, receiving all current data and the analysis context. The internal workflow of the reflective agent 2-3-2 is as follows: Data Fusion and Analysis: "CPU utilization highly correlates with the occurrence time of slow queries like 'SELECT * FROM large_table WHERE unindexed_column = ?'." When executing queries like 'SELECT * FROM large_table WHERE unindexed_column = ?', the database engine cannot quickly locate data using indexes and must scan the entire table row by row to match the conditions, a process that consumes significant system resources. As query frequency increases or data volume grows, this resource consumption rises exponentially, leading to persistently high overall system CPU utilization. Therefore, monitoring and analyzing this temporal correlation can effectively identify the specific query statements causing system performance degradation, providing a clear diagnostic direction for database optimization.

[0070] Generate preliminary test results.

[0071] Reflecting on the preliminary test results: Based on the current preliminary test results, reflect on what information is still missing from the diagnostic conclusions in the current preliminary test results, and call upon and execute agent 2-3-1 to supplement it.

[0072] "Suggested new tasks: Task_4: Use SHOW ENGINE INNODB STATUS to check the lock wait chain. Task_5: Perform EXPLAIN ANALYZE on the above slow query."

[0073] Step C (Coordination and Iteration): After receiving the output from the reflective agent 2-3-2, the intelligent coordinator 2-2 evaluates its rationality. If a valuable new task is proposed, the coordinator updates the diagnostic plan (e.g., adding Task_4 and Task_5), and then jumps back to Step A to start a new round of execution-reflection. This process continues until: The reflective agent 2-3-2 states that "based on the existing evidence, it is assumed that the confidence level of H1 exceeds 95% and there are no other reasonable doubts," or that the preset maximum number of iterations (e.g., 5 times) or timeout period (e.g., 3 minutes) has been reached.

[0074] After all tools have been invoked, the summarizing agent 2-3-3 will perform a summary analysis based on the preceding analysis and the results of the tool invocations. The tasks of the summarizing agent 2-3-3 are as follows: The root cause was identified, for example: Based on the above analysis and tool usage results, the cause of "very slow application response and database server CPU usage consistently above 90% during the target time period" was determined to be: the lack of an index on the unindexed_column field of the large_table table led to a high-frequency query performing a full table scan, consuming a large amount of CPU resources, and triggering a cascading failure during peak business hours. This identified root cause was then used for problem localization.

[0075] Present the chain of evidence: What is the chain that led to the current cause, and analyze it.

[0076] The optimization suggestion is: "Immediately add an index to the unindexed_column field of the large_table table. The corresponding SQL is: CREATE INDEX idx_unindexed_col ON large_table (unindexed_column)", which means creating an index named idx_unindexed_col for the database table named large_table, based on the unindexed_column field in the table.

[0077] Generate diagnostic results: The final output includes a diagnostic result containing "problem phenomenon", "analysis process", "problem location", "optimization suggestions" and "related evidence".

[0078] In some examples, the execution process of agent 2-3-1 includes three stages: execution phase one, execution phase two, and execution phase three. Execution phase one, the diagnostic planning stage, includes input parameters, execution operations, and output results. The input parameters include: 1. State: Includes historical diagnostic results, reflective conclusions, and results of tools that have been implemented.

[0079] 2. Business data (“all_data”): includes the database instance name, query time range, etc.

[0080] 3. Knowledge base: An external vector database (used for RAG retrieval).

[0081] Perform operations, including: 1. Information Integration: Extract historical diagnoses, reflections, and tool results from the input "state" and combine them into "planned input".

[0082] 2. Knowledge Retrieval: Using "Planned Input" as the query condition, retrieve relevant professional knowledge fragments from external knowledge bases.

[0083] 3. Generate plan: Send system instructions (including retrieved knowledge) and user questions (i.e., "plan input") to the large language model to generate a structured JSON execution plan in a streaming manner.

[0084] The output includes a structured execution plan (“execution_plan”). This execution plan is a list of tasks that explicitly lists the names of the tools to be invoked (“tool_name”) and the basic parameters (“parameters”). If the plan is an empty list, the process terminates prematurely.

[0085] Phase Two: Tool Execution, including: input parameters, execution operations, and output results. Input parameters include: 1. Execution plan (“execution_plan”): The output of execution phase one.

[0086] 2. Business data (“all_data”): Used to inject runtime parameters.

[0087] 3. Toolset (“DBTools”): An instance of a utility class containing various database operation functions (such as querying status and executing SQL).

[0088] Perform operations, including: 1. Plan parsing and parameter injection: Iterate through each task in "execution_plan" and merge the runtime parameters (instance name, time, etc.) in "all_data" into the parameters of that task.

[0089] 2. Dynamic tool invocation: Based on the "tool_name" in the task, dynamically find the corresponding function from the "DBTools" toolset and execute it, passing in the merged parameters.

[0090] The output includes a "results" dictionary, which records the name of each called tool and its corresponding function return result. All results are appended to the global "tool_use_result" database to form a complete execution history.

[0091] Phase 3 of execution: Status return and transition, including: input parameters, execution operations, and output results. Input parameters include: 1. Tool execution result (“tool_use_result”): Includes the results of the current round and all previous tool executions.

[0092] 2. Execution plan judgment: Based on whether the "execution_plan" generated in the first stage is empty.

[0093] Perform operations, including: 1. State Update: Update the "tool_use_result" generated in this round to the overall "state".

[0094] 2. Process Decision: Determine which node the workflow should proceed to next.

[0095] The output includes: a "Command" object, which contains: 1. "update": The data that needs to be updated in the global "state" (i.e., the "tool_use_result" in this round).

[0096] 2. "goto": Determines the next target node of the workflow engine (if no tool is executed, jump to "post_hook", otherwise continue the subsequent process).

[0097] In some examples, the execution process of the reflexive agent 2-3-2 includes: input parameters, execution operations, and output results; wherein, the input parameters include: 1. State (“state”): Contains historical running data, where the key input is “state.get(“diagnosis”)”, which is the preliminary diagnosis result generated in the previous stage, such as the output result of executing agent 2-3-1.

[0098] 2. Knowledge Base: An external vector database used to provide historical cases and domain knowledge as a reference for reflection.

[0099] Perform operations, including: 1. Knowledge Retrieval: Using the preliminary "diagnostic results" as the query question, relevant cases, solutions, or key points of reflection are retrieved from external knowledge bases to form a "rag_list".

[0100] 2. Construct reflection prompts: Combine system prompts (the injected retrieved knowledge "rag_list" and domain "knowledge") with user messages (i.e., serialized diagnostic results) and send them to the large language model.

[0101] 3. Streaming generation of reflection content: The large language model is called in a streaming manner, allowing it to conduct in-depth "reflection" based on diagnostic results and external knowledge, and output the reflection content ("full_content") in real time.

[0102] 4. Analyze the structured results: Clean and parse the complete text content generated by the model to extract the structured reflection conclusions (“response”).

[0103] The output includes: 1. Original thought record ("full_content"): A complete text response generated by the large language model in a streaming manner.

[0104] 2. Structured Reflection Result (“response”): The structured data obtained after parsing (such as JSON format) contains the model's assessment of this diagnosis, lessons learned, or improvement suggestions. This result is returned via “yield” and updated in the “reflection” field of the status, providing key input for subsequent “executive agent” actions.

[0105] As described above, the database diagnostic method provided in this embodiment allows a user to send query information indicating a performance problem in the database during a target time period to the server via a client when the user determines that the database has a performance problem during that time period. The server then receives the query information indicating the performance problem. Next, the server obtains the database's operational information for a specified time period including the target time period, such as storage resource data, log data, and performance data, where the specified time period includes the target time period. Based on this operational information, the server determines at least one data indicator and at least one problem localization strategy corresponding to each data indicator. Finally, the server performs diagnostic analysis based on at least one problem localization strategy, the query information, and the operational information to generate a diagnostic result. In this way, users can determine the location of performance issues and / or optimization suggestions for the database within a target time period based on the diagnostic results. Users can then address these performance issues promptly, reducing the time required for database performance problem repair. Furthermore, when diagnosing performance issues within a target time period, users only need to input query information indicating the presence of performance problems, and the data diagnostic device can automatically output diagnostic results, achieving automated diagnosis, root cause analysis, and intelligent decision-making for database performance issues.

[0106] In some feasible examples, combining Figure 1 ,like Figure 3 As shown, the above S13 can be implemented by the following S130-S132.

[0107] S130. Perform data cleaning on the operation information to obtain cleaned data; S131. When there are data anomalies in the cleaned data, obtain at least one data indicator corresponding to the cleaned data. S132. Based on at least one data indicator, determine at least one problem localization strategy corresponding to each data indicator.

[0108] As described above, the database diagnostic method provided in this embodiment allows a user to send query information indicating a performance problem in the database during a target time period to the server via a client when the user determines that the database has a performance problem during that time period. The server then receives the query information indicating the performance problem. Next, the server obtains the database's operational information for a specified time period including the target time period, such as storage resource data, log data, and performance data, where the specified time period includes the target time period. The server then cleans the operational information to obtain cleaned data. If the cleaned data contains anomalies, at least one data indicator corresponding to the cleaned data is obtained. Based on the at least one data indicator, at least one problem localization strategy corresponding to each data indicator is determined. Finally, the server performs diagnostic analysis based on at least one problem localization strategy, the query information, and the operational information to generate a diagnostic result. In this way, users can determine the location of performance issues and / or optimization suggestions for the database within a target time period based on the diagnostic results. Users can then address these performance issues promptly, reducing the time required for database performance problem repair. Furthermore, when diagnosing performance issues within a target time period, users only need to input query information indicating the presence of performance problems, and the data diagnostic device can automatically output diagnostic results, achieving automated diagnosis, root cause analysis, and intelligent decision-making for database performance issues.

[0109] In some feasible examples, combining Figure 1 ,like Figure 4 As shown, the above S14 can be implemented by the following S140 and S141.

[0110] S140. In the case where the input data lacks target data, the target data shall be obtained through at least one data acquisition tool; wherein, the input data shall include at least one problem localization strategy, query information and operation information; S141. Determine the diagnostic results based on the input data and target data.

[0111] As described above, the database diagnostic method provided in this embodiment allows a user to send query information indicating a performance problem in the database during a target time period to the server via a client when the user determines that the database has a performance problem during that time period. The server then receives the query information indicating the performance problem. Next, the server obtains the database's operational information for a specified time period including the target time period, such as storage resource data, log data, and performance data, where the specified time period includes the target time period. Based on this operational information, the server determines at least one data metric and at least one problem localization strategy corresponding to each data metric. Then, if the input data (including at least one problem localization strategy, query information, and operational information) lacks target data, the server obtains the target data using at least one data acquisition tool. Finally, based on the input data and the target data, a diagnostic result is determined. In this way, users can determine the location of performance issues and / or optimization suggestions for the database within a target time period based on the diagnostic results. Users can then address these performance issues promptly, reducing the time required for database performance problem repair. Furthermore, when diagnosing performance issues within a target time period, users only need to input query information indicating the presence of performance problems, and the data diagnostic device can automatically output diagnostic results, achieving automated diagnosis, root cause analysis, and intelligent decision-making for database performance issues.

[0112] In some feasible examples, combining Figure 4 ,like Figure 5 As shown, the above S141 can be specifically implemented through the following S1410-S1412.

[0113] S1410. Perform semantic understanding on the input data and target data, and convert the input data and target data into structured diagnostic intent; S1411. Based on the diagnostic intent, generate an initial diagnostic plan; wherein the initial diagnostic plan includes at least one diagnostic task; S1412. Based on the initial diagnostic plan, execute a reflective iterative loop until the iteration conditions are met, and then generate a diagnostic result. The iteration conditions include any one of the following: the confidence level of the diagnostic result is greater than the confidence level threshold, the number of loop iterations is equal to the preset number, and the loop duration is greater than the duration threshold.

[0114] As described above, the database diagnostic method provided in this embodiment allows a user to send query information indicating a performance problem in the database during a target time period to the server via a client when the user determines that the database has a performance problem during that time period. The server then receives the query information indicating the performance problem. Next, the server obtains the database's operational information for a specified time period including the target time period, such as storage resource data, log data, and performance data, where the specified time period includes the target time period. Based on this operational information, the server determines at least one data metric and at least one problem localization strategy corresponding to each data metric. Then, if the input data (including at least one problem localization strategy, query information, and operational information) lacks target data, the server obtains the target data using at least one acquisition tool. The server performs semantic understanding on the input and target data, converting them into a structured diagnostic intent. Based on the diagnostic intent, an initial diagnostic plan is generated. Based on the initial diagnostic plan, a reflective iterative loop is executed until the iteration conditions are met, at which point a diagnostic result is generated. In this way, users can determine the location of performance issues and / or optimization suggestions for the database within a target time period based on the diagnostic results. Users can then address these performance issues promptly, reducing the time required for database performance problem repair. Furthermore, when diagnosing performance issues within a target time period, users only need to input query information indicating the presence of performance problems, and the data diagnostic device can automatically output diagnostic results, achieving automated diagnosis, root cause analysis, and intelligent decision-making for database performance issues.

[0115] In some feasible examples, the collection tools include one or more of the following: a first tool for collecting average active sessions, a second tool for collecting database information, a third tool for collecting slow structured query language information, a fourth tool for collecting execution plan information, and a fifth tool for collecting abnormal structured query language information.

[0116] The foregoing mainly describes the solutions provided by the embodiments of this application from a methodological perspective. To achieve the above functions, it includes corresponding hardware structures and / or software modules for executing each function. Those skilled in the art should readily recognize that, based on the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein, this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0117] This application embodiment can divide the database diagnostic device into functional modules according to the above method example. For example, each function can be divided into its own functional modules, or two or more functions can be integrated into one processing unit. The integrated modules can be implemented in hardware or as software functional modules. It should be noted that the module division in this application embodiment is illustrative and only represents one logical functional division; other division methods may be used in actual implementation.

[0118] like Figure 6 As shown in the diagram, an embodiment of this application provides a schematic diagram of a database diagnostic device. It includes a communicator 101 and a controller 102.

[0119] Communicator 101 is configured to: acquire query information used to indicate that the database has performance problems during a target time period; Controller 102 is configured as follows: Retrieve database operation information for a specified time period; the operation information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period; Based on operational information, identify at least one data indicator and at least one problem localization strategy corresponding to each data indicator. Diagnostic analysis is performed based on at least one problem localization strategy, query information, and operational information to generate diagnostic results; wherein, the diagnostic results indicate either problem localization or optimization suggestions.

[0120] In some implementable examples, controller 102, when executing the determination of at least one data metric and at least one problem localization strategy corresponding to each data metric based on operational information, is further configured to: The operational information is cleaned to obtain cleaned data; When data anomalies are found in the cleaned data, at least one data indicator corresponding to the cleaned data should be obtained. Based on at least one data indicator, determine at least one problem localization strategy corresponding to each data indicator.

[0121] In some implementable examples, controller 102, when performing diagnostic analysis based on at least one problem localization strategy, query information, and operational information, and generating diagnostic results, is further configured to: In the event that the target data is missing from the input data, the target data shall be obtained through at least one data acquisition tool; wherein, the input data shall include at least one problem localization strategy, query information and operational information; The diagnostic results are determined based on the input and target data.

[0122] In some implementable examples, controller 102, when performing the task of determining diagnostic results based on input and target data, is further configured to: Perform semantic understanding on input and target data, and transform the input and target data into structured diagnostic intent; Based on the diagnostic intent, an initial diagnostic plan is generated; wherein, the initial diagnostic plan includes at least one diagnostic task; Based on the initial diagnostic plan, a reflective iterative loop is executed until the iteration conditions are met, at which point a diagnostic result is generated. The iteration conditions include any one of the following: the confidence level of the diagnostic result is greater than the confidence threshold, the number of iterations is equal to the preset number, and the loop duration is greater than the duration threshold.

[0123] In some feasible examples, the collection tools include one or more of the following: a first tool for collecting average active sessions, a second tool for collecting database information, a third tool for collecting slow structured query language information, a fourth tool for collecting execution plan information, and a fifth tool for collecting abnormal structured query language information.

[0124] All relevant content of each step involved in the above method embodiments can be referenced from the functional description of the corresponding functional module, and their functions will not be repeated here.

[0125] Of course, the database diagnostic device provided in this application embodiment includes, but is not limited to, the modules described above. For example, the database diagnostic device may also include a memory 103. The memory 103 may be used to store the program code of the database diagnostic device, and may also be used to store data generated by the database diagnostic device during operation, such as data in write requests.

[0126] like Figure 7 As shown, this application embodiment also provides a chip system that can be applied to the database diagnostic device in the foregoing embodiments. The chip system includes at least one processor 1501 and at least one interface circuit 1502. The processor 1501 can be the processor in the aforementioned database diagnostic device. The processor 1501 and the interface circuit 1502 can be interconnected via a line. The processor 1501 can receive and execute computer instructions from the memory of the aforementioned database diagnostic device through the interface circuit 1502. When the computer instructions are executed by the processor 1501, the database diagnostic device can perform the various steps performed by the database diagnostic device in the foregoing embodiments. Of course, the chip system may also include other discrete devices, and this application embodiment does not specifically limit this.

[0127] This application also provides a computer-readable storage medium for storing computer instructions for operating the aforementioned database diagnostic device.

[0128] The above description is merely a specific embodiment of this disclosure, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this disclosure. Therefore, this disclosure is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A database diagnostic device, characterized in that, include: The communicator is configured to: acquire query information indicating that the database has performance issues during a target time period; The controller is configured as follows: Obtain the database's operational information over a specified time period; wherein the operational information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period; Based on the operational information, at least one data indicator is determined, and at least one problem localization strategy is determined for each of the data indicators. Diagnostic analysis is performed based on at least one problem localization strategy, the query information, and the operational information to generate diagnostic results; wherein the diagnostic results indicate either problem localization or optimization suggestions.

2. The database diagnostic device according to claim 1, characterized in that, The controller, when executing the process of determining at least one data metric and at least one problem localization strategy corresponding to each data metric based on the operational information, is further configured to: The operational information is cleaned to obtain cleaned data; When the cleaned data contains data anomalies, at least one data indicator corresponding to the cleaned data is obtained; Based on the at least one data indicator, determine at least one problem localization strategy corresponding to each data indicator.

3. The database diagnostic device according to claim 1, characterized in that, The controller, when performing diagnostic analysis based on at least the at least one problem localization strategy, the query information, and the operational information to generate diagnostic results, is further configured to: In the event that the target data is missing from the input data, the target data is acquired using at least one data acquisition tool; wherein, the input data includes at least: the at least one problem localization strategy, the query information, and the operational information; The diagnostic result is determined based on the input data and the target data.

4. The database diagnostic device according to claim 3, characterized in that, The controller, when performing the task of determining a diagnostic result based on the input data and the target data, is further configured to: The input data and the target data are semantically understood and transformed into structured diagnostic intent. Based on the diagnostic intent, an initial diagnostic plan is generated; wherein the initial diagnostic plan includes at least one diagnostic task; Based on the initial diagnostic plan, a reflective iterative loop is executed until the iteration conditions are met, at which point a diagnostic result is generated; wherein, the iteration conditions include any one of the following: the confidence level of the diagnostic result is greater than a confidence threshold, the number of loop iterations is equal to a preset number, and the loop duration is greater than a duration threshold.

5. The database diagnostic device according to claim 3, characterized in that, The data collection tools include one or more of the following: a first tool for collecting average active sessions, a second tool for collecting database information, a third tool for collecting slow structured query language information, a fourth tool for collecting execution plan information, and a fifth tool for collecting abnormal structured query language information.

6. A database diagnostic method, characterized in that, include: Retrieve query information that indicates performance issues in the database during a target time period; Obtain the database's operational information over a specified time period; wherein the operational information includes at least storage resource data, log data, and performance data, and the specified time period includes the target time period; Based on the operational information, at least one data indicator is determined, and at least one problem localization strategy is determined for each of the data indicators. Diagnostic analysis is performed based on at least one problem localization strategy, the query information, and the operational information to generate diagnostic results; wherein the diagnostic results indicate either problem localization or optimization suggestions.

7. The database diagnostic method according to claim 6, characterized in that, The step of determining at least one data indicator and at least one problem localization strategy corresponding to each data indicator based on the operational information includes: The operational information is cleaned to obtain cleaned data; When the cleaned data contains data anomalies, at least one data indicator corresponding to the cleaned data is obtained; Based on the at least one data indicator, determine at least one problem localization strategy corresponding to each data indicator.

8. The database diagnostic method according to claim 6, characterized in that, The diagnostic analysis, based at least on the at least one problem localization strategy, the query information, and the operational information, generates diagnostic results, including: In the event that the target data is missing from the input data, the target data is acquired using at least one data acquisition tool; wherein, the input data includes at least: the at least one problem localization strategy, the query information, and the operational information; The diagnostic result is determined based on the input data and the target data.

9. The database diagnostic method according to claim 8, characterized in that, Determining the diagnostic result based on the input data and the target data includes: The input data and the target data are semantically understood and transformed into structured diagnostic intent. Based on the diagnostic intent, an initial diagnostic plan is generated; wherein the initial diagnostic plan includes at least one diagnostic task; Based on the initial diagnostic plan, a reflective iterative loop is executed until the iteration conditions are met, at which point a diagnostic result is generated; wherein, the iteration conditions include any one of the following: the confidence level of the diagnostic result is greater than a confidence threshold, the number of loop iterations is equal to a preset number, and the loop duration is greater than a duration threshold.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer-executable instructions, which, when executed by a processor, are used to implement the database diagnostic method as described in any one of claims 6-9.