A data analysis method, device and electronic equipment

Automated data analysis addresses the slow analysis of BMC faults under large-scale stress testing, enabling efficient parsing and fault location of BMC crash test data and thereby improving fault repair efficiency.

CN118938857B · Active · Publication Date: 2026-06-26 · NETTRIX INFORMATION IND CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
NETTRIX INFORMATION IND CO LTD
Filing Date
2024-07-23
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In large-scale stress testing scenarios, the number of BMC failures is high and the amount of crash memory information is large. Manual analysis of this crash memory information is slow, wasting human resources and lowering analysis efficiency.

Method used

An automated data analysis method obtains a crash test dataset from stress tests of multiple BMCs, selects a target parsing rule according to the data level (kernel layer or application layer) of each piece of crash test data, and iteratively analyzes the parsed data to improve analysis speed.

Benefits of technology

It enables automated analysis of a large amount of crash test data generated by BMC failures, improving the analysis speed and making it easier for maintenance personnel to locate and fix faults in a timely manner.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN118938857B_ABST

Abstract

This application provides a data analysis method, apparatus, and electronic device, and relates to the technical field of data processing. In the application, first, a crash test dataset from stress testing of multiple baseboard management controllers (BMCs) under test is acquired; then, based on the data level to which each piece of crash test data in the dataset belongs, a target parsing rule corresponding to that data level is selected from a parsing rule set, and each piece of crash test data is parsed according to the target parsing rule to obtain the target parsing data; finally, each piece of target parsing data is iteratively analyzed, and the corresponding analysis result is output. In this way, a large amount of crash test data generated by BMC failures can be collected automatically, and the analysis speed of the crash test data is improved.

Description

Technical Field

[0001] This application relates to the field of data processing technology, and in particular to a data analysis method, apparatus, and electronic device.

Background Technology

[0002] The Baseboard Management Controller (BMC) is a dedicated controller used to monitor and manage servers. When the server's BMC fails (e.g., due to errors in the BMC's kernel or application code), it provides maintenance personnel with alarm information and corresponding recovery methods.

[0003] Specifically, during startup and operation, when the BMC kernel code contains errors or crashes, a kernel panic mechanism is triggered. This mechanism captures the crash memory information caused by the kernel failure and stores it in a Kdump file. When the BMC application layer code contains errors, it saves the crash memory information (such as memory layout, variable values, call stack information, and CPU register values) caused by the application code errors in a coredump file. Maintenance personnel can analyze the cause of the application crash based on the coredump file and thus repair the corresponding fault.

[0004] Currently, crash memory information caused by BMC application code errors is typically analyzed manually. However, in large-scale stress testing scenarios, where the number of failed BMCs is high and the amount of crash memory information generated by BMC application code errors is substantial, manually analyzing this information would slow down the analysis process. Furthermore, since a large amount of information in the crash memory information generated by application code errors is repetitive, manually analyzing each instance would waste human resources.

Summary of the Invention

[0005] This application provides a data analysis method, apparatus, and electronic device to improve the speed of analyzing the large amount of crash memory information generated by BMC faults. The specific technical solution is as follows:

[0006] Firstly, this application provides a data analysis method, including:

[0007] Obtain a crash test dataset for stress testing multiple baseboard management controllers (BMCs) under test;

[0008] Based on the data level to which each crash test data belongs in the crash test dataset, a target parsing rule corresponding to the data level is selected from the parsing rule set, and each crash test data is parsed based on the target parsing rule to obtain each target parsing data;

[0009] Iteratively analyze each piece of target parsing data, and output the analysis result corresponding to each piece of target parsing data.

[0010] Based on the above method, it is possible to collect a large amount of crash test data generated by the failure of the tested BMC under stress test scenarios, and to perform more detailed and automated analysis on the large amount of crash test data, thereby improving the analysis speed of crash test data.

[0011] In one possible implementation, obtaining the crash test dataset for stress testing multiple tested BMCs includes:

[0012] Access a target BMC, which is any one of the multiple BMCs under test;

[0013] Determine whether crash memory information caused by kernel failure exists in the first preset directory of the target BMC;

[0014] If so, the crash memory information is transferred to the crash test dataset;

[0015] If not, the core dump file in the second preset directory of the target BMC is transferred to the crash test dataset.

[0016] Based on the above method, it is possible to collect crash memory information caused by kernel failures in the target BMC.

[0017] In one possible implementation, transferring the core dump file from the second preset directory of the target BMC to the crash test dataset includes:

[0018] Determine whether the core dump file exists in the second preset directory of the target BMC;

[0019] If so, the core dump file is transferred to the crash test dataset;

[0020] If not, then access the next target BMC among the plurality of tested BMCs, until all BMCs among the plurality of tested BMCs have been polled.

[0021] Based on the above method, it is possible to collect core dump files generated by application code errors in the target BMC and store the crash memory information and core dump files in the crash test dataset.

[0022] In one possible implementation, if the crash test data belongs to the kernel layer, then the crash test data is parsed using the following target parsing rules:

[0023] Determine the firmware version of the target BMC that generated the crash test data;

[0024] Determine whether a target kernel image version corresponding to the firmware version exists in a preset kernel image version set, wherein the kernel image version in the kernel image version set represents the uncompressed version of the kernel compiled for the tested BMC;

[0025] If so, obtain the image version time of the target kernel image version and the firmware version time of the crash test data. When it is determined that the image version time and the firmware version time are the same, call the kernel dump file analysis tool to parse the crash test data.

[0026] If not, output the first prompt message.

[0027] Based on the above method, automated parsing of kernel dump files can be achieved.

[0028] In one possible implementation, if the crash test data belongs to the application layer, then the crash test data is parsed using the following target parsing rules:

[0029] Determine the firmware version of the target BMC that generated the crash test data.

[0030] Determine whether a target symbol table corresponding to the firmware version exists in the symbol table set, wherein the symbol table set is generated when compiling the firmware of the plurality of tested BMCs, and each firmware version of the tested BMC corresponds to a symbol table version.

[0031] If so, then obtain the crash indication information from the crash test data, and parse the crash test data based on the crash indication information;

[0032] If not, output a second prompt message.

[0033] Based on the above method, automated parsing of core dump files can be achieved.

[0034] In one possible implementation, the crash indication information includes a crash process name or a crash thread name. If the crash thread name is obtained from the crash test data, then parsing the crash test data based on the crash indication information includes:

[0035] Find the crash process name associated with the crash thread name in the thread statistics table;

[0036] Based on the crash process name, the program debugging tool GDB is invoked to parse the crash test data.

[0037] Based on the above method, automated parsing of core dump files can be achieved.
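The thread-to-process step in [0034] to [0036] can be illustrated with a minimal Python sketch. The thread statistics table is modeled as a plain dictionary, and the symbol directory layout (one unstripped binary per process name) is a hypothetical assumption; the patent only states that GDB is invoked based on the crash process name. The sketch builds the GDB command line rather than running it.

```python
from pathlib import Path


def resolve_process_name(crash_name: str, is_thread: bool,
                         thread_table: dict[str, str]) -> str:
    """Return the crash process name for the crash indication information.

    If the crash test data only yields a crash thread name, look up the
    owning process in the thread statistics table (modeled here as a dict
    mapping thread name -> process name).
    """
    if not is_thread:
        return crash_name
    try:
        return thread_table[crash_name]
    except KeyError:
        raise LookupError(
            f"thread {crash_name!r} is not in the thread statistics table") from None


def build_gdb_command(symbol_dir: Path, process_name: str,
                      core_file: Path) -> list[str]:
    """Build a non-interactive GDB invocation that prints all thread backtraces.

    Assumption (hypothetical layout): the symbol table set stores one
    unstripped binary per process name directly under symbol_dir.
    """
    program = symbol_dir / process_name
    return ["gdb", "-batch", "-ex", "thread apply all bt",
            str(program), str(core_file)]
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; with `-batch`, GDB exits after running the `-ex` command, so the backtraces end up in standard output where they can be collected as the parsing result.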

[0038] In one possible implementation, after outputting the analysis results corresponding to each of the target parsed data, the method further includes:

[0039] The analysis results corresponding to each of the target parsing data are pushed to the quality monitoring terminal; or

[0040] Based on the operational data of the tested BMC, a comprehensive analysis report is generated and pushed to the quality monitoring terminal or maintenance personnel, so that the maintenance personnel can repair the abnormal BMC based on the comprehensive analysis report. The operational data includes at least the memory information, CPU status information and restart information of the tested BMC.

[0041] By pushing the analysis results to the quality monitoring terminal or to operations and maintenance personnel, the personnel can more intuitively determine whether a fault is a kernel fault or an application-layer code fault, locate it in a timely manner, and repair the faulty server.
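The comprehensive report of [0040] could be assembled along the lines of the following Python sketch, which merges per-item analysis results with each BMC's operational data and counts kernel versus application faults. The field names, the JSON output format, and the reduction of restart information to a counter are all hypothetical choices; the patent only lists memory information, CPU status information, and restart information as the minimum operational data.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BmcOperationalData:
    """Operational data of one BMC under test (field names are hypothetical)."""
    ip: str
    memory_info: dict   # e.g. {"total_kb": 524288, "free_kb": 102400}
    cpu_status: dict    # e.g. {"load_1min": 0.8}
    restart_count: int  # restart information, reduced to a counter here


def build_report(results: list[dict], ops: list[BmcOperationalData]) -> str:
    """Merge per-item analysis results with operational data into a JSON report.

    Each result is assumed to carry at least "ip" and "layer"
    ("kernel" or "application").
    """
    ops_by_ip = {o.ip: asdict(o) for o in ops}
    # Attach operational data to each result; None if the BMC reported none.
    entries = [{**r, "operational_data": ops_by_ip.get(r["ip"])} for r in results]
    summary = {
        "kernel_faults": sum(1 for r in results if r["layer"] == "kernel"),
        "application_faults": sum(1 for r in results if r["layer"] == "application"),
    }
    return json.dumps({"summary": summary, "entries": entries}, indent=2)
```

The per-layer counts in `summary` reflect the goal stated in [0041]: letting maintenance personnel see at a glance whether faults are kernel faults or application-layer faults.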

[0042] Secondly, this application provides a data analysis apparatus, comprising:

[0043] The data acquisition module is used to acquire crash test datasets from stress tests on multiple baseboard management controllers (BMCs) under test.

[0044] The data parsing module is used to select the target parsing rule corresponding to the data level from the parsing rule set based on the data level to which each crash test data belongs in the crash test dataset, and parse each crash test data based on the target parsing rule to obtain each target parsing data;

[0045] The data analysis module is used to perform iterative analysis on each of the target parsed data and output the analysis results corresponding to each of the target parsed data.

[0046] Thirdly, this application provides an electronic device, comprising:

[0047] a memory, configured to store a computer program; and

[0048] a processor, configured to implement the steps of the data analysis method described above when executing the computer program stored in the memory.

[0049] Fourthly, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above-described data analysis method.

[0050] For the technical effects achievable by each of the second to fourth aspects, refer to the description above of the technical effects of the first aspect and its possible implementations; they are not repeated here.

Attached Figure Description

[0051] Figure 1 is a flowchart of a data analysis method provided in an embodiment of this application;

[0052] Figure 2 is a schematic diagram of the data analysis system architecture provided in an embodiment of this application;

[0053] Figure 3 is a flowchart of obtaining a crash test dataset provided in an embodiment of this application;

[0054] Figure 4 is a schematic diagram of the storage of crash test data provided in an embodiment of this application;

[0055] Figure 5 is a flowchart of parsing a kernel dump file provided in an embodiment of this application;

[0056] Figure 6 is a flowchart of parsing a core dump file provided in an embodiment of this application;

[0057] Figure 7 is a flowchart of analyzing target parsing data provided in an embodiment of this application;

[0058] Figure 8 is a schematic diagram of the analysis results pushed to the quality monitoring terminal according to an embodiment of this application;

[0059] Figure 9 is a schematic structural diagram of a data analysis apparatus provided in an embodiment of this application;

[0060] Figure 10 is a schematic structural diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0061] To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings. The specific operations in the method embodiments can also be applied to the apparatus or system embodiments. It should be noted that in the description of this application, "multiple" means "at least two". "And/or" describes an association between related objects and indicates that three relationships may exist; for example, A and/or B can mean that A exists alone, that A and B both exist, or that B exists alone. "A connected to B" can mean that A and B are directly connected or that A and B are connected through C. Furthermore, in the description of this application, terms such as "first" and "second" are used only to distinguish between items in the description and should not be construed as indicating or implying relative importance or order.

[0062] The embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0063] The Baseboard Management Controller (BMC) is a dedicated controller used to monitor and manage servers. When a server's BMC fails (e.g., due to errors in the BMC's kernel or application code), it provides maintenance personnel with alarm information and corresponding recovery methods.

[0064] Specifically, during BMC startup and operation, when errors occur in the BMC kernel code, such as accessing a null pointer, deadlock, or accessing an illegal address, a kernel panic mechanism is triggered. This mechanism captures the crash memory information resulting from the kernel failure and stores it in a Kdump file. When an error occurs in a running program at the BMC application layer, such as executing an invalid memory reference or receiving a segmentation fault signal (SIGSEGV), the system saves the crash memory information (e.g., memory layout, variable values, call stack information, and CPU register values) in a coredump file. Maintenance personnel can analyze the coredump file to determine the cause of the application crash, and then debug and repair the corresponding fault.

[0065] Currently, crash memory information caused by application code errors is typically analyzed manually. However, in large-scale stress testing scenarios, where the number of BMCs that fail is high and the amount of crash memory information generated by BMC application code errors is substantial, continuing to analyze this information manually will slow down the analysis process. Furthermore, since there is a large amount of repetitive information in the crash memory information generated by application code errors, manually analyzing each piece of information will also waste human resources.

[0066] In view of this, to comprehensively analyze the large amount of crash memory information generated by BMC failures in stress testing scenarios and to improve its analysis speed, this application provides a data analysis method, which specifically includes: first, obtaining a crash test dataset from stress tests on multiple baseboard management controllers (BMCs) under test; then, based on the data level to which each piece of crash test data in the dataset belongs, selecting the target parsing rule corresponding to that data level from the parsing rule set, and parsing each piece of crash test data according to the target parsing rule to obtain the target parsing data; finally, performing iterative analysis on each piece of target parsing data and outputting the corresponding analysis results.

[0067] The method provided in this application can automatically collect a large amount of crash test data generated by BMC failures, select the corresponding target parsing rule according to the data level to which each piece of crash test data belongs, and obtain the corresponding analysis results, thereby improving the analysis speed of crash test data and making it easier for operation and maintenance personnel to locate faults.

[0068] Referring to Figure 1, which is a flowchart of a data analysis method provided in an embodiment of this application, the method includes:

[0069] S1: Obtain the crash test dataset for stress testing multiple baseboard management controllers (BMCs) under test.

[0070] The method provided in this application can be applied to the system architecture shown in Figure 2, which includes a log analysis server, BMCs under test, a compilation server, and a terminal device; the method can run on the log analysis server.

[0071] This application does not impose specific restrictions on the types or quantities of the aforementioned servers. A brief description of the functions of each server is provided below:

[0072] The log analysis server is used to collect crash memory information generated during stress testing of the BMC under test, and parse the crash memory information according to the set parsing rules. After the parsing is completed, the analysis results corresponding to the crash memory information are sent to the terminal device.

[0073] The BMC under test is used to run the test version firmware compiled by the compilation server. When the BMC under test fails, the resulting crash memory information is stored in a designated directory. To enable testing of a large number of test firmware versions, this application allows for the creation of multiple BMCs under test (e.g., BMC1-BMC3), with each BMC updated with a different test firmware version. Furthermore, the BMC under test can be deployed on the server under test; this application does not impose specific restrictions on the deployment method of the BMC under test.

[0074] The compilation server is used to automatically compile and generate test version firmware. When generating each test version firmware, it retains the symbol table information corresponding to the current test version firmware. The symbol table information is the debugging information of the executable program collected by the compilation server when compiling the test version firmware, and the symbol table information can be used to parse crash memory information (dump file). In this application, the compilation server can generate test version firmware according to a set time period. The time period can be set according to the actual testing requirements, and this application does not impose specific restrictions on it.

[0075] The terminal device can be a portable terminal such as a tablet or mobile phone. The terminal device can receive the analysis results from the log analysis server and display the analysis results to the operation and maintenance personnel so that they can locate the fault based on the analysis results.

[0076] In the embodiments of this application, as shown in Figure 3, before obtaining the crash test dataset for stress testing multiple baseboard management controllers (BMCs) under test, the log analysis server first compiles the corresponding test version firmware and retains the symbol table information of each test version firmware. Then, it reads the configuration file of the BMCs under test and updates each test version firmware to the corresponding BMC under test. In a specific implementation, a test version firmware can be assigned to each BMC under test based on its device identifier, for example, BMC1-Firmware 1 and BMC2-Firmware 2. Finally, stress testing is performed on each BMC under test. During stress testing, if the kernel of a BMC under test fails, that BMC triggers a kernel panic mechanism and stores the crash memory information of the kernel failure in a kernel dump file (Kdump file) in a specified directory. If the application layer code of a BMC under test has an error, the crash memory information generated by the application failure is stored in a core dump file (coredump file) in a specified directory of that BMC. At this point, the specified directory of each failed BMC under test stores the crash memory information related to that BMC's test version firmware.

[0077] In this embodiment of the application, after stress testing is completed on multiple BMCs under test, the log analysis server can obtain the crash test dataset of stress testing on multiple BMCs under test.

[0078] Specifically, the log analysis server connects to any one of the multiple BMCs under test, for example, target BMC1. In this application, the log analysis server can first determine the device identifier of the BMC under test, such as its IP address, and then determine the target BMC to connect to based on that IP address. After successfully connecting to the target BMC, the log analysis server can enter the target BMC's directory and determine whether crash memory information caused by a kernel failure exists in the first preset directory of the target BMC. If it exists, the kernel code of the target BMC has failed, and the log analysis server transfers the crash memory information to the crash test dataset. If it does not exist, the log analysis server transfers the core dump file in the second preset directory of the target BMC to the crash test dataset. Specifically, when determining whether crash memory information caused by a kernel failure exists in the first preset directory, the log analysis server can first check whether a directory named "_mem_" exists in that directory, and make the determination based on the directory named "_mem_".

[0079] In this embodiment, when the log analysis server transfers the core dump file in the second preset directory of the target BMC to the crash test dataset, the log analysis server first determines whether the core dump file exists in the second preset directory of the target BMC. If it exists, it indicates that the application code of the target BMC has failed. At this time, the log analysis server will transfer the core dump file to the crash test dataset. If it does not exist, the log analysis server will access the next target BMC among multiple BMCs under test according to the device identifier of the BMC under test, until the kernel dump file and core dump file generated by each BMC under test are collected to obtain the crash test dataset.

[0080] Using the above method, each BMC under test can be accessed based on its device identifier, and the kernel dump file and core dump file generated by each BMC under test can be transferred to the crash test dataset.
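The polling in [0078] to [0080] can be sketched in Python under two assumptions the patent does not state: each BMC's file system is reachable as a local path (for example, after copying or mounting it), and the first and second preset directories are the hypothetical paths below. A kernel failure is detected by a directory whose name contains "_mem_", following [0078], and the either/or order mirrors the flow in [0012] to [0020].

```python
from pathlib import Path

# Hypothetical preset directories; the patent does not name them.
FIRST_PRESET_DIR = "var/crash"       # assumed location of kernel crash memory info
SECOND_PRESET_DIR = "var/coredump"   # assumed location of core dump files


def collect_crash_dataset(bmc_roots: dict[str, Path]) -> dict[str, list[Path]]:
    """Poll every BMC under test (keyed by IP) and gather its crash test data."""
    dataset: dict[str, list[Path]] = {}
    for ip, root in bmc_roots.items():
        first = root / FIRST_PRESET_DIR
        # Kernel failure: a directory whose name contains "_mem_".
        mem_dirs = (sorted(p for p in first.glob("*_mem_*") if p.is_dir())
                    if first.is_dir() else [])
        if mem_dirs:
            dataset[ip] = mem_dirs
            continue
        second = root / SECOND_PRESET_DIR
        # Application failure: any regular file in the core dump directory.
        cores = (sorted(p for p in second.iterdir() if p.is_file())
                 if second.is_dir() else [])
        if cores:
            dataset[ip] = cores
        # Neither found: move on to the next BMC under test.
    return dataset
```

Keying the dataset by IP address matches the use of the IP address as the device identifier in [0078], and it carries over directly to the per-IP storage layout described next.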

[0081] In one possible implementation of this embodiment, after obtaining the crash test dataset Daily Test_202x_x, the log analysis server can classify and store the crash test data. Specifically, as shown in Figure 4, the log analysis server can first classify the crash test data by test time, storing crash test data with the same test time in the storage unit of the log analysis server or in the log storage server; for example, crash test data with the same test time can be stored in the log storage server directory 202x_xx_xx. Then, the crash test data can be classified by the IP address of the BMC under test, with crash test data belonging to the same IP address (BMC under test) stored in the directory BMC_IP. Finally, according to the data level (or data type) of the crash test data, crash test data belonging to the kernel layer (kernel dump files) can be stored in the log storage server directory kdump, and crash test data belonging to the application layer (core dump files) in the directory corefiles, yielding the classified crash test dataset.
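The three-level layout of Figure 4 (test time, then BMC IP, then kdump or corefiles) can be expressed as a small path builder. This is a sketch under stated assumptions: the date is rendered as `YYYY_MM_DD` to match the `202x_xx_xx` pattern, the `BMC_IP` directory is taken to be named after the IP address itself, and the root path is hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

# Data level -> sub-directory name, as described for Figure 4.
LAYER_DIRS = {"kernel": "kdump", "application": "corefiles"}


def storage_path(root: PurePosixPath, test_date: date, bmc_ip: str,
                 layer: str, filename: str) -> PurePosixPath:
    """Return <root>/<YYYY_MM_DD>/<BMC_IP>/<kdump|corefiles>/<filename>.

    Assumptions: date format follows "202x_xx_xx", and the BMC_IP
    directory is named with the IP address string itself.
    """
    day_dir = test_date.strftime("%Y_%m_%d")
    return root / day_dir / bmc_ip / LAYER_DIRS[layer] / filename
```

Before moving a file into place, the parent directory can be created with `Path(p.parent).mkdir(parents=True, exist_ok=True)`; an unknown layer raises `KeyError`, which surfaces unclassified data instead of silently misfiling it.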

[0082] S2: Based on the data level to which each piece of crash test data in the crash test dataset belongs, select the target parsing rule corresponding to that data level from the parsing rule set, and parse each piece of crash test data according to the target parsing rule to obtain the target parsing data.

[0083] In this embodiment of the application, after the log analysis server obtains the crash test dataset, each crash test data in the crash test dataset has its own corresponding IP address, test firmware version, and data layer (kernel layer, application layer). In order to parse crash test data at different data layers, the log analysis server can select the target parsing rule corresponding to the data layer of each crash test data from the parsing rule set, and use the target parsing rule to parse each crash test data to obtain each target parsed data.
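Selecting a target parsing rule by data level amounts to a lookup in a rule table keyed by layer. The Python sketch below shows that dispatch; the two rule functions are placeholders standing in for the kernel-layer and application-layer procedures described in the following paragraphs, and the dictionary-based item format is an assumption.

```python
from typing import Callable

ParseRule = Callable[[dict], dict]


def parse_kernel_dump(item: dict) -> dict:
    # Placeholder for the kernel-layer rule (kernel image match + dump analysis tool).
    return {"ip": item["ip"], "layer": "kernel", "status": "parsed by kernel-layer rule"}


def parse_core_dump(item: dict) -> dict:
    # Placeholder for the application-layer rule (symbol table match + GDB).
    return {"ip": item["ip"], "layer": "application", "status": "parsed by application-layer rule"}


# The parsing rule set, keyed by data level.
PARSING_RULES: dict[str, ParseRule] = {
    "kernel": parse_kernel_dump,
    "application": parse_core_dump,
}


def parse_dataset(dataset: list[dict]) -> list[dict]:
    """Select the target parsing rule by data level and apply it to each item."""
    results = []
    for item in dataset:
        rule = PARSING_RULES.get(item["layer"])
        if rule is None:
            # No rule for this data level: record it instead of failing the batch.
            results.append({"ip": item["ip"], "layer": item["layer"], "status": "no parsing rule"})
        else:
            results.append(rule(item))
    return results
```

Keeping the rule set as data means a new data level only needs a new entry in `PARSING_RULES`, not a change to the dispatch loop.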

[0084] In this embodiment of the application, when the log analysis server determines that the data level to which the crash test data belongs is the kernel layer (i.e., the crash test data is a kernel dump file), it parses the kernel dump file in the following way:

[0085] The log analysis server first determines the firmware version of the target BMC that generated the crash test data, and then checks whether a target kernel image version corresponding to the firmware version exists in the preset kernel image version set. If it exists, the server obtains the image version time of the target kernel image version and the firmware version time of the crash test data, and if the two times are the same, it calls the kernel dump file analysis tool to parse the crash test data. If no such target kernel image version exists, it outputs the first prompt message.

[0086] Specifically, as shown in Figure 5, after obtaining the crash test dataset, the log analysis server can filter out multiple directories named "BMC_IP" containing kernel dump files. Each directory named "BMC_IP" corresponds to the IP address of the tested BMC. If no directory named "BMC_IP" containing a kernel dump file is found, a prompt message is sent to the terminal device, and the kernel dump file parsing process is exited. For any target directory among the multiple directories named "BMC_IP", the crash test data in the target directory is parsed as follows:

[0087] The log analysis server first determines the firmware version of the target BMC that generated the crash test data or the firmware version of the current IP (target directory); then it checks whether a target kernel image version corresponding to the firmware version of the target BMC exists in a preset kernel image version set. The kernel image versions in the kernel image version set represent the uncompressed version of the kernel compiled for the BMC under test; when matching the target kernel image version, the log analysis server can extract a specific field from the firmware version of the BMC under test. The specific field can be the firmware identifier NH, and the target kernel image version is matched based on the firmware identifier.
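Matching by firmware identifier can be sketched as extracting one field from the version string and using it as a lookup key. The version-string format here (underscore-separated fields, one of which starts with "NH") is purely a hypothetical example, since the patent does not give the format. The same lookup would serve the symbol table match described in [0097].

```python
from typing import Optional


def extract_firmware_id(firmware_version: str) -> Optional[str]:
    """Return the underscore-separated field that starts with "NH", if any.

    The version-string format is a hypothetical example, e.g.
    "FW_NH1234_V1.0.2_20240715".
    """
    for field in firmware_version.split("_"):
        if field.startswith("NH"):
            return field
    return None


def match_by_firmware_id(firmware_version: str,
                         artifacts: dict[str, str]) -> Optional[str]:
    """Look up a kernel image (or symbol table) keyed by firmware identifier."""
    fid = extract_firmware_id(firmware_version)
    return artifacts.get(fid) if fid is not None else None
```

A `None` result corresponds to the case in which no target kernel image version (or symbol table) exists for the firmware version, which triggers the prompt message.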

[0088] When the log analysis server determines that a target kernel image version corresponding to the firmware version of the target BMC exists in the kernel image version set, it can obtain the image version time of the target kernel image version, enter the target directory, and obtain the crash test data and the firmware version time of the crash test data in the target directory. In addition, after entering the target directory, the log analysis server can also determine whether the files in the directory are empty. If the files are empty, it will traverse the next directory named "BMC_IP" that contains a kernel dump file. Finally, it will determine whether the image version time of the target kernel image version is the same as the firmware version time of the crash test data. If the log analysis server determines that the image version time and the firmware version time are the same, it will call the kernel dump file analysis tool to parse the crash test data.

[0089] In this embodiment of the application, after the log analysis server enters the target directory and obtains the crash test data in the target directory, it also needs to determine whether the crash test data is compressed. If so, the crash test data is decompressed to obtain the firmware version time of the crash test data; if not, the firmware version time of the crash test data can be obtained directly.
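The compression check in [0089] can be sketched as follows, assuming gzip compression (the patent does not name a format). The check reads the two-byte gzip header instead of trusting the file extension, and decompression is streamed so that large dumps are not loaded into memory at once. How the firmware version time is then read from the data is not specified and is not shown.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path: Path) -> bool:
    """Detect gzip compression (the assumed format) from the file header."""
    with path.open("rb") as f:
        return f.read(2) == GZIP_MAGIC


def ensure_uncompressed(path: Path) -> Path:
    """Return a path to uncompressed crash test data, decompressing if needed."""
    if not is_gzip(path):
        return path
    # "vmcore.gz" -> "vmcore"; other names get a ".raw" suffix.
    if path.suffix == ".gz":
        out = path.with_suffix("")
    else:
        out = path.with_name(path.name + ".raw")
    with gzip.open(path, "rb") as src, out.open("wb") as dst:
        shutil.copyfileobj(src, dst)  # stream instead of reading the whole dump
    return out
```

Uncompressed input is returned unchanged, so the caller can apply `ensure_uncompressed` unconditionally before extracting the firmware version time.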

[0090] In this embodiment of the application, when the log analysis server determines that the image version time and the firmware version time are different, it sends an indication message to the terminal device to indicate that the image version time and the firmware version time are inconsistent and that the current crash test data parsing has failed.

[0091] When the log analysis server determines that no target kernel image version corresponding to the target firmware version exists in the kernel image version set, it can send the terminal device a first prompt message indicating this.

[0092] By applying the above method for parsing the kernel dump files in a target directory to each target directory among the multiple directories named "BMC_IP", the log analysis server can obtain the parsing result corresponding to each kernel dump file (piece of crash test data).

[0093] The above method enables automated parsing of a large number of kernel dump files generated by multiple tested BMC kernel failures in the crash test dataset, thereby improving the parsing speed of a large number of kernel dump files.
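The three outcomes of the kernel-layer rule in [0085] to [0091] (no matching image, mismatched version times, or ready to parse) can be summarized in a Python sketch. The patent does not name the kernel dump file analysis tool; the Linux `crash` utility, which takes an uncompressed kernel (vmlinux) and a dump file as `crash <vmlinux> <vmcore>`, is used here as an assumed example, and the time format is likewise an assumption. The sketch returns the command instead of running it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KernelImage:
    vmlinux_path: str  # uncompressed kernel compiled for the BMC under test
    build_time: str    # image version time (string format is an assumption)


@dataclass
class KdumpOutcome:
    ok: bool
    message: str
    command: Optional[list[str]] = None


def parse_kdump(firmware_id: str, firmware_time: str, vmcore_path: str,
                images: dict[str, KernelImage]) -> KdumpOutcome:
    """Apply the kernel-layer parsing rule to one kernel dump file."""
    image = images.get(firmware_id)
    if image is None:
        # First prompt message: no matching kernel image version.
        return KdumpOutcome(False, f"no kernel image version for firmware {firmware_id}")
    if image.build_time != firmware_time:
        # Indication message: version times disagree, so parsing fails.
        return KdumpOutcome(False, "image version time and firmware version time are inconsistent")
    # Assumed tool: the Linux "crash" utility, invoked as crash <vmlinux> <vmcore>.
    return KdumpOutcome(True, "ready to parse", ["crash", image.vmlinux_path, vmcore_path])
```

Comparing the version times before invoking the tool guards against pairing a dump with a kernel built from different sources, which would produce misleading symbol resolution.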

[0094] In this embodiment of the application, when the log analysis server determines that the data layer to which the crash test data belongs is the application layer (i.e., the crash test data is a core dump file), it parses the core dump file in the following way:

[0095] The log analysis server first determines the firmware version of the target BMC that generated the crash test data; then it checks whether a target symbol table corresponding to the firmware version exists in the pre-acquired symbol table set, wherein the symbol table set is generated when compiling the firmware of multiple tested BMCs, and each tested BMC firmware version corresponds to a symbol table version; if yes, it obtains the crash indication information in the crash test data and parses the crash test data based on the crash indication information; if no, it outputs a second prompt message.

[0096] Specifically, as shown in Figure 6, after obtaining the crash test dataset, the log analysis server can filter out multiple directories named "BMC_IP" containing core dump files (corefiles directory) from the crash test dataset. Each directory named "BMC_IP" corresponds to the IP address of the tested BMC. If no directory named "BMC_IP" containing a core dump file is found, a prompt message is sent to the terminal device, and the core dump file parsing process is exited. For any target directory among the multiple directories named "BMC_IP", the crash test data in the target directory is parsed as follows:

[0097] The log analysis server first determines the firmware version of the target BMC that generated the crash test data, i.e., the firmware version associated with the current IP (target directory). It then checks whether a target symbol table corresponding to that firmware version exists in the pre-acquired symbol table set. The symbol table set is generated when the firmware of the multiple tested BMCs is compiled, and each tested BMC firmware version corresponds to one symbol table version. When matching the target symbol table, the log analysis server can also extract a specific field from the firmware version of the tested BMC, such as the firmware identifier NH, and match the target symbol table based on that identifier.
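A minimal sketch of matching a symbol table through the firmware identifier. The firmware version string format, the assumption that the identifier is a token beginning with `NH`, and the helper name `match_symbol_table` are all hypothetical; the source only states that a specific field such as NH is extracted and used for matching.

```python
import re
from typing import Optional

# Hypothetical: the firmware identifier is assumed to be a token starting with "NH".
FW_ID_PATTERN = re.compile(r"NH[0-9A-Za-z]+")


def match_symbol_table(firmware_version: str,
                       symbol_tables: dict[str, str]) -> Optional[str]:
    """Return the symbol table matched to this firmware version, or None.

    symbol_tables maps a firmware identifier to a symbol table location.
    None corresponds to the "second prompt message" case.
    """
    m = FW_ID_PATTERN.search(firmware_version)
    if m is None:
        return None
    return symbol_tables.get(m.group(0))
```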

[0098] When the log analysis server determines that there is a target symbol table in the symbol table set that corresponds to (matches) the firmware version of the target BMC, the log analysis server can enter the target directory (corefiles directory), obtain the crash test data and crash indication information of the crash test data in the target directory, and parse the crash test data according to the crash indication information.

[0099] In this embodiment, after entering the target directory, the log analysis server also determines whether the target directory is empty. If it is, the server moves on to the next target directory (the next IP address). If it is not, the server determines whether the crash test data is compressed: if so, it first decompresses the crash test data and then obtains its crash indication information; if not, it obtains the crash indication information directly. The crash indication information can be the crash process name or the crash thread name of the crash test data; this application does not impose specific restrictions on it.
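The per-directory handling in [0099] could look roughly like the following Python sketch. The core file naming scheme `core.<name>.<pid>`, the use of gzip as the compression format, and the function name `prepare_core_files` are assumptions made for illustration.

```python
import gzip
import shutil
from pathlib import Path


def prepare_core_files(core_dir: Path) -> list[tuple[Path, str]]:
    """Return (uncompressed core path, crash name) pairs for one corefiles directory.

    An empty list means the directory is empty and the caller should move on to
    the next BMC_IP directory. Assumes hypothetical names core.<name>.<pid>[.gz].
    """
    prepared = []
    for core in sorted(p for p in core_dir.iterdir() if p.is_file()):
        if core.suffix == ".gz":
            # Compressed dump: decompress next to the original before reading it.
            target = core.with_suffix("")
            with gzip.open(core, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            core = target
        if not core.name.startswith("core."):
            continue  # not a core dump under the assumed naming scheme
        # Crash indication information: the process or thread name in the file name.
        crash_name = core.name[len("core."):].rsplit(".", 1)[0]
        prepared.append((core, crash_name))
    return prepared
```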

[0100] When the log analysis server determines that there is no target symbol table in the symbol table set corresponding to the firmware version of the target BMC, the log analysis server can send a second prompt message to the terminal device indicating that there is no target symbol table in the symbol table set corresponding to the firmware version of the target BMC.

[0101] In this embodiment, after the crash indication information is obtained: if the log analysis server obtains the crash process name from the crash test data, it parses the crash test data using the crash process name and the program debugging tool GNU Debugger (GDB); if it obtains the crash thread name, it first looks up the crash process name associated with the crash thread name in the thread statistics table and then parses the crash test data using that crash process name and GDB. It should be noted that, because process names are subject to a length limit, several processes may end up with the same name; for such duplicate names, the log analysis server can perform a secondary match, and after a successful match it calls GDB to parse the crash test data according to the configured command file.
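The process-name resolution and the GDB call in [0101] can be sketched as below. The thread statistics table is modelled as a plain dictionary, the secondary match assumes names were truncated to the 15 visible characters that Linux keeps for a task name, and `resolve_process` and `gdb_command` are hypothetical helpers. The GDB options used (`-batch`, `-x` for a command file, followed by the program and the core file) are standard GDB command-line arguments.

```python
from typing import Optional

COMM_LEN = 15  # assumption: Linux keeps at most 15 visible characters of a task name


def resolve_process(crash_name: str, is_thread: bool,
                    thread_table: dict[str, str],
                    executables: list[str]) -> Optional[str]:
    """Map the crash indication to one executable name, or None if unknown/ambiguous."""
    name = thread_table.get(crash_name) if is_thread else crash_name
    if name is None:
        return None
    # Secondary match: a truncated name may be a prefix of the real executable name.
    candidates = [exe for exe in executables
                  if exe == name or (len(name) == COMM_LEN and exe.startswith(name))]
    return candidates[0] if len(candidates) == 1 else None


def gdb_command(executable: str, core_path: str, cmd_file: str) -> list[str]:
    """Batch-mode GDB invocation driven by a command file."""
    return ["gdb", "-batch", "-x", cmd_file, executable, core_path]
```

The returned list can be passed to `subprocess.run` on the log analysis server.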

[0102] By applying the above method of parsing the core dump files in a target directory to each of the multiple directories named "BMC_IP", the log analysis server obtains the parsing result corresponding to each core dump file (crash test data).

[0103] The above method enables automated parsing of the large number of core dump files in the crash test dataset that are generated by faults in the application-layer program code of multiple tested BMCs, thereby improving the speed at which these core dump files are parsed.

[0104] In this embodiment of the application, the log analysis server parses the kernel dump files and core dump files in the crash test dataset according to their respective target parsing rules, thereby obtaining each target parsed data.

[0105] S3: Perform iterative analysis on each target parsed data, and output the analysis result corresponding to each target parsed data.

[0106] In the embodiments of this application, as shown in Figure 7, when iteratively analyzing each target parsed data, the log analysis server can first set up a configuration set for the analysis tool. The configuration parameters in the set can be adjusted according to business needs. For example, when analyzing core dump files, the configuration can set the log path for the dump file analysis, enable GDB to output the dump, import symbol table information, and print stack information; when analyzing kernel dump files, it can print stack information or fault logs.
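One hypothetical way to represent the configuration set of [0106] is a small dataclass that expands into the command list fed to the analysis tool. The field names are illustrative. The emitted strings are ordinary GDB commands (`set logging enabled on` is the GDB 12+ spelling; older releases use `set logging on`) and, for kernel dumps, commands of the `crash` utility (`bt`, `log`), which is assumed here as the kernel analysis tool.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisConfig:
    """Hypothetical configuration set; fields mirror the options listed in [0106]."""
    dump_type: str                  # "core" (application layer) or "kernel"
    log_path: str = ""              # where the analysis log is written
    symbol_file: str = ""           # symbol table matched to the firmware version
    print_stack: bool = True
    print_fault_log: bool = False   # kernel dumps only
    extra: list[str] = field(default_factory=list)

    def commands(self) -> list[str]:
        cmds: list[str] = []
        if self.dump_type == "core":
            cmds.append("set pagination off")
            if self.log_path:
                # GDB 12+ syntax; older GDB versions use "set logging on".
                cmds += [f"set logging file {self.log_path}", "set logging enabled on"]
            if self.symbol_file:
                cmds.append(f"symbol-file {self.symbol_file}")
            if self.print_stack:
                cmds.append("bt full")
        else:  # kernel dump: commands of the crash utility
            if self.print_stack:
                cmds.append("bt")
            if self.print_fault_log:
                cmds.append("log")
        return cmds + self.extra
```

Writing `"\n".join(config.commands())` to a file yields a command file that GDB can run with `-x`.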

[0107] After entering the analysis interface through the analysis tool, the log analysis server executes the commands provided by the configuration set, obtains the execution results, and preprocesses them, for example by filtering the results so that the log names are sent on and other unrelated information is filtered out. The preprocessed results are then sent to the data handler (the data processing end) while the tool waits for the next analysis command. Before being sent to the data handler, the preprocessed results can also be optimized, for example by generating structured information that facilitates the subsequent formation of analysis specifications.
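A sketch of the preprocessing step, assuming the execution result is GDB backtrace text: lines that are not stack frames are filtered out and each frame is turned into a structured record. The regular expression covers only the common `#N  [addr in] func (args) [at file:line]` frame format and is an assumption, not a complete grammar of GDB output.

```python
import re

# Matches GDB backtrace lines such as
#   "#0  0x0000555555555139 in handle_req (req=0x0) at src/ipmi.c:42"
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(.*\)(?: at (\S+):(\d+))?")


def preprocess(raw_output: str) -> list[dict]:
    """Keep only backtrace frames and turn them into structured records."""
    frames = []
    for line in raw_output.splitlines():
        m = FRAME_RE.match(line.strip())
        if m is None:
            continue  # drop banner text, thread notices and other noise
        level, func, path, lineno = m.groups()
        frames.append({"level": int(level), "function": func,
                       "file": path, "line": int(lineno) if lineno else None})
    return frames
```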

[0108] After receiving the preprocessed execution results, the data processing end of the log analysis server analyzes them and generates the next parsing command based on the analysis. For example, when a core dump file is parsed, the stack information is output to the data processing end, which extracts from it the file name and line number at which the failure occurred, searches the codebase for the corresponding code, and obtains the variable information contained in that line of code, including variable names and variable types. This information is then passed to the storage engine (cmd engine).
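The variable lookup in [0108] can be approximated as follows. This is a deliberately naive sketch: it assumes C sources under a hypothetical `source_root`, takes every identifier on the failing line, and infers its type from the nearest preceding C-style declaration in the same file. `variables_at` is a hypothetical name, and a real implementation could instead rely on the debugger's own type information.

```python
import re
from pathlib import Path

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
# Words on the failing line that are never variable names (partial list).
C_KEYWORDS = {"if", "else", "for", "while", "return", "sizeof", "struct",
              "int", "char", "long", "unsigned", "void", "const"}
# Leading words that look like a type to the regex but start a statement.
NOT_TYPES = {"return", "else", "case", "goto"}


def variables_at(source_root: Path, file: str, line: int) -> list[tuple[str, str]]:
    """Return (name, type) pairs for the identifiers on source line `line` (1-based)."""
    lines = (source_root / file).read_text().splitlines()
    result = []
    for name in dict.fromkeys(IDENT_RE.findall(lines[line - 1])):  # de-duplicate, keep order
        if name in C_KEYWORDS:
            continue
        # "<type> [*]name" followed by ';', '=', ',' or ')', e.g. "struct req *r)".
        decl = re.compile(r"((?:struct\s+)?\w+\s*\**)\s*\b" + re.escape(name) + r"\b\s*[;=,)]")
        for prev in reversed(lines[: line - 1]):
            m = decl.search(prev)
            if m and m.group(1).split()[0] not in NOT_TYPES:
                result.append((name, " ".join(m.group(1).split())))
                break
    return result
```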

[0109] After receiving the information from the data processing end, the storage engine of the log analysis server generates new analysis commands based on it, stores the commands, and passes them to the server for iterative analysis. For example, based on the variable types passed from the data processing end, it determines whether to use a command that prints variables or one that prints structures, concatenates the command with the variable information, and passes the result to the server for the next round of analysis.
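The storage engine of [0109] might be sketched as a small class that chooses between printing a variable and printing a structure, and records every command it has generated. Treating a `struct ... *` type as "print the pointed-to structure" (GDB `print *name`) and everything else as `print name` is an assumption; `CommandEngine` is a hypothetical name.

```python
class CommandEngine:
    """Hypothetical cmd engine: turns (name, type) pairs into the next GDB commands."""

    def __init__(self) -> None:
        self.history: list[str] = []  # every command generated so far (the stored commands)

    def build(self, variables: list[tuple[str, str]]) -> list[str]:
        new_cmds = []
        for name, ctype in variables:
            is_struct_ptr = ctype.startswith("struct") and ctype.rstrip().endswith("*")
            # Structure pointer: dereference to print the whole structure.
            cmd = f"print *{name}" if is_struct_ptr else f"print {name}"
            if cmd not in self.history:  # never send the same command twice
                self.history.append(cmd)
                new_cmds.append(cmd)
        return new_cmds
```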

[0110] In this way, the log analysis server iteratively analyzes each target parsed data and outputs the analysis result corresponding to each target parsed data.

[0111] In this embodiment, after generating the analysis result corresponding to each target parsed data, the log analysis server can also push these analysis results to the quality monitoring terminal. Users can access the analysis results through a browser, for example by logging into the quality monitoring webpage and entering the file name of a dump file to obtain the analysis results for that dump file. The analysis results include, but are not limited to, call stack information, current register usage, and other relevant information. An example of the analysis results pushed to the quality monitoring terminal is shown in Figure 8.

[0112] In one possible implementation, the log analysis server can also acquire operational data of the tested BMC, such as memory information, CPU usage, and reboot information. Based on this data, a comprehensive analysis report is generated and pushed to the quality monitoring system or operations and maintenance personnel. Specifically, the log analysis server can send emails or SMS messages to operations and maintenance personnel who have subscribed to the BMC's operational information, enabling them to troubleshoot and repair malfunctioning BMCs based on the comprehensive analysis report.

[0113] In summary, traditional dump file analysis is typically performed manually, which is slow, and methods for analyzing kernel dump files are lacking. This application stress-tests multiple baseboard management controllers (BMCs) under test and collects the crash test data (kernel dump files and core dump files) generated by kernel or application-layer code failures in these BMCs. Based on the data level to which each crash test data belongs, the target parsing rule corresponding to that level is selected for parsing, yielding the target parsed data, and each target parsed data is then iteratively analyzed. This approach enables more detailed analysis of the large amount of crash test data generated by BMC failures in stress testing scenarios and improves the analysis speed. By pushing the analysis results to the quality monitoring terminal or to maintenance personnel, it lets maintenance personnel determine more intuitively whether a failure is a kernel or an application-layer code failure, promptly locate the problem, and repair the faulty BMC.

[0114] To facilitate understanding of this solution, the overall process of analyzing crash memory information caused by kernel or application layer code errors is briefly described below:

[0115] Perform stress testing on the BMC under test: compile the test version firmware on the compilation server and retain its symbol table information, read the configuration file of the BMC under test, update the BMC under test with the test version firmware, and perform stress testing on it.

[0116] Obtain the crash test dataset generated by kernel or application-layer code errors in multiple tested BMCs: after stress testing is completed, access the target BMC via the device identifier (IP) of the tested BMC. Upon successful access, enter the target BMC's directory and determine whether crash memory information caused by a kernel failure exists in the first preset directory of the target BMC. If so, forward the crash memory information to the crash test dataset; otherwise, determine whether a core dump file exists in the second preset directory of the target BMC. If a core dump file is found, save it to the crash test dataset; if not, access the next target BMC among the multiple tested BMCs. After every tested BMC has been polled, the crash test dataset is obtained.
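The polling loop of [0116] (and claims 2 and 3) is sketched below with remote access abstracted behind a caller-supplied `list_dir(ip, path)` function, since the source does not say how the BMC is accessed. The two preset directory paths and the name `collect_crash_dataset` are placeholders, not values taken from the source.

```python
from typing import Callable, Optional

# Placeholder preset directories on the tested BMC (hypothetical values).
FIRST_PRESET_DIR = "/var/crash"                  # kernel crash memory information
SECOND_PRESET_DIR = "/var/lib/systemd/coredump"  # application core dumps

# (ip, path) -> list of file names, or None if the BMC could not be accessed.
ListDir = Callable[[str, str], Optional[list[str]]]


def collect_crash_dataset(bmc_ips: list[str],
                          list_dir: ListDir) -> dict[str, dict[str, list[str]]]:
    dataset: dict[str, dict[str, list[str]]] = {}
    for ip in bmc_ips:  # poll every tested BMC
        kernel_files = list_dir(ip, FIRST_PRESET_DIR)
        if kernel_files is None:
            continue  # access failed: move on to the next BMC
        if kernel_files:
            dataset[ip] = {"kdump": kernel_files}
            continue
        # No kernel crash information: look for core dump files instead.
        core_files = list_dir(ip, SECOND_PRESET_DIR) or []
        if core_files:
            dataset[ip] = {"corefiles": core_files}
    return dataset
```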

[0117] The crash test data in the crash test dataset is categorized and stored as follows: First, the crash test data is categorized according to the test time, and crash test data with the same test time is stored in the log storage server directory 202x_xx_xx; then, the crash test data is categorized according to the IP address of the tested BMC, and crash test data belonging to the same IP address (tested BMC) is stored in the directory BMC_IP; finally, according to the data level to which the crash test data belongs, crash test data belonging to the kernel layer (kernel dump files) is stored in the log storage server directory kdump, and crash test data belonging to the application layer (core dump files) is stored in the log storage server directory corefiles, resulting in the categorized crash test dataset.
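A short sketch of the directory layout described in [0117], assuming the `202x_xx_xx` date directory follows a year_month_day pattern; `storage_path` and the `layer` keys are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

# Data level -> subdirectory on the log storage server.
LAYER_DIRS = {"kernel": "kdump", "application": "corefiles"}


def storage_path(root: str, test_date: date, bmc_ip: str,
                 layer: str, filename: str) -> PurePosixPath:
    """Build <root>/<YYYY_MM_DD>/<BMC_IP>/<kdump|corefiles>/<filename>."""
    return PurePosixPath(root, test_date.strftime("%Y_%m_%d"), bmc_ip,
                         LAYER_DIRS[layer], filename)
```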

[0118] Parsing the kernel dump file: From the crash test dataset, select multiple directories named BMC_IP containing kernel dump files. For a target directory among these directories, determine its firmware version (the firmware version of the target BMC that generated the kernel dump file). Check if a target kernel image version exists in the preset kernel image version set that corresponds to the firmware version of the target BMC. If so, obtain the image version time of the target kernel image version, enter the target directory, and obtain the firmware version time of the kernel dump file. If the image version time and firmware version time are the same, call the kernel dump file analysis tool to parse the kernel dump file; otherwise, send a kernel dump file parsing failure message to the terminal device.
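The kernel dump rule of [0118] (claim 4) can be sketched as a function that returns either the command line for the analysis tool or a failure reason. It assumes the kernel image version set is a dictionary from firmware version to the path of the uncompressed kernel (`vmlinux`) and its build time, and it uses the `crash` utility's `crash <namelist> <dumpfile>` invocation as an example of a kernel dump analysis tool; the source does not name the tool, and `kernel_parse_command` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional


def kernel_parse_command(firmware_version: str, dump_fw_time: datetime,
                         images: dict[str, tuple[str, datetime]],
                         vmcore: str) -> tuple[Optional[list[str]], str]:
    """Apply the kernel-layer parsing rule to one kernel dump file.

    images: hypothetical kernel image version set, mapping a firmware version
            to (path of the uncompressed vmlinux, image version time).
    Returns (command line, "ok") or (None, reason for the failure prompt).
    """
    entry = images.get(firmware_version)
    if entry is None:
        return None, "no kernel image version for this firmware version"
    vmlinux, image_time = entry
    if image_time != dump_fw_time:
        # The image and the dump come from different firmware builds.
        return None, "image version time does not match firmware version time"
    # Example tool: crash takes the namelist (vmlinux) and the memory image.
    return ["crash", vmlinux, vmcore], "ok"
```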

[0119] Parsing the core dump file: from the crash test dataset, select the multiple directories named BMC_IP that contain core dump files. For a target directory among them, determine its firmware version (the firmware version of the target BMC that generated the core dump file), and check whether a target symbol table corresponding to that firmware version exists in the preset symbol table set. If so, enter the target directory and retrieve the core dump file and its crash indication information; if the crash indication information is a crash process name, use it to call the GDB tool to parse the core dump file. If no such symbol table exists, send a second prompt message indicating that core dump file parsing failed to the terminal device.

[0120] Iterative analysis of kernel dump files and core dump files proceeds as follows: a configuration set is set up for the analysis tool; after the analysis interface is entered through the tool, the commands provided by the configuration set are executed to obtain execution results, which are preprocessed and sent to the data handler while the tool waits for the next analysis command. The data handler analyzes the execution results and generates the next parsing command based on the analysis results. After receiving the information from the data handler, the storage engine generates a new analysis command from it, stores the command, and transmits it to the server for iterative analysis.
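The iterative loop of [0120] reduces to a simple driver, sketched here with the analysis tool and the data handler plus storage engine passed in as functions. The round limit, the rule of never re-running a command, and the name `iterate_analysis` are assumptions added so that the loop is guaranteed to stop.

```python
from typing import Callable


def iterate_analysis(initial_cmds: list[str],
                     execute: Callable[[list[str]], str],
                     analyze: Callable[[str], list[str]],
                     max_rounds: int = 5) -> list[str]:
    """Run commands, analyze the results, and feed newly generated commands back in.

    execute: runs commands in the analysis tool and returns its raw output.
    analyze: stands in for the data processing end plus the storage engine and
             returns the next commands to run.
    Returns the output of every round, in order.
    """
    outputs: list[str] = []
    seen: set[str] = set()
    cmds = initial_cmds
    for _ in range(max_rounds):
        if not cmds:
            break  # nothing new to analyze: iteration is finished
        seen.update(cmds)
        outputs.append(execute(cmds))
        cmds = [c for c in analyze(outputs[-1]) if c not in seen]
    return outputs
```

In practice `execute` might wrap a batch-mode `gdb` or `crash` process, and `analyze` would chain the preprocessing, variable lookup, and command generation steps.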

[0121] Pushing the analysis results of dump files to the quality monitoring end: the analysis results of dump files are pushed to the quality monitoring end, where users can access them through a browser, for example by logging into the quality monitoring webpage and entering the file name of a dump file to obtain its analysis results. Alternatively, based on the operational data of the tested BMC, such as its memory information, CPU usage, and reboot information, a comprehensive analysis report is generated and pushed to the quality monitoring end or to maintenance personnel, who can then repair the abnormal BMC based on the report.

[0122] Based on the methods provided in the above embodiments, this application also provides a data analysis device. Figure 9 is a structural schematic diagram of a data analysis device according to an embodiment of this application. The device includes:

[0123] The data acquisition module 901 is used to acquire the crash test dataset from stress tests on multiple baseboard management controllers (BMCs) under test;

[0124] The data parsing module 902 is used to select a target parsing rule corresponding to the data level from the parsing rule set based on the data level to which each crash test data belongs in the crash test dataset, and to parse each crash test data based on the target parsing rule to obtain each target parsing data;

[0125] The data analysis module 903 is used to perform iterative analysis on each of the target parsed data and output the analysis results corresponding to each of the target parsed data.

[0126] Based on the same inventive concept, this application also provides an electronic device that can implement the functions of the aforementioned data analysis device. As shown in Figure 10, the electronic device includes:

[0127] At least one processor 1001 and a memory 1002 connected to the at least one processor 1001. This embodiment does not limit the specific connection medium between the processor 1001 and the memory 1002. In the example shown in Figure 10, the processor 1001 and the memory 1002 are connected via a bus 1000, which is drawn as a thick line; the way other components are connected is shown for illustration only and is not limiting. The bus 1000 can be divided into an address bus, a data bus, a control bus, and so on; for ease of representation it is drawn in Figure 10 as a single thick line, but this does not imply that there is only one bus or one type of bus. Alternatively, the processor 1001 may also be called a controller; the name is not restricted.

[0128] In this embodiment, the memory 1002 stores instructions executable by the at least one processor 1001, and by executing these instructions the at least one processor 1001 can perform the data analysis method described above. The processor 1001 can implement the functions of each module in the device shown in Figure 9.

[0129] The processor 1001 is the control center of the device. It connects to the various parts of the device through various interfaces and lines, and performs the device's functions and processes data by running or executing the instructions stored in the memory 1002 and calling the data stored in the memory 1002, thereby monitoring the device as a whole.

[0130] In one possible design, processor 1001 may include one or more processing units. Processor 1001 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into processor 1001. In some embodiments, processor 1001 and memory 1002 may be implemented on the same chip; in some embodiments, they may also be implemented on separate chips.

[0131] The processor 1001 can be a general-purpose processor, such as a central processing unit (CPU), digital signal processor, application-specific integrated circuit, field-programmable gate array or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, capable of implementing or executing the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the data analysis method disclosed in the embodiments of this application can be directly manifested as being executed by a hardware processor, or executed by a combination of hardware and software modules within the processor.

[0132] Memory 1002, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. Memory 1002 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic storage, magnetic disk, optical disk, etc. Memory 1002 can be any other medium capable of carrying or storing desired program code in the form of instructions or data structures that can be accessed by a computer, but is not limited thereto. In the embodiments of this application, memory 1002 can also be a circuit or any other device capable of implementing storage functions for storing program instructions and / or data.

[0133] By designing and programming the processor 1001, the code corresponding to the data analysis method described in the foregoing embodiments can be embedded into the chip, so that, when running, the chip can execute the steps of the data analysis method of the embodiment shown in Figure 1. How to design and program the processor 1001 is well known to those skilled in the art and is not described further here.

[0134] Based on the same inventive concept, embodiments of this application also provide a storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the data analysis method described above.

[0135] In some possible implementations, various aspects of the data analysis method provided in this application may also be implemented in the form of a program product, which includes program code that, when the program product is run on a device, causes the device to perform the steps of the data analysis method according to the various exemplary embodiments of this application described above.

[0136] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0137] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0138] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0139] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment so that a series of operational steps is performed on the computer or other programmable equipment to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0140] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. A data analysis method, characterized in that it comprises: obtaining a crash test dataset from stress testing multiple baseboard management controllers (BMCs) under test; based on the data level to which each crash test data in the crash test dataset belongs, selecting a target parsing rule corresponding to the data level from a parsing rule set, and parsing each crash test data based on the target parsing rule to obtain each target parsed data, wherein the data level is the kernel layer or the application layer; and iteratively analyzing each target parsed data and outputting the analysis result corresponding to each target parsed data; wherein the iterative analysis of each target parsed data includes: setting a configuration set for the analysis tool; after entering the analysis interface using the analysis tool, executing the commands provided by the configuration set to obtain execution results; preprocessing the execution results and waiting for the next analysis command; at the data processing end, analyzing the preprocessed execution results and generating the next parsing command based on the obtained analysis results; and at the storage engine, generating a new analysis command based on the information transmitted from the data processing end, storing the new analysis command, and transmitting the new analysis command to the server for the next round of analysis.

2. The method as described in claim 1, characterized in that obtaining the crash test dataset from stress testing the multiple BMCs under test includes: accessing a target BMC among the multiple tested BMCs; determining whether crash memory information caused by a kernel failure exists in a first preset directory of the target BMC; if so, transferring the crash memory information to the crash test dataset; if not, transferring the core dump file in a second preset directory of the target BMC to the crash test dataset.

3. The method as described in claim 2, characterized in that transferring the core dump file in the second preset directory of the target BMC to the crash test dataset includes: determining whether the core dump file exists in the second preset directory of the target BMC; if so, transferring the core dump file to the crash test dataset; if not, accessing the next target BMC among the multiple tested BMCs, until all of the multiple tested BMCs have been polled.

4. The method as described in claim 1, characterized in that, if the crash test data belongs to the kernel layer, the crash test data is parsed using the following target parsing rule: determining the firmware version of the target BMC that generated the crash test data; determining whether a target kernel image version corresponding to the firmware version exists in a preset kernel image version set, wherein a kernel image version in the kernel image version set represents the uncompressed version of the kernel compiled for the tested BMC; if so, obtaining the image version time of the target kernel image version and the firmware version time of the crash test data, and, when the image version time and the firmware version time are determined to be the same, calling the kernel dump file analysis tool to parse the crash test data; if not, outputting a first prompt message.

5. The method as described in claim 1, characterized in that, if the data level to which the crash test data belongs is the application layer, the crash test data is parsed using the following target parsing rule: determining the firmware version of the target BMC that generated the crash test data; determining whether a target symbol table corresponding to the firmware version exists in a symbol table set, wherein the symbol table set is generated when the firmware of the multiple tested BMCs is compiled and each tested BMC firmware version corresponds to one symbol table version; if so, obtaining crash indication information from the crash test data and parsing the crash test data based on the crash indication information; if not, outputting a second prompt message.

6. The method as described in claim 5, characterized in that the crash indication information includes a crash process name or a crash thread name, and, if the crash thread name is obtained from the crash test data, parsing the crash test data based on the crash indication information includes: finding the crash process name associated with the crash thread name in a thread statistics table; and invoking the program debugging tool GDB based on the crash process name to parse the crash test data.

7. The method as described in claim 1, characterized in that, After outputting the analysis results corresponding to each of the target parsed data, the method further includes: The analysis results corresponding to each of the target parsing data are pushed to the quality monitoring terminal; or Based on the operational data of the tested BMC, a comprehensive analysis report is generated and pushed to the quality monitoring terminal or maintenance personnel, so that the maintenance personnel can repair the abnormal BMC based on the comprehensive analysis report. The operational data includes at least the memory information, CPU status information and restart information of the tested BMC.

8. A data analysis device, characterized in that it comprises: a data acquisition module, configured to acquire a crash test dataset from stress testing multiple baseboard management controllers (BMCs) under test; a data parsing module, configured to select, from a parsing rule set, a target parsing rule corresponding to the data level to which each crash test data in the crash test dataset belongs, and to parse each crash test data according to the target parsing rule to obtain each target parsed data, wherein the data level is the kernel layer or the application layer; and a data analysis module, configured to iteratively analyze each target parsed data and output the analysis result corresponding to each target parsed data; specifically, when iteratively analyzing each target parsed data, the data analysis module is configured to: set a configuration set for the analysis tool; after entering the analysis interface using the analysis tool, execute the commands provided by the configuration set to obtain execution results, preprocess the execution results, and continue to wait for the next analysis command; at the data processing end, analyze the preprocessed execution results and generate the next parsing command based on the obtained analysis results; and at the storage engine, generate a new analysis command based on the information transmitted by the data processing end, store the new analysis command, and transmit the new analysis command to the server for the next round of analysis.

9. An electronic device, characterized in that it comprises: a memory, configured to store a computer program; and a processor, configured to implement the steps of the method according to any one of claims 1-7 when executing the computer program stored in the memory.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1-7.