System software exception trace method, device, storage medium and program product
By acquiring and hierarchically recording error information of system software anomalies, the problems of inaccurate fault location and low debugging efficiency in existing technologies are solved, enabling remote and accurate location and rapid troubleshooting during software operation.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: BEIJING SMARTCHIP MICROELECTRONICS TECHNOLOGY CO LTD
- Filing Date: 2026-04-01
- Publication Date: 2026-06-23
AI Technical Summary
Existing technologies suffer from inaccurate fault location, lack of hierarchical information, and low debugging efficiency when locating software faults, making it difficult to accurately locate and remotely troubleshoot root source code or logic during software operation.
When the system software malfunctions, error information is acquired at each level, and each level's error information comprises multiple fields of different types. Based on error codes or error record blocks, the per-level error information is recorded and queried to generate traceability information that identifies where the malfunction occurred.
It enables precise location of software anomalies, allowing for remote troubleshooting without moving equipment, thus improving the efficiency and accuracy of fault location.
Figure CN121959565B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of software anomaly detection, and in particular to a method, device, storage medium, and program product for tracing system software anomalies.
Background Technology
[0002] As digital transformation deepens, the functional boundaries of industrial control equipment, smart terminals, and embedded hardware continue to expand, and their computing power and level of intelligence keep improving. The accompanying hardware architectures and software systems are increasingly layered, modular, and functionally dense. Software involves multiple layers of logic, including underlying drivers, operating systems, middleware, and applications, with complex cross-layer and cross-module interactions. At the same time, equipment deployment is becoming increasingly dispersed. For example, as metering expands toward ubiquitous metering, equipment is widely distributed across different regions and scenarios, which makes troubleshooting equipment failures more challenging.
[0003] When a failure occurs during software operation, it is necessary to determine the root source code location or logical defect causing the problem. Current methods for locating software anomalies include log analysis, debugging tools, static analysis, and dynamic analysis.
[0004] Log analysis methods fall into several types. The backtrace method traces the call stack at runtime and, when the program crashes or encounters an error, displays the sequence of function calls or records it in flash memory to help locate the problem. However, a backtrace is typically produced only when the program raises a runtime exception or crashes, so some function call errors cannot be traced. Other methods return software error codes, but these are often single-level codes: after multiple nested calls, only the last level's error code is returned, and much information is lost. Still other logs are generated through API calls, which leads the system or application to record excessive information and makes it difficult to find the crucial clues.
[0005] Static and dynamic analysis trace faults at the code level, making them more suitable for early prevention in the software development stage, rather than for tracing the causes of faults during software operation.
[0006] Debugging tools can debug software before and after deployment, but this approach requires manual troubleshooting, which is inefficient. It also requires personnel to test and debug the software on-site. If the equipment running the software has to be recovered and moved, many faults become difficult to reproduce, which is very inconvenient.
[0007] In summary, the existing technology has the following shortcomings:
[0008] 1. Inaccurate fault location and lack of hierarchical information: The software error codes do not reflect the operating system's layered architecture, making it impossible to trace the specific level at which the error occurred and to accurately locate the root source code or logic that caused the exception.
[0009] 2. Low debugging efficiency: Fault location relies on log analysis combined with error codes and requires manual code inspection, which is time-consuming and prone to missing key information.
Summary of the Invention
[0010] This invention provides a system software anomaly tracing method, device, storage medium, and program product to solve the above-mentioned problems.
[0011] In a first aspect, embodiments of the present invention provide a system software anomaly tracing method. The system software includes multiple logical levels. The system software anomaly tracing method includes: obtaining error information of each level when the system software is abnormal, wherein each level error information includes multiple fields of different types; based on whether there is an error code in the system software, calling the corresponding error query action, reading each field in each level error information, and obtaining tracing information when the system software is abnormal; and based on the tracing information, determining the location where the system software anomaly occurs.
[0012] The system software anomaly tracing method provided in this embodiment of the invention can record error information at each logical level when software anomalies occur. When tracing anomalies, it can directly find the location and object of the fault at each logical level from the error information at each level. Remote troubleshooting can be performed without moving the device used to run the software, thus achieving accurate location of software faults.
[0013] Optionally, when error codes exist in the system software, the error information at each level includes a level field, an object field, an operation field, and an exception type field; the level field is the level at which the system software encounters an exception, the object field is the object being operated on at the current level when the system software encounters an exception, the operation field is the operation performed when the system software encounters an exception, and the exception type field is the type of exception that occurred in the system software.
[0014] Optionally, the steps of invoking the corresponding error query action based on whether an error code exists in the system software, reading each field in the error information at each level, and obtaining traceability information when the system software is abnormal include: when an error code exists in the system software, reading the error code stored in blocks in the error record block for each occurrence of an abnormality, and obtaining the error code corresponding to each field in the error information at each level; calling the pre-acquired error code encoding table, comparing each error code with the error code encoding table, and determining the error type corresponding to each error code; and generating traceability information for each level when the system software is abnormal based on the error type corresponding to each error code at each level.
[0015] Optionally, the error recording block includes multiple storage blocks, and the error code of each exception that occurs in the system software is stored in the corresponding storage block.
[0016] Optionally, before the step of obtaining error information for each layer when the system software malfunctions, the method further includes: when the system software malfunctions, generating error codes for the level field, object field, operation field, and exception type field of the current layer where the malfunction occurred; and passing the error codes of each field of that layer up to the outermost layer of the system software to form error information.
[0017] Optionally, when the system software includes multiple layers, as the error codes are passed up to the outermost layer of the system software, the error codes corresponding to the level field, object field, operation field, and exception type field of each layer passed through are superimposed, until the error codes of all layers where the exception occurred have reached the outermost layer, forming multi-layer error information.
[0018] Optionally, the hierarchy fields include component layer fields, kernel layer fields, and driver layer fields; the object fields include thread fields, process fields, device fields, file fields, semaphore fields, and mutex fields; the operation fields include open fields, close fields, read fields, write fields, and control fields; and the exception type fields include parameter error fields, permission failure fields, connection failure fields, and interface error fields.
[0019] Optionally, when there is no error code in the system software, the error information at each level when an exception occurs is read from the error record block, which is stored in blocks. The error record block includes multiple storage blocks, and at least one level of error information when an exception occurs in the system software is stored in the corresponding storage block. The error information at each level includes a separator field, the number of storage blocks, a time field, a throwing function field, an exception type field, and integrity verification information. Different storage blocks are separated by a separator field, the time field is the time when the exception occurs in the system software, the throwing function field is the function called when the exception occurs in the system software, and the exception type field is the type of exception that occurred in the system software.
[0020] Optionally, the steps of calling the corresponding error query action based on whether there is an error code in the system software, reading each field in the error information at each level, and obtaining the traceability information when the system software is abnormal include: when there is no error code in the system software, calling the error record block, reading each field of the error information in each storage block in the error record block; and generating traceability information for each level when the system software is abnormal each time based on each field read from each storage block.
[0021] Optionally, before the step of obtaining the error information of each layer when the system software is abnormal, the system software abnormality tracing method further includes: when the system software is abnormal, generating the field corresponding to the error information of each layer with a preset field length; storing the field corresponding to the error information of each layer when the abnormality occurs in a storage block; when the field length of the error information stored in each storage block exceeds a preset length threshold, sending the error information in the corresponding storage block to an external device.
[0022] Optionally, the storage capacity of the storage block is an integer multiple of the preset field length.
[0023] Optionally, the error record block includes multiple storage blocks for storing error information. The system software anomaly tracing method further includes: when error information is obtained, calculating the number of unwritten blocks required to write error information into the storage block, and obtaining the number of written blocks into which error information has been written; based on the relationship between the sum of the number of unwritten blocks and the number of written blocks and the total number of storage blocks, choosing to write error information into each unwritten block sequentially, or to erase the written blocks and write the remaining error information after writing error information into all unwritten blocks.
[0024] Optionally, when error information is written to unwritten blocks, it is determined whether the sum of the number of unwritten blocks required and the number of written blocks exceeds the total number of storage blocks. If the sum is less than or equal to the total number of storage blocks, the error information is written to each unwritten block in sequence. If the sum is greater than the total number of storage blocks, then after all unwritten blocks are filled, the remaining error information is written starting from the first storage block until all error information is written.
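The block-selection rule above can be illustrated with a short sketch. It is not taken from the patent: the function name `plan_block_writes` is hypothetical, and the sketch assumes that blocks are filled in index order starting at block 0, so the written blocks always occupy indices 0 to written - 1.

```python
from typing import List

def plan_block_writes(needed: int, written: int, total: int) -> List[int]:
    """Return the indices of the storage blocks to write, in order.

    needed  - number of blocks the new error information requires
    written - number of blocks that already hold error information
    total   - total number of storage blocks in the error record block
    Assumption: blocks are filled in index order, starting at block 0.
    """
    if not 0 <= written <= total or not 0 <= needed <= total:
        raise ValueError("block counts must lie between 0 and total")
    if needed + written <= total:
        # Enough unwritten blocks remain: write them sequentially.
        return list(range(written, written + needed))
    # Otherwise fill the remaining unwritten blocks, then wrap to the
    # first storage block; the wrapped blocks are erased and overwritten.
    free = total - written
    return list(range(written, total)) + list(range(needed - free))

print(plan_block_writes(needed=2, written=3, total=8))  # [3, 4]
print(plan_block_writes(needed=3, written=6, total=8))  # [6, 7, 0]
```

Returning block indices rather than performing the writes keeps the policy separate from the flash driver, which would erase each wrapped block before reuse.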
[0025] Optionally, the error information is in HEX encoding format, and the system software anomaly tracing method further includes: reading, from the storage blocks, the number of blocks written when error information was last written; and, based on that number, searching backward to the first separator field in the error record block to confirm the number of blocks written, and reading the error information from the storage blocks into which error information was last written.
[0026] Optionally, the system software anomaly tracing method further includes: after obtaining the error information, using a pre-obtained parsing tool to parse the error information into a String type that includes function information.
[0027] Optionally, the system software anomaly tracing method further includes: in response to a remote interaction command, sending error information in the storage block to the server.
[0028] Optionally, the system software anomaly tracing method also includes: performing integrity verification on the error information and/or error codes using a cyclic redundancy check (CRC) or a hash value check.
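A minimal sketch of such an integrity check follows, using CRC-32 from Python's standard `zlib` module. The patent does not specify the polynomial, the width of the check value, or where it is stored; appending a 4-byte little-endian CRC to the payload, and the helper names `seal` and `verify`, are assumptions made for illustration.

```python
import struct
import zlib

def seal(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (assumed: 4 bytes, little-endian)."""
    return payload + struct.pack("<I", zlib.crc32(payload))

def verify(sealed: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the stored value."""
    if len(sealed) < 4:
        return False
    payload, stored = sealed[:-4], struct.unpack("<I", sealed[-4:])[0]
    return zlib.crc32(payload) == stored

record = seal(b"spi_flash_read\x03\x03\x03\x03")
corrupted = bytes([record[0] ^ 0x01]) + record[1:]  # flip one bit

print(verify(record))     # True
print(verify(corrupted))  # False
```

A hash-based variant would replace `zlib.crc32` with a digest such as `hashlib.sha256(payload).digest()`, at the cost of more storage per record.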
[0029] In a second aspect, embodiments of the present invention provide a system software anomaly tracing device, comprising: a processor and a memory, wherein the memory stores instructions; the processor invokes the instructions in the memory to cause the processor to execute the system software anomaly tracing method of any of the foregoing embodiments of the first aspect of the present invention.
[0030] The processor of the system software anomaly tracing device provided in this embodiment of the invention executes the system software anomaly tracing method of any of the foregoing embodiments of the first aspect of the invention by calling instructions in the memory. When an anomaly occurs in the software, it can record error information of each logical level in layers. When tracing the anomaly, it can directly find the location and object of the fault in each logical level from the error information of each layer. Remote troubleshooting can be performed without moving the device used to run the software, and the software fault can be accurately located.
[0031] In a third aspect, embodiments of the present invention provide a computer-readable storage medium storing instructions that, when executed by a processor, implement the system software anomaly tracing method of any of the foregoing embodiments of the first aspect of the present invention.
[0032] The instructions stored in the computer-readable storage medium provided in the embodiments of the present invention can be invoked by a processor to execute the system software anomaly tracing method of any of the foregoing embodiments of the first aspect of the present invention. When an anomaly occurs in the software, error information of each logical level is recorded in layers. When tracing the anomaly, the location and object of the fault at each logical level can be found directly from the error information of each layer. Remote troubleshooting can be performed without moving the device used to run the software, thereby achieving accurate location of software faults.
[0033] In a fourth aspect, embodiments of the present invention provide a computer program product, which includes a computer program that, when executed by a processor, implements the system software anomaly tracing method of any of the foregoing embodiments of the first aspect of the present invention.
[0034] When the computer program in the computer program product provided in the embodiments of the present invention is executed by the processor, it can implement the system software anomaly tracing method of any of the foregoing embodiments of the first aspect of the present invention. This enables the computer program product to record error information of each logical level in layers when software anomalies occur. When tracing anomalies, it can directly find the location and object of the fault in each logical level from the error information of each layer. Remote troubleshooting can be performed without moving the device used to run the software, thereby achieving accurate location of software faults.
Attached Figure Description
[0035] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the structures shown in these drawings without creative effort.
[0036] Figure 1 is a flowchart of an embodiment of the system software anomaly tracing method of the present invention;
[0037] Figure 2 is a flowchart of the steps performed before error information is obtained in the first case of an embodiment of the system software anomaly tracing method of the present invention;
[0038] Figure 3 is a schematic diagram of the error code encoding structure in the first case of an embodiment of the system software anomaly tracing method of the present invention;
[0039] Figure 4 is a diagram of the first two bytes of the error code encoding in the first case of an embodiment of the system software anomaly tracing method of the present invention;
[0040] Figure 5 is a diagram of the last two bytes of the error code encoding in the first case of an embodiment of the system software anomaly tracing method of the present invention;
[0041] Figure 6 is a flowchart of step S120 in the first case of an embodiment of the system software anomaly tracing method of the present invention;
[0042] Figure 7 is a flowchart of the steps performed before error information is obtained in the second case of an embodiment of the system software anomaly tracing method of the present invention;
[0043] Figure 8 is a flowchart of step S120 in the second case of an embodiment of the system software anomaly tracing method of the present invention;
[0044] Figure 9 is a schematic diagram of the storage format of the error record block in the second case of an embodiment of the system software anomaly tracing method of the present invention;
[0045] Figure 10 is a structural block diagram of an embodiment of the system software anomaly tracing device of the present invention.
Detailed Implementation
[0046] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0047] It should be noted that all directional indications in the embodiments of the present invention, such as up, down, left, right, front, back, etc., are only used to explain the relative positional relationship and movement of the components in a specific posture as shown in the attached figure. If the specific posture changes, the directional indication will also change accordingly.
[0048] Furthermore, the use of terms such as "first" and "second" in this invention is for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one of that feature. Additionally, the technical solutions of the various embodiments can be combined with each other, but only on the basis of being achievable by those skilled in the art. When the combination of technical solutions is contradictory or impossible to implement, such a combination of technical solutions should be considered non-existent and not within the scope of protection claimed by this invention.
[0049] For ease of understanding, the system software anomaly tracing method of this invention is described below. As shown in Figure 1, the system software in this embodiment of the invention includes multiple logical levels, and the system software anomaly tracing method includes steps S110 to S130.
[0050] In step S110, error information at each level is obtained when the system software malfunctions. Each level of error information includes multiple fields of different types.
[0051] When tracing the source of system software failures, there are two scenarios. The first scenario is when the system software is under development, and the failure source tracing is performed on the system software under development. The second scenario is when the system software has been developed and put into operation.
[0052] As shown in Figure 2, in the first case of this embodiment, the system software anomaly tracing method further includes steps S101A to S102A before the step of obtaining the error information of each layer when the system software is abnormal.
[0053] In step S101A, when an exception occurs in the system software, error codes are generated for the level field, object field, operation field, and exception type field of the current layer where the exception occurred.
[0054] In step S102A, the error code of each field of the current layer where the exception occurred is passed up to the outermost layer of the system software to form error information.
[0055] Specifically, when the system software includes multiple layers, as the error codes are passed up to the outermost layer of the system software, the error codes corresponding to the level field, object field, operation field, and exception type field of each layer passed through are superimposed, until the error codes of all layers where the exception occurred have reached the outermost layer, forming multi-layer error information.
[0056] In this embodiment, the error code is divided into four fields: software level, object, operation, and exception type. When an exception occurs in the system software, the token values of the level, object, operation type, and exception type are combined into the error code returned by the function. The resulting error code for that level is Return_Value = Token_Lvl | Token_Obj | Token_Opr | Token_err.
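The composition Return_Value = Token_Lvl | Token_Obj | Token_Opr | Token_err can be sketched as follows. The bit layout is an assumption: the patent defines the actual positions in Figures 3 to 5, which are not reproduced in this text, so the sketch simply gives each field one byte of a 32-bit code. The numeric values of the enum members and the name `make_error_code` are likewise hypothetical; only the member names follow the field lists given in the description.

```python
from enum import IntEnum

# Assumed layout: one byte per field in a 32-bit error code.
LVL_SHIFT, OBJ_SHIFT, OPR_SHIFT, ERR_SHIFT = 24, 16, 8, 0

class Level(IntEnum):      # numeric values are illustrative only
    COMPONENT = 1
    KERNEL = 2
    DRIVER = 3

class Obj(IntEnum):
    THREAD = 1
    PROCESS = 2
    DEVICE = 3
    FILE = 4
    SEMAPHORE = 5
    MUTEX = 6

class Opr(IntEnum):
    OPEN = 1
    CLOSE = 2
    READ = 3
    WRITE = 4
    CONTROL = 5

class Err(IntEnum):
    PARAMETER = 1
    PERMISSION = 2
    CONNECTION = 3
    INTERFACE = 4

def make_error_code(lvl: Level, obj: Obj, opr: Opr, err: Err) -> int:
    """Combine the four tokens of one layer with bitwise OR."""
    token_lvl = lvl << LVL_SHIFT
    token_obj = obj << OBJ_SHIFT
    token_opr = opr << OPR_SHIFT
    token_err = err << ERR_SHIFT
    return token_lvl | token_obj | token_opr | token_err

code = make_error_code(Level.DRIVER, Obj.DEVICE, Opr.READ, Err.CONNECTION)
print(f"0x{code:08X}")  # 0x03030303
```

Because the fields occupy disjoint bit ranges, the OR is lossless and each field can later be recovered with a shift and a mask.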
[0057] While returning to the calling function, if the calling function processes the return value, the token value of that layer can be added in turn, and so on until control returns to the outermost layer.
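One way to picture this layer-by-layer accumulation is sketched below. The patent does not state how the per-layer codes are physically stacked, so the sketch assumes the simplest representation, a list that each layer appends to as the error travels back up the call chain. The function names, token values, and the one-byte-per-field packing are all hypothetical.

```python
from typing import List

def pack(lvl: int, obj: int, opr: int, err: int) -> int:
    # Assumed one-byte-per-field layout; illustrative only.
    return (lvl << 24) | (obj << 16) | (opr << 8) | err

# Hypothetical token values for this sketch.
COMPONENT, KERNEL, DRIVER = 1, 2, 3
DEVICE, FILE = 3, 4
OPEN, READ = 1, 3
ERR_CONNECTION, ERR_INTERFACE = 3, 4

def driver_read() -> List[int]:
    # Innermost layer detects the fault and emits the first code.
    return [pack(DRIVER, DEVICE, READ, ERR_CONNECTION)]

def kernel_read() -> List[int]:
    trace = driver_read()
    if trace:  # this layer handled a failing return value, so it adds its code
        trace.append(pack(KERNEL, FILE, READ, ERR_INTERFACE))
    return trace

def component_load() -> List[int]:
    trace = kernel_read()
    if trace:  # outermost layer adds its own code last
        trace.append(pack(COMPONENT, FILE, OPEN, ERR_INTERFACE))
    return trace

trace = component_load()
print([f"0x{c:08X}" for c in trace])
# ['0x03030303', '0x02040304', '0x01040104']
```

Read from first to last, the list runs from the layer where the fault originated to the outermost caller, which is the order a tracer would follow to reach the root cause.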
[0058] In the process of uploading error codes to the outermost layer of the system software, the error codes corresponding to each level are superimposed to form an error code encoding table as shown in Table 1, thus obtaining multi-layered error information when the system software malfunctions.
[0059] Table 1
[0060]
[0061] As shown in Figures 3 to 5, after the error code is obtained, it can be analyzed in reverse with a table lookup to find the specific level, object, operation, and exception type corresponding to the error code in the error code encoding table. This determines the exception object, exception operation, and exception type at each level of the system software exception, thus enabling rapid location of system software exceptions.
[0062] As shown in Figure 7, if the system software has already been developed and put into operation and the problem cannot be located by recording error codes as described above, that is, in the second case of this embodiment, the abnormality of the system software can be quickly located through the following steps.
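The reverse lookup can be sketched as a set of small dictionaries, one per field. Table 1 and Figures 3 to 5 are not reproduced in this text, so the numeric keys and the one-byte-per-field layout below are assumptions; only the names of the levels, objects, operations, and exception types come from the description.

```python
from typing import Dict

# Hypothetical encoding tables; the real values are defined in Table 1.
LEVELS = {1: "component layer", 2: "kernel layer", 3: "driver layer"}
OBJECTS = {1: "thread", 2: "process", 3: "device",
           4: "file", 5: "semaphore", 6: "mutex"}
OPERATIONS = {1: "open", 2: "close", 3: "read", 4: "write", 5: "control"}
EXCEPTIONS = {1: "parameter error", 2: "permission failure",
              3: "connection failure", 4: "interface error"}

def decode(code: int) -> Dict[str, str]:
    """Split a 32-bit error code into its four fields and look each one up."""
    def look(table: Dict[int, str], value: int) -> str:
        return table.get(value, f"unknown({value})")
    return {
        "level": look(LEVELS, (code >> 24) & 0xFF),
        "object": look(OBJECTS, (code >> 16) & 0xFF),
        "operation": look(OPERATIONS, (code >> 8) & 0xFF),
        "exception": look(EXCEPTIONS, code & 0xFF),
    }

for code in (0x03030303, 0x02040304):
    print(f"0x{code:08X}", decode(code))
```

Unknown field values are reported rather than raising, so a trace that includes a code from a newer firmware build can still be read in part.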
[0063] In the second case of this embodiment, the system software anomaly tracing method of this embodiment further includes steps S101B to S103B before the step of obtaining error information of each layer when a system software anomaly occurs.
[0064] In step S101B, when an error occurs in the system software, a field corresponding to the error information of each layer is generated with a preset field length.
[0065] In step S102B, the fields corresponding to the error information of each level when an exception occurs are stored in the storage block.
[0066] In step S103B, when the field length of the error information stored in each storage block exceeds a preset length threshold, the error information in the corresponding storage block is sent to an external device.
[0067] Specifically, the storage capacity of the storage block is an integer multiple of the preset field length.
[0068] In this embodiment, the error record area records the error information of each time the system software encounters an exception, using the format of error function name + exception type.
[0069] The error record area is a fixed-size area, for example the size of one page of flash memory. It can be located in the RAM (Random Access Memory) or ROM (Read Only Memory) area of an electronic device.
[0070] Abnormal error records are written to the error record area cyclically, so two error messages must be prevented from overlapping. For example, suppose a flash page holds 10 abnormal error records. When the 11th record is written, if its field length is shorter than that of the 1st record, it may overwrite only the first half of the 1st record and leave the second half in place. Parsing the error information would then read the second half of the 1st record before the 11th record, producing a disordered function call relationship and making it impossible to trace the abnormal location of the software correctly.
[0071] Therefore, in this embodiment, the number of bytes written for each error record when the system software encounters an anomaly can be set to a fixed N bytes, each storage block records K abnormal error records, and one page of flash records M bytes.
[0072] For example, a single-layer abnormal error record can be set to N = 32 bytes: magic + timestamp = 8 bytes (or timestamp only, 8 bytes); function name = 20 bytes; error code = 4 bytes.
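The 32-byte record can be sketched with Python's `struct` module. The 8 + 20 + 4 byte split comes from the example above; everything else is an assumption for illustration: the header is taken to be a 4-byte magic followed by a 4-byte timestamp, fields are little-endian, the magic value `0xA55AA55A` is invented, and the helper names are hypothetical.

```python
import struct

MAGIC = 0xA55AA55A                   # hypothetical marker value
RECORD = struct.Struct("<II20sI")    # magic, timestamp, function name, error code
assert RECORD.size == 32             # N = 32 bytes, as in the example

def pack_record(timestamp: int, func_name: str, error_code: int,
                magic: int = MAGIC) -> bytes:
    """Build one fixed-length record; the name is truncated or NUL-padded to 20 bytes."""
    name = func_name.encode("ascii")[:20]
    return RECORD.pack(magic, timestamp, name, error_code)

def unpack_record(raw: bytes):
    """Return (magic, timestamp, function name, error code) from 32 bytes."""
    magic, timestamp, name, error_code = RECORD.unpack(raw)
    return magic, timestamp, name.rstrip(b"\x00").decode("ascii"), error_code

rec = pack_record(1700000000, "spi_flash_read", 0x03030303)
print(len(rec))            # 32
print(unpack_record(rec))
```

Fixing the length means a record is always overwritten as a whole, which is what prevents the half-overwritten records described in the preceding paragraphs.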
[0073] Based on the complexity of the system software and the depth of its call hierarchy, each storage block is set to record K abnormal error records, that is, call errors across K levels. A page of M bytes can thus be divided into M / (N*K) blocks. For example, a 4096-byte flash page can be divided into 8 storage blocks of 512 bytes each, and each storage block can record 16 levels of errors.
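The arithmetic in this example can be checked with a few lines of code. The function name `block_layout` is hypothetical; the numbers M = 4096, N = 32, and K = 16 are the ones given in the example.

```python
from typing import Tuple

def block_layout(page_bytes: int, record_bytes: int,
                 records_per_block: int) -> Tuple[int, int]:
    """Return (number of blocks, bytes per block) for a page of M bytes.

    The block size is N * K and the block count is M / (N * K).
    """
    block_bytes = record_bytes * records_per_block          # N * K
    if page_bytes % block_bytes != 0:
        raise ValueError("page size must be a whole number of blocks")
    return page_bytes // block_bytes, block_bytes

blocks, block_bytes = block_layout(page_bytes=4096, record_bytes=32,
                                   records_per_block=16)
print(blocks, block_bytes)  # 8 512
```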
[0074] When the number of newly written error bytes in a storage block exceeds the byte threshold L, the contents of the storage block are committed once.
[0075] Specifically, the threshold L is set to 384 bytes, i.e. 512 - 128 bytes, so that the error information stored in a storage block is committed when the block is nearly full. After the commit completes, the count of newly written bytes in the storage block is reset to zero, and the commit time is determined and recorded.
[0076] Because the storage capacity of a storage block is an integer multiple of the preset field length, each write to the flash page is offset by an integer multiple of N*K bytes, and an overwrite erases an integer multiple of N*K bytes. This avoids the problem of disordered call relationships.
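A minimal sketch of this commit rule follows. The class name `StorageBlock`, the `commit` callback (which stands in for sending the block to an external device), and the use of `time.time()` as the commit timestamp are assumptions; the 32-byte record, 512-byte block, and 384-byte threshold are the values from the example.

```python
import time
from typing import Callable, Optional

RECORD_BYTES = 32
BLOCK_BYTES = 512
COMMIT_THRESHOLD = BLOCK_BYTES - 128   # L = 384 bytes

class StorageBlock:
    def __init__(self, commit: Callable[[bytes], None]) -> None:
        self.data = bytearray()
        self.new_bytes = 0                       # bytes written since last commit
        self.last_commit_time: Optional[float] = None
        self._commit = commit                    # hypothetical upload hook

    def append(self, record: bytes) -> bool:
        """Store one record; return True if this write triggered a commit."""
        if len(record) != RECORD_BYTES:
            raise ValueError("records must be exactly N bytes long")
        if len(self.data) + len(record) > BLOCK_BYTES:
            raise ValueError("storage block is full")
        self.data += record
        self.new_bytes += len(record)
        if self.new_bytes > COMMIT_THRESHOLD:
            self._commit(bytes(self.data))
            self.new_bytes = 0                   # cleared after the commit
            self.last_commit_time = time.time()  # commit time is recorded
            return True
        return False

sent = []
block = StorageBlock(commit=sent.append)
flags = [block.append(bytes(RECORD_BYTES)) for _ in range(13)]
print(flags.index(True) + 1)  # 13: the 13th record pushes the count past 384
```

With 32-byte records the counter reaches exactly 384 after 12 records, which does not exceed the threshold, so the commit fires on the 13th record.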
[0077] In step S120, based on whether there is an error code in the system software, the corresponding error query action is invoked to read each field in the error information at each level and obtain the traceability information when the system software is abnormal.
[0078] As shown in Figure 6, in the first case of this embodiment, step S120 includes steps S121A to S123A.
[0079] In step S121A, when there is an error code in the system software, the error code stored in the error record block for each exception is read, and the error code corresponding to each field in each layer of error information is obtained.
[0080] The error record block includes multiple storage blocks, and the error code of each time the system software encounters an exception is stored in the corresponding storage block.
[0081] In step S122A, the pre-acquired error code encoding table is called, and each error code is compared with the error code encoding table to determine the error type corresponding to each error code.
[0082] In step S123A, based on the error type corresponding to each error code at each level, traceability information for each level when a system software exception occurs is generated.
[0083] As shown in Figures 3 to 5, when error codes exist in the system software, the error information at each level includes a level field, an object field, an operation field, and an exception type field; the level field is the level at which the system software encounters an exception, the object field is the object being operated on at the current level when the system software encounters an exception, the operation field is the operation performed when the system software encounters an exception, and the exception type field is the type of exception that occurred in the system software.
[0084] In this embodiment, taking system software developed on a real-time operating system as an example, the level fields include component layer fields, kernel layer fields, and driver layer fields, corresponding to the component layer, kernel layer, and driver layer, respectively. The object fields include thread fields, process fields, device fields, file fields, semaphore fields, mutex fields, and fields for other objects, corresponding to threads, processes, devices, files, semaphores, and mutexes. The operation fields include open fields, close fields, read fields, write fields, control fields, and fields for other operation types, corresponding to open, close, read, write, and control. The exception type fields include parameter error fields, permission failure fields, connection failure fields, interface error fields, and fields for other error types, corresponding to parameter error, permission failure, connection failure, and interface error.
[0085] Different levels, objects, operations, and exception types are encoded with different error codes. When an exception occurs in the system software, the corresponding code can be retrieved by looking up a table to determine the level of the exception and the objects, operations, and exception types at each level.
[0086] As shown in Figure 8, in the second case of this embodiment, step S120 includes steps S121B to S122B.
[0087] In step S121B, when there is no error code in the system software, the error record block is invoked, and each field of the error information in each storage block of the error record block is read.
[0088] In step S122B, based on each field read from each storage block, traceability information for each layer is generated each time an anomaly occurs in the system software.
[0089] Specifically, when there is no error code in the system software, the error information of each level when an exception occurs is read from the error record block, which is stored in blocks.
[0090] The error record block includes multiple storage blocks for storing error information. Each time an exception occurs in the system software, at least one layer of error information is stored in the corresponding storage block. The error information at each layer includes a separator field, the number of storage blocks, a time field, an error-throwing function field, an exception type field, and integrity verification information.
[0091] Different storage blocks are separated by the separator field. The time field is the time at which the system software encountered the exception, the error-throwing function field is the function that was called when the system software encountered the exception, and the exception type field is the type of exception that occurred in the system software.
[0092] As shown in Figure 9, in this embodiment a flash page contains N bytes, and when the flash page is full the area is written cyclically. In the error record area, errors are recorded in the following form: Magic, timestamp of the first error, name of the first error-throwing function, exception type, name of the second error-throwing function, exception type, ...; Magic, timestamp of the second error, name of the first error-throwing function, exception type, ...; Magic, timestamp of the Nth error, name of the first error-throwing function, exception type, name of the second error-throwing function, exception type.
[0093] If a system software malfunctions, the error record area in the flash memory can be read, and the program location of the malfunctioning system software can be quickly traced based on the obtained function name and exception type.
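The record form described above (Magic, timestamp, then a function name and exception type per layer) can be illustrated with the following sketch. The byte layout is an assumption made for illustration: the 4-byte `MAGIC` value, the 4-byte big-endian timestamp, the fixed 16-byte function-name field `NAME_LEN`, and the 1-byte exception type are all hypothetical and are not specified by this embodiment.

```python
import struct

MAGIC = b"\xA5\x5A\xA5\x5A"  # hypothetical separator (Magic) value
NAME_LEN = 16                # hypothetical fixed width of the function-name field


def build_record(timestamp: int, frames: list[tuple[str, int]]) -> bytes:
    """Serialize one exception: Magic, timestamp, then (function name, type) per layer."""
    out = bytearray(MAGIC)
    out += struct.pack(">I", timestamp)
    for name, exc_type in frames:
        encoded = name.encode("ascii")[:NAME_LEN]
        out += encoded.ljust(NAME_LEN, b"\x00")  # pad the name to a fixed width
        out += struct.pack("B", exc_type)
    return bytes(out)


def parse_record(data: bytes) -> tuple[int, list[tuple[str, int]]]:
    """Recover the timestamp and the (function name, exception type) pairs."""
    if data[:4] != MAGIC:
        raise ValueError("record does not start with Magic")
    (timestamp,) = struct.unpack(">I", data[4:8])
    frames = []
    step = NAME_LEN + 1
    for offset in range(8, len(data) - step + 1, step):
        name = data[offset:offset + NAME_LEN].rstrip(b"\x00").decode("ascii")
        frames.append((name, data[offset + NAME_LEN]))
    return timestamp, frames
```

A fixed-width layout like this one keeps every entry the same size, which makes it straightforward to walk a record and list the functions that threw the error in order.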
[0094] In this embodiment, both error codes and error messages are stored in the error record area. When an anomaly occurs in the system software, the data in the error record area is read. If there is an error code, the anomaly is analyzed and traced by parsing the error code. If there is no error code, the anomaly is analyzed and traced directly by the error message.
[0095] If no error code is found, the error information is first stored in the storage block, following these steps:
[0096] Calculate the number of unwritten blocks required to write the error information to the storage blocks, and obtain the number of written blocks that already contain error information. Then compare the sum of the number of unwritten blocks and the number of written blocks with the total number of storage blocks, and either write the error information to the unwritten blocks in sequence, or fill all the unwritten blocks, erase written blocks, and write the remaining error information into them.
[0097] Specifically, when writing error information to unwritten blocks, determine whether the sum of the number of unwritten blocks and the number of written blocks is greater than the total number of storage blocks. If the sum is less than or equal to the total number of storage blocks, the error information is written to the unwritten blocks in sequence. If the sum is greater than the total number of storage blocks, all unwritten blocks are filled first, and the remaining error information is then written starting from the first storage block until all error information has been written.
[0098] For example, as shown in Figure 9, the specific steps are as follows:
[0099] (1) Divide the error record area in the flash memory into T storage blocks;
[0100] (2) When an error message is generated, calculate the number M of storage blocks required to record the error message;
[0101] (3) Read the position Y up to which the storage blocks in the current error record area have been written, compute the sum Y+M of Y and the number M of storage blocks to be occupied, and determine whether Y+M is greater than the total number of storage blocks T. If Y+M is less than or equal to T, proceed to step (4). If Y+M is greater than T, fill all the unwritten blocks, then erase and write the remaining (Y+M)-T blocks of error information starting from the first storage block, until all error information is written.
[0102] (4) Write the error information into the storage blocks in HEX format.
[0103] (5) Update the number of written storage blocks to Y = (Y + M) % T and write it to the flash or file.
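Steps (1) to (5) above can be sketched as a circular write over a list of blocks. This is a simplified model under stated assumptions: the error record area is represented as a Python list in which `None` stands for an erased block, and the function name `write_error_blocks` is hypothetical; real flash would require page-level erase operations that are not modelled here.

```python
def write_error_blocks(area: list, y: int, payload: list) -> int:
    """Write payload into the circular record area and return the updated Y.

    area    -- list of T storage blocks (None models an erased block)
    y       -- number of blocks already written, i.e. index of the next free block
    payload -- the M blocks that make up the new error information
    """
    t = len(area)
    m = len(payload)
    if m > t:
        raise ValueError("error information does not fit in the record area")
    if y + m <= t:
        # Enough unwritten blocks remain: write sequentially.
        area[y:y + m] = payload
    else:
        # Fill the remaining unwritten blocks first ...
        head = t - y
        area[y:t] = payload[:head]
        # ... then erase and reuse (Y+M)-T blocks from the start of the area.
        overflow = (y + m) - t
        for i in range(overflow):
            area[i] = None
        area[0:overflow] = payload[head:]
    return (y + m) % t
```

For example, with T = 8 and Y = 3, writing M = 7 blocks fills blocks 3 to 7, wraps the last two blocks into blocks 0 and 1, and returns the new value Y = 2.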
[0104] In some optional embodiments, the system software anomaly tracing method provided by the present invention further includes: when a server sends a remote interaction command to the device running the system software, the device reads the error information, or the error code and error information, from the error record area and, in response to the remote interaction command, sends the error information in the storage blocks to the server.
[0105] In some optional embodiments, the system software anomaly tracing method provided by the present invention further includes: performing integrity verification on the error information and/or error codes by means of a cyclic redundancy check (CRC) or hash value check.
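As an illustration of the CRC option, the following sketch appends a CRC-32 to a record and checks it on read-back. The choice of CRC-32, the 4-byte big-endian trailer, and the names `append_crc` and `verify_crc` are assumptions; the embodiment only states that a CRC or hash value is used.

```python
import struct
import zlib


def append_crc(record: bytes) -> bytes:
    """Return the record followed by its CRC-32 as a 4-byte big-endian trailer."""
    return record + struct.pack(">I", zlib.crc32(record))


def verify_crc(stored: bytes) -> bool:
    """Recompute the CRC-32 over the body and compare it with the stored trailer."""
    if len(stored) < 4:
        return False
    body, trailer = stored[:-4], stored[-4:]
    return struct.unpack(">I", trailer)[0] == zlib.crc32(body)
```

A record that fails `verify_crc` would be treated as corrupted and excluded from tracing rather than parsed.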
[0106] When analyzing and tracing the source of an anomaly using error messages, the steps are as follows:
[0107] Read the number of written blocks that was recorded when error information was last written. Starting from that position, search backward through the error record block for the first separator field to confirm the number of blocks occupied by the last record, and then read the error information from the storage blocks in which it was last written.
[0108] Specifically, the error message is in HEX encoding format. The system software exception tracing method provided in this embodiment of the invention further includes: after obtaining the error message, parsing the error message into a String type including function information using a pre-acquired parsing tool. The parsing tool can be a pre-acquired or pre-constructed data parsing tool that matches the error message recording format of this invention.
[0109] For example, as shown in Figure 9, the specific steps are as follows:
[0110] (1) Read the number of previously written blocks, Y;
[0111] (2) Starting from block Y, search backward for the first MAGIC value. If none is found by the time the first block is reached, continue searching backward from the last block until a MAGIC value is found. The block containing it is denoted block N;
[0112] (3) Determine M, the number of storage blocks occupied by the error information that begins at this MAGIC value;
[0113] (4) Read the HEX-encoded error information from block N to block N+M;
[0114] (5) Use a HEX-to-String tool to convert the error information into a string that represents the function calls.
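The backward search in steps (1) to (4) can be sketched as follows. Several details are assumptions made for illustration: blocks are modelled as `bytes` objects in a list, a record is taken to start at a block whose first bytes equal a hypothetical `MAGIC` value, the search starts at the most recently written block (index Y-1), and M is derived from the distance between block N and the write position Y rather than from a stored block-count field.

```python
MAGIC = b"\xA5\x5A\xA5\x5A"  # hypothetical separator (Magic) value


def read_last_error(area: list[bytes], y: int) -> bytes:
    """Return the raw bytes of the most recently written error record.

    area -- list of T storage blocks
    y    -- number of written blocks, i.e. index of the next block to write
    """
    t = len(area)
    n = None
    # Step (2): scan backward from the last written block, wrapping to the
    # last block of the area if the first block is reached without a match.
    for back in range(1, t + 1):
        idx = (y - back) % t
        if area[idx].startswith(MAGIC):
            n = idx
            break
    if n is None:
        raise LookupError("no Magic value found in the error record area")
    # Step (3): number of blocks occupied by the record (a full wrap means T).
    m = (y - n) % t or t
    # Step (4): read blocks N .. N+M-1, wrapping around the end of the area.
    return b"".join(area[(n + i) % t] for i in range(m))
```

The returned bytes can be displayed in HEX form with `bytes.hex()`; converting them into a readable call string in step (5) depends on the record layout used by the parsing tool.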
[0115] By encoding the error information in HEX format and parsing it with a parsing tool into a String type that includes function information, operators can directly see the function call relationships. At the same time, because the stored error information can only be interpreted with the matching parsing tool, it is protected against being parsed by outside parties and leaked.
[0116] In step S130, the location where the system software malfunctioned is determined based on the traceability information.
[0117] In this embodiment, the tracing information is obtained by reading each field in the error information at each level, which shows the level, object, operation, and exception type of the system software when the exception occurred, or the function name and exception type of the system software when the exception occurred, thereby quickly tracing the location of the software program where the exception occurred.
[0118] Therefore, when system software malfunctions, the method described in this application can directly identify the specific error and its location from the return value or error log file.
[0119] When electronic devices used in production malfunction, the method described in the above embodiments of this application allows error information to be acquired and analyzed remotely, conveniently and quickly, without on-site inspection and without returning the malfunctioning device to the manufacturer for analysis. System software malfunctions can thus be identified and located.
[0120] The system software anomaly tracing method provided in this embodiment of the invention includes: obtaining error information at each level when the system software is abnormal, wherein each level of error information includes multiple fields of different types; based on whether there is an error code in the system software, calling the corresponding error query action, reading each field in each level of error information, and obtaining tracing information when the system software is abnormal; and determining the location where the system software anomaly occurs based on the tracing information.
[0121] The system software anomaly tracing method provided in this embodiment of the invention can record error information at each logical level when software anomalies occur. When tracing anomalies, it can directly find the location and object of the fault at each logical level from the error information at each level. Remote troubleshooting can be performed without moving the device used to run the software, thus achieving accurate location of software faults.
[0122] In addition to the above method embodiments, the present invention also provides a system software anomaly tracing device, as shown in Figure 10, which includes: a processor 201 and a memory 202, wherein the memory 202 stores instructions; the processor 201 calls the instructions in the memory 202 to cause the processor 201 to execute the system software anomaly tracing method of any of the foregoing embodiments of the present invention.
[0123] The system software anomaly tracing method provided in the above embodiments of the present invention includes: obtaining error information at each level when the system software is abnormal, wherein each level of error information includes multiple fields of different types; based on whether there is an error code in the system software, calling the corresponding error query action, reading each field in each level of error information, and obtaining tracing information when the system software is abnormal; and determining the location where the system software anomaly occurs based on the tracing information.
[0124] The system software anomaly tracing device provided in this embodiment of the invention, by implementing the above method, can record error information of each logical level when software anomalies occur. When tracing anomalies, it can directly find the location and object of the fault in each logical level from the error information of each level. Remote troubleshooting can be performed without moving the device used to run the software, thus achieving accurate location of software faults.
[0125] Furthermore, the system software anomaly tracing device provided in this embodiment of the invention may also include a communication interface 203 and a bus 204, with the processor 201, memory 202 and communication interface 203 electrically connected via the bus 204.
[0126] The memory 202 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage device. Communication between this system network element and at least one other network element is achieved through at least one communication interface 203 (which can be wired or wireless), over the Internet, a wide area network, a local area network, a metropolitan area network, etc. The bus 204 can be an ISA bus, PCI bus, or EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown in Figure 10 as a single double-headed arrow, but this does not mean that there is only one bus or one type of bus.
[0127] Processor 201 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of processor 201 or by instructions in software form. The processor 201 can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this invention. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this invention can be directly manifested as execution by a hardware decoding processor, or execution by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory 202. The processor 201 reads the information in memory 202 and, in conjunction with its hardware, completes the steps of the method described in the foregoing embodiments.
[0128] This invention also provides a computer-readable storage medium, which can be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the steps of the above-described system software anomaly tracing method.
[0129] The computer-readable storage medium provided in this embodiment of the invention stores data and computer-executable instructions for the above-described system software anomaly tracing method. The above-described system software anomaly tracing method includes: obtaining error information at each level when the system software is abnormal, wherein each level of error information includes multiple fields of different types; based on whether there is an error code in the system software, calling the corresponding error query action, reading each field in each level of error information, and obtaining tracing information when the system software is abnormal; and determining the location where the system software anomaly occurs based on the tracing information.
[0130] The computer-readable storage medium provided in this embodiment of the invention implements the above-described system software anomaly tracing method, which can record error information of each logical level when software anomalies occur. When tracing anomalies, it can directly find the location and object of the fault in each logical level from the error information of each level, and perform remote troubleshooting without moving the device used to run the software, thereby achieving accurate location of software faults.
[0131] This application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
[0132] In step S110, error information at each level is obtained when the system software malfunctions. Each level of error information includes multiple fields of different types.
[0133] In step S120, based on whether there is an error code in the system software, the corresponding error query action is invoked to read each field in the error information at each level and obtain the traceability information when the system software is abnormal.
[0134] In step S130, the location where the system software malfunctioned is determined based on the traceability information.
[0135] The computer program product provided in this embodiment of the invention implements the above-mentioned system software anomaly tracing method, which can record error information of each logical level when software anomalies occur. When tracing anomalies, it can directly find the location and object of the fault in each logical level from the error information of each level, and perform remote troubleshooting without moving the device used to run the software, thereby achieving accurate location of software faults.
[0136] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0137] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0138] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for tracing system software anomalies, characterized in that, The system software comprises multiple logical levels, and the method includes: Obtain error information at each level when the system software malfunctions, wherein each level of error information includes multiple fields of different types; Based on whether an error code exists in the system software, the corresponding error query action is invoked to read each field from the error information at each level, thereby obtaining traceability information when the system software malfunctions. When the error code exists in the system software, the error code for each exception occurrence is read from the error record block, which is stored in blocks, and the error code corresponding to each field in the error information of each layer is obtained. When the error code is not present in the system software, the error record block is invoked, and each field of the error information in each storage block of the error record block is read; Based on the traceability information, the location where the system software malfunctioned is determined.
2. The system software anomaly tracing method according to claim 1, characterized in that, When the error code exists in the system software, the error information at each level includes a hierarchy field, an object field, an operation field, and an exception type field; The hierarchy field is the hierarchy at which the system software encounters an exception; the object field is the object being operated on at the current hierarchy when the system software encounters an exception; the operation field is the operation performed when the system software encounters an exception; and the exception type field is the type of exception that occurred in the system software.
3. The system software anomaly tracing method according to claim 2, characterized in that, The steps of determining whether an error code exists in the system software, invoking the corresponding error query action, reading each field in the error information at each level, and obtaining traceability information when the system software is abnormal include: When the error code exists in the system software, the pre-acquired error code encoding table is called, and each error code is compared with the error code encoding table to determine the error type corresponding to each error code. Based on the error type corresponding to each error code at each layer, traceability information for each layer is generated when a system software exception occurs.
4. The system software anomaly tracing method according to claim 3, characterized in that, The error record block includes multiple storage blocks, and the error code is stored in the corresponding storage block each time an exception occurs in the system software.
5. The system software anomaly tracing method according to claim 2, characterized in that, Prior to the step of obtaining error information at each level when a system software exception occurs, the method further includes: When an exception occurs in the system software, the error code is generated in the hierarchy field, object field, operation field, and exception type field of the current layer where the exception occurred; The error code of each field in the current layer where the exception occurred is uploaded to the outermost layer of the system software to form the error information.
6. The system software anomaly tracing method according to claim 5, characterized in that, When the system software includes multiple layers, in the process of uploading the error code corresponding to each field of the current layer where the exception occurs to the outermost layer of the system software, for each layer, the error code corresponding to the layer field, object field, operation field and exception type field of that layer is superimposed until the error codes of all layers where the exception occurs are uploaded to the outermost layer of the system software, forming multiple layers of error information.
7. The system software anomaly tracing method according to claim 2, characterized in that, The hierarchical fields include component layer fields, kernel layer fields, and driver layer fields; The object fields include thread fields, process fields, device fields, file fields, semaphore fields, and mutex fields; The operation fields include open field, close field, read field, write field, and control field; The exception type fields include parameter error field, permission failure field, connection failure field, and interface error field.
8. The system software anomaly tracing method according to claim 1, characterized in that, When the error code is not present in the system software, the error information of each level when an exception occurs is read from the error record block, which is stored in blocks. The error record block includes multiple storage blocks. At least one level of the error information when an exception occurs in the system software is stored in the corresponding storage block. The error information of each level includes a separator field, the number of storage blocks, a time field, a throw function field, an exception type field, and integrity verification information. Different storage blocks are separated by the separator field, the time field is the time when the system software encountered an exception, the error-throwing function field is the function called when the system software encountered an exception, and the exception type field is the type of exception that occurred in the system software.
9. The system software anomaly tracing method according to claim 8, characterized in that, The steps of determining whether an error code exists in the system software, invoking the corresponding error query action, reading each field in the error information at each level, and obtaining traceability information when the system software is abnormal include: When the error code is not present in the system software, traceability information for each layer is generated based on each field read from each storage block when an exception occurs in the system software.
10. The system software anomaly tracing method according to claim 8, characterized in that, Prior to the step of obtaining error information at each level when a system software exception occurs, the method further includes: When a system software malfunctions, fields corresponding to the error information at each level are generated with a preset field length. The fields corresponding to the error information of each layer when an exception occurs are stored in the storage block; When the field length of the error information stored in each storage block exceeds a preset length threshold, the error information in the corresponding storage block is sent to an external device.
11. The system software anomaly tracing method according to claim 10, characterized in that, The storage capacity of the storage block is an integer multiple of the preset field length.
12. The system software anomaly tracing method according to claim 8, characterized in that, The error recording block includes multiple storage blocks for storing the error information, and the method further includes: When the error information is obtained, the number of unwritten blocks required to write the error information into the storage block is calculated, and the number of written blocks into which the error information has been written is obtained; Based on the relationship between the sum of the number of unwritten blocks and the number of written blocks and the total number of storage blocks, the error information is written sequentially to each unwritten block, or the written blocks are erased and the remaining error information is written after the error information is written to all unwritten blocks.
13. The system software anomaly tracing method according to claim 12, characterized in that, When writing the error information into the unwritten blocks, calculate whether the sum of the number of unwritten blocks and the number of written blocks is greater than the total number of storage blocks. If it is less than or equal to the total number of storage blocks, write the error information into each unwritten block in sequence. If it is greater than the total number of storage blocks, after all the unwritten blocks are filled, start writing the remaining error information from the first storage block until all the error information is written.
14. The system software anomaly tracing method according to claim 12, characterized in that, The error message is in HEX encoding format, and the method further includes: Read the number of blocks written at the time the error message was last written from all the storage blocks; Based on the number of written blocks, look back at the first separator field in the error record block to confirm the number of written blocks, and read the error information from the storage block where the last error information was written.
15. The system software anomaly tracing method according to claim 14, characterized in that, The method further includes: After obtaining the error information, the error information is parsed into a String type that includes function information using a pre-obtained parsing tool.
16. The system software anomaly tracing method according to claim 4 or 8, characterized in that, The method further includes: In response to a remote interaction command, the error information in the storage block is sent to the server.
17. The system software anomaly tracing method according to claim 1, characterized in that, The method further includes: The integrity of the error information and/or the error code is verified by using a cyclic redundancy check or a hash value check.
18. A system software anomaly tracing device, characterized in that, The system software anomaly tracing device includes: a processor and a memory, wherein the memory stores instructions; The processor invokes the instructions in the memory to cause the system software anomaly tracing device to implement the system software anomaly tracing method as described in any one of claims 1 to 17.
19. A computer-readable storage medium storing instructions thereon, characterized in that, When the instruction is executed by the processor, it implements the system software anomaly tracing method as described in any one of claims 1 to 17.
20. A computer program product, characterized in that, It includes a computer program that, when executed by a processor, implements the system software anomaly tracing method as described in any one of claims 1 to 17.