A method, apparatus and electronic device for automatic log comparison
By using an automatic log comparison method that ignores irrelevant differences through a preset keyword list, this invention addresses the inefficiency and errors caused by manual configuration in existing technologies, achieving efficient and accurate log comparison and fault diagnosis.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date: 2023-09-05
- Publication Date: 2026-06-30
AI Technical Summary
Existing log comparison technologies rely on manual configuration, which is time-consuming, labor-intensive, and prone to omissions or errors, especially in the case of large-scale log datasets where processing efficiency is low.
An automatic comparison method is adopted, which obtains the log to be compared and the log at the previous time point, processes and compares them, uses a preset keyword list to ignore differences that do not need to be concerned, and outputs the comparison results of the differences.
It improves the efficiency and accuracy of log comparison, reduces errors and omissions in manual configuration, quickly locates faults, and improves fault handling efficiency.
Figure CN117290197B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data analysis, and in particular to a method, apparatus, and electronic device for automatic log comparison.
Background Technology
[0002] During laboratory testing, electronic devices of the same model, such as servers, are often shipped in large quantities multiple times. Therefore, there is often a need to compare log files from two different dates for the same electronic device with the same configuration.
[0003] However, most current log comparison technologies rely on manual configuration. They require manually configuring relevant rules or patterns for matching and comparison, and cannot automatically filter out unnecessary errors. This means operators need to spend a significant amount of time and effort defining appropriate rules so the system can correctly compare logs. This manual configuration method is not only time-consuming and labor-intensive but also prone to omissions or misconfigurations. Furthermore, with large-scale log datasets, this log comparison technology suffers from low processing efficiency, resulting in long processing times and high resource consumption.
Summary of the Invention
[0004] In view of the above problems, the present invention proposes a method, apparatus and electronic device for automatic log comparison.
[0005] In a first aspect, embodiments of the present invention provide a method for automatic log comparison, the method comprising:
[0006] Obtain the log to be compared and the log from the previous time point, and process the two logs;
[0007] Compare the processed log to be compared with the log from the previous time point to determine whether there are any difference points between them;
[0008] If a difference point exists, determine whether it contains any keyword from the preset keyword list;
[0009] If any keyword is included, ignore the difference point containing that keyword, and output the first comparison result and the preset keyword list;
[0010] If no keyword is included, output the difference point that contains no keyword, the second comparison result, and the preset keyword list.
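The claimed flow in [0006] to [0010] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: both logs are assumed to be lists of text lines aligned by position, a "difference point" is approximated as a pair of unequal lines, and a keyword counts as included when it appears as a substring of either line. The function name `compare_with_keywords` and the `"pass"`/`"fail"` labels are hypothetical.

```python
from itertools import zip_longest


def compare_with_keywords(new_lines, prev_lines, keywords):
    """Hypothetical sketch of the claimed compare / keyword-check / output flow.

    Assumption: a difference point is a position where the two lines differ,
    and it "contains" a keyword if either line contains it as a substring.
    """
    unexplained = []  # difference points that contain no keyword
    pairs = zip_longest(new_lines, prev_lines, fillvalue="")
    for index, (new, prev) in enumerate(pairs):
        if new == prev:
            continue  # no difference point at this position
        if any(kw in new or kw in prev for kw in keywords):
            continue  # keyword present: ignore this difference point
        unexplained.append((index, prev, new))

    if not unexplained:
        # First comparison result: consistent, plus the preset keyword list.
        return {"result": "pass", "keywords": list(keywords)}
    # Second comparison result: inconsistent, plus the unexplained differences.
    return {"result": "fail", "differences": unexplained, "keywords": list(keywords)}


previous = ["boot ok", "temp 40C", "fan error 3"]
current = ["boot ok", "temp 45C", "fan error 4"]
print(compare_with_keywords(current, previous, ["error"]))
# {'result': 'fail', 'differences': [(1, 'temp 40C', 'temp 45C')], 'keywords': ['error']}
```

The "fan error" line differs but is ignored because it contains the keyword, so only the temperature change is reported.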
[0011] Optionally, before obtaining the log to be compared and the log from the previous time point, the method further includes:
[0012] Retrieve historical logs;
[0013] The historical logs are segmented to obtain all the words contained in the historical logs.
[0014] For each word, calculate the word frequency and document frequency.
[0015] Based on the word frequency value and document frequency value of each word, a comprehensive weighted calculation is performed to obtain the weighted value corresponding to each word.
[0016] All words are sorted in descending order according to the weighted values, and multiple words are selected based on preset rules to form the preset keyword list.
[0017] Optionally, calculating the word frequency value and document frequency value for each word includes:
[0018] Determine the number of times the target word appears in a historical log, and determine the total number of words in that historical log;
[0019] The word frequency value is obtained by dividing the number of occurrences by the total number of words;
[0020] Determine the total number of historical logs, and determine the corrected count of historical logs containing the target word;
[0021] The document frequency value is obtained by dividing the total number of historical logs by the corrected count;
[0022] The corrected count is obtained by adding 1 to the number of historical logs containing the target word.
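As a sketch of the two values defined in [0018] to [0022], the Python below tokenizes each historical log on whitespace (an assumption; the description mentions tab-based segmentation as one option) and applies the two ratios literally. Note that the text describes a plain ratio of the total number of logs to the corrected count; the widely used TF-IDF formulation takes the logarithm of this ratio, which this sketch deliberately does not do. The function names are hypothetical.

```python
from collections import Counter


def segment(log_text):
    """Split one historical log into words (any whitespace, including tabs)."""
    return log_text.split()


def word_frequency(word, log_words):
    """Occurrences of `word` in one historical log / total words in that log."""
    if not log_words:
        return 0.0
    return Counter(log_words)[word] / len(log_words)


def document_frequency_value(word, all_logs_words):
    """Total number of historical logs / (logs containing `word` + 1)."""
    containing = sum(1 for words in all_logs_words if word in words)
    corrected = containing + 1  # the +1 correction also avoids division by zero
    return len(all_logs_words) / corrected


history = ["disk ok disk", "fan ok", "disk fail", "cpu ok"]
logs = [segment(text) for text in history]
print(word_frequency("disk", logs[0]))         # 2 occurrences / 3 words
print(document_frequency_value("disk", logs))  # 4 logs / (2 + 1)
```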
[0023] Optionally, performing a comprehensive weighted calculation based on the word frequency value and document frequency value of each word to obtain the weighted value corresponding to each word includes:
[0024] Multiply the word frequency value of the target word by the document frequency value of the target word to obtain the product value;
[0025] Determine the weighting value corresponding to the target word based on the target device model, where the target device model is the model of the device that generated the historical logs containing the target word;
[0026] The product value is weighted according to the weighting value to obtain the weighted value corresponding to the target word.
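The text does not state how the weighting value for a device model is obtained or how it is applied to the product value. The sketch below assumes a lookup table from device model to a numeric weighting value, applies it by multiplication, and falls back to a neutral weight of 1.0 for unknown models; the table, the model names, and the default are all hypothetical.

```python
def weighted_value(tf, df, device_model, model_weights, default_weight=1.0):
    """Weighted value of a target word (hypothetical multiplicative weighting).

    tf, df: word frequency value and document frequency value of the word.
    device_model: model of the device whose historical logs contain the word.
    model_weights: hypothetical mapping from device model to weighting value.
    """
    product = tf * df  # the product value
    weight = model_weights.get(device_model, default_weight)
    return product * weight  # assumption: weighting applied by multiplication


weights = {"server-model-A": 1.5, "server-model-B": 0.8}  # hypothetical models
print(weighted_value(0.5, 2.0, "server-model-A", weights))  # 1.5
print(weighted_value(0.5, 2.0, "unknown-model", weights))   # 1.0
```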
[0027] Optionally, sorting all words in descending order according to the weighted values and selecting multiple words based on preset rules to form the preset keyword list includes:
[0028] Sort all words in descending order according to their weighted values, and select the words whose weighted values are greater than a preset value to form the preset keyword list; or,
[0029] Sort all words in descending order according to their weighted values, and select a preset percentage of the words with the highest weighted values to form the preset keyword list.
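Both selection rules in [0028] and [0029] can be sketched with one function over a mapping from word to weighted value. Rounding the percentage up with `math.ceil` and letting the threshold rule take precedence when both arguments are given are assumptions; the function name is hypothetical.

```python
import math


def select_keywords(word_weights, threshold=None, top_fraction=None):
    """Build the preset keyword list from a word -> weighted value mapping.

    Keeps words whose weighted value exceeds `threshold`, or else the top
    `top_fraction` (0 < top_fraction <= 1) of words. If both are given,
    the threshold rule is used.
    """
    ranked = sorted(word_weights.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        return [word for word, value in ranked if value > threshold]
    if top_fraction is not None:
        count = math.ceil(len(ranked) * top_fraction)  # assumption: round up
        return [word for word, _ in ranked[:count]]
    raise ValueError("specify either threshold or top_fraction")


scores = {"error": 0.9, "timeout": 0.5, "ok": 0.1, "retry": 0.4}
print(select_keywords(scores, threshold=0.45))     # ['error', 'timeout']
print(select_keywords(scores, top_fraction=0.25))  # ['error']
```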
[0030] Optionally, obtaining the log to be compared and the log from the previous time point and processing the two logs includes:
[0031] Using the system and application interfaces, log files are periodically and automatically collected and uploaded to a preset storage location;
[0032] Based on timestamp technology, log files in the preset storage location are traversed to determine the log to be compared and the log at the previous time point.
[0033] The content of the log to be compared is converted into the corresponding first data structure;
[0034] The contents of the log from the previous time point are converted into the corresponding second data structure.
[0035] Optionally, comparing the processed log to be compared with the log from the previous time point includes:
[0036] Based on the first data structure, the log to be compared is divided into log lines or log entries using a preset comparison function;
[0037] Based on the second data structure, the preset comparison function is used to divide the log from the previous time point into log lines or log entries;
[0038] Compare each log line segmented from the log to be compared, line by line, with the log lines segmented from the log at the previous time point; or,
[0039] Compare each log entry segmented from the log to be compared, entry by entry, with the log entries segmented from the log at the previous time point;
[0040] Comparing the entries one by one includes: comparing the entire log data of the two target log entries; or,
[0041] comparing the two target log entries based on specific fields.
[0042] Optionally, if no discrepancy point exists, a result indicating that the two log lines or log entries are consistent is output directly;
[0043] If a discrepancy point exists and contains any of the keywords, a result indicating that the two log lines or log entries containing the discrepancy point are consistent is output, together with the preset keyword list;
[0044] If a discrepancy point exists and does not contain any of the keywords, a result indicating that the two log lines or log entries containing the discrepancy point are inconsistent is output, together with the discrepancy point and the preset keyword list.
[0045] Optionally, after outputting the preset keyword list, the method further includes:
[0046] Using the preset keyword list, re-verify two log lines or log entries whose comparison result is consistent, to confirm that the difference between them contains at least one of the keywords;
[0047] If re-verification confirms that the difference between two log lines or log entries whose comparison result is consistent still contains one of the keywords, no message is sent;
[0048] If re-verification shows that the difference between two log lines or log entries whose comparison result is consistent does not contain any of the keywords, an exception message is sent and displayed to prompt the operator to perform a manual review;
[0049] Using the preset keyword list, re-verify two log lines or log entries whose comparison result is inconsistent, to confirm that the difference between them does not contain any of the keywords;
[0050] If re-verification confirms that the difference between two log lines or log entries whose comparison result is inconsistent still does not contain any of the keywords, no message is sent;
[0051] If re-verification shows that the difference between two log lines or log entries whose comparison result is inconsistent contains one of the keywords, an exception message is sent and displayed to prompt the operator to perform a manual review.
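The two re-verification branches in [0046] to [0051] can be sketched as a single pass over already-judged line pairs. Assumptions: each pair carries the verdict from the first comparison (`"pass"` for consistent, `"fail"` for inconsistent), keyword containment is a substring test on either line, and "sending and displaying" an exception message is represented by returning message strings. All names are hypothetical.

```python
def review(judged_pairs, keywords):
    """Re-verify judged pairs; return exception messages for manual review.

    judged_pairs: iterable of (prev_line, new_line, verdict), verdict "pass" or "fail".
    """
    messages = []
    for prev, new, verdict in judged_pairs:
        if prev == new:
            continue  # identical lines have no difference to re-verify
        has_keyword = any(kw in prev or kw in new for kw in keywords)
        if verdict == "pass" and not has_keyword:
            # Consistent verdict, but the difference contains no keyword.
            messages.append(f"manual review: {prev!r} -> {new!r} passed without a keyword")
        elif verdict == "fail" and has_keyword:
            # Inconsistent verdict, but the difference does contain a keyword.
            messages.append(f"manual review: {prev!r} -> {new!r} failed despite a keyword")
        # Otherwise the first verdict is confirmed and nothing is sent.
    return messages


pairs = [
    ("fan error 3", "fan error 4", "pass"),     # confirmed, nothing sent
    ("temp 40C", "temp 45C", "pass"),           # flagged for manual review
    ("link error", "link down error", "fail"),  # flagged for manual review
]
for message in review(pairs, ["error"]):
    print(message)
```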
[0052] Optionally, the preset keyword list is managed based on a feature data structure.
[0053] The feature data structure includes: a module list, identifiers or names, and attribute or status information;
[0054] The module list is used to manage all modules in the log;
[0055] The identifier or name is used to uniquely identify each module in the log;
[0056] The attribute or status information is used to characterize each keyword in the preset keyword list.
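One possible shape for the feature data structure of [0052] to [0056], assuming keywords are grouped under the log module they apply to and that each keyword's attribute or status information is a simple status string. The grouping, the class names, the `"active"`/`"disabled"` statuses, and the module names are hypothetical; the text only names the three components.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordFeature:
    keyword: str
    status: str = "active"  # attribute or status information for this keyword


@dataclass
class ModuleEntry:
    identifier: str  # uniquely identifies one module in the log
    keywords: dict = field(default_factory=dict)  # keyword -> KeywordFeature


@dataclass
class FeatureStructure:
    modules: dict = field(default_factory=dict)  # module list: identifier -> ModuleEntry

    def add(self, module_id, keyword, status="active"):
        module = self.modules.setdefault(module_id, ModuleEntry(module_id))
        module.keywords[keyword] = KeywordFeature(keyword, status)

    def active_keywords(self):
        """Flatten into a preset keyword list, skipping non-active keywords."""
        return sorted({
            feature.keyword
            for module in self.modules.values()
            for feature in module.keywords.values()
            if feature.status == "active"
        })


features = FeatureStructure()
features.add("BMC", "error")
features.add("BIOS", "abnormal")
features.add("BIOS", "retry", status="disabled")
print(features.active_keywords())  # ['abnormal', 'error']
```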
[0057] In a second aspect, embodiments of the present invention also provide an apparatus for automatic log comparison, the apparatus comprising:
[0058] The log acquisition and processing module is used to acquire the log to be compared and the log from the previous time point, and to process the two logs.
[0059] The comparison and difference determination module is used to compare the processed log to be compared with the log at the previous time point to determine whether there are any differences between the two.
[0060] The keyword determination module is used to determine whether the difference point contains any keyword from a preset keyword list if the difference point exists.
[0061] The first output module is used to ignore the differences containing any keyword if any keyword is included, and output the first comparison result and the preset keyword list.
[0062] The second output module is used to output the differences that do not contain any keywords, the second comparison result, and the preset keyword list if no keywords are contained.
[0063] Optionally, the apparatus further includes:
[0064] The acquisition module is used to retrieve historical logs;
[0065] The word segmentation module is used to segment the historical logs to obtain all the words contained in the historical logs.
[0066] The calculation module is used to calculate the word frequency value and document frequency value for each word.
[0067] The weighting module is used to perform a comprehensive weighted calculation based on the word frequency value and document frequency value of each word to obtain the weighted value corresponding to each word.
[0068] The sorting and selection module is used to sort all words in descending order according to the weighted value, select multiple words based on preset rules, and form the preset keyword list.
[0069] Optionally, the calculation module includes:
[0070] The occurrence and total word count unit is used to determine the number of times a target word appears in a historical log and the total number of words in that historical log;
[0071] The word frequency calculation unit is used to divide the number of occurrences by the total number of words to obtain the word frequency value;
[0072] The total and corrected count unit is used to determine the total number of historical logs and the corrected count of historical logs containing the target word;
[0073] The document frequency value calculation unit is used to divide the total number of historical logs by the corrected count to obtain the document frequency value;
[0074] The corrected count is obtained by adding 1 to the number of historical logs containing the target word.
[0075] Optionally, the weighting module includes:
[0076] The product unit is used to multiply the word frequency value of the target word with the document frequency value of the target word to obtain the product value;
[0077] The weighting value determination unit is used to determine the weighting value corresponding to the target word based on the target device model, wherein the target device model is the device model corresponding to the device that generates historical logs containing the target word.
[0078] The weighting unit is used to weight the product value according to the weighting value to obtain the weighted value corresponding to the target word.
[0079] Optionally, the sorting and selection module is specifically used for:
[0080] All words are sorted in descending order according to their weighted values, and words with weighted values greater than a preset value are selected to form the preset keyword list; or,
[0081] All words are sorted in descending order according to their weighted values, and a preset percentage of words with the highest weighted values are selected to form the preset keyword list.
[0082] Optionally, the log acquisition and processing module is specifically used for:
[0083] Using the system and application interfaces, log files are periodically and automatically collected and uploaded to a preset storage location;
[0084] Based on timestamp technology, log files in the preset storage location are traversed to determine the log to be compared and the log at the previous time point.
[0085] The content of the log to be compared is converted into the corresponding first data structure;
[0086] The contents of the log from the previous time point are converted into the corresponding second data structure.
[0087] Optionally, the comparison and difference determination module is specifically used for:
[0088] Based on the first data structure, the log to be compared is divided into log lines or log entries using a preset comparison function;
[0089] Based on the second data structure, the preset comparison function is used to divide the log from the previous time point into log lines or log entries;
[0090] Compare each log line segmented from the log to be compared, line by line, with the log lines segmented from the log at the previous time point; or,
[0091] Compare each log entry segmented from the log to be compared, entry by entry, with the log entries segmented from the log at the previous time point;
[0092] Comparing the entries one by one includes: comparing the entire log data of the two target log entries; or,
[0093] comparing the two target log entries based on specific fields.
[0094] Optionally, the apparatus further includes:
[0095] The verification module is used to re-verify, using the preset keyword list, two log lines or log entries whose comparison result is consistent, to confirm that the difference between them contains at least one of the keywords.
[0096] The sending and display module is used to send no message if re-verification confirms that the difference between two log lines or log entries whose comparison result is consistent still contains one of the keywords, and to send and display an exception message, prompting the operator to perform a manual review, if re-verification shows that the difference does not contain any of the keywords.
[0097] The verification module is also used to re-verify, using the preset keyword list, two log lines or log entries whose comparison result is inconsistent, to confirm that the difference between them does not contain any of the keywords.
[0098] The sending and display module is also used to send no message if re-verification confirms that the difference between two log lines or log entries whose comparison result is inconsistent still does not contain any of the keywords.
[0099] The sending and display module is also used to send and display the exception message, prompting the operator to perform a manual review, if re-verification shows that the difference between two log lines or log entries whose comparison result is inconsistent contains one of the keywords.
[0100] In a third aspect, embodiments of the present invention also provide an electronic device, which uses the automatic log comparison method described in any embodiment of the first aspect to automatically compare the logs it generates.
[0101] The automatic log comparison method provided by this invention first obtains the log to be compared and the log at the previous time point, and processes the two logs; then compares the processed log to be compared with the log at the previous time point to determine whether there are any differences between them.
[0102] If a difference exists, determine whether the difference contains any keyword from the preset keyword list; if it contains any keyword, ignore the difference containing that keyword and output the first comparison result and the preset keyword list; if it does not contain any keyword, output the difference that does not contain any keyword, the second comparison result, and the preset keyword list.
[0103] The automatic log comparison method proposed in this invention creatively ignores differences when keywords are present and only outputs differences when keywords are absent. Since differences caused by keywords are not faults that operators need to focus on, operators only need to check the output differences to quickly locate the problem and perform fault diagnosis and troubleshooting, improving the efficiency of fault handling. Furthermore, it eliminates the need for manual configuration of relevant rules or patterns for matching and comparison, avoiding errors and omissions in manual comparison, significantly improving the efficiency and accuracy of log comparison, and providing a convenient method for system monitoring and fault diagnosis, demonstrating high practicality.
Attached Figure Description
[0104] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:
[0105] Figure 1 is a schematic flowchart of a method for automatic log comparison according to an embodiment of the present invention;
[0106] Figure 2 is an overall flowchart of log comparison in an embodiment of the present invention.
Detailed Implementation
[0107] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention, and are only some, not all, embodiments of the present invention, and are not intended to limit the present invention.
[0108] The inventors discovered that during current laboratory testing, electronic devices of the same model, such as servers, are often shipped in large quantities multiple times. Therefore, there is often a need to compare log files from two different dates for the same electronic device with the same configuration.
[0109] In general, during the operation of electronic equipment systems, log files record information about various events and operations. When a system malfunctions or experiences an anomaly, comparing the log files can help identify potential problem sources and facilitate troubleshooting and fault handling analysis.
[0110] Further research by the inventors revealed that most current log comparison technologies rely on manual configuration. They require manual configuration of relevant rules or patterns for matching and comparison, and cannot automatically eliminate errors that do not need attention. This means that operators need to spend a lot of time and effort to define appropriate rules so that the system can correctly compare logs.
[0111] For example, one comparison method is as follows: obtain the log information contained in the log to be detected; extract the text from the log information to obtain the keyword sequence of the log to be detected; mark each keyword in the keyword sequence of the log to be detected and the preset log keyword sequence configured by the user with a serial number; obtain the keyword corresponding to a certain serial number in the keyword sequence of the log to be detected, and record it as the first keyword; and obtain the keyword corresponding to the same serial number from the preset log keyword sequence, and record it as the second keyword.
[0112] Calculate the Euclidean distance between the first keyword and the second keyword; perform a weighted summation of the Euclidean distances corresponding to all keywords in the keyword sequence of the log to be detected, and use the weighted sum as the similarity between the keyword sequence of the log to be detected and the preset log keyword sequence; then compare this similarity with a preset similarity threshold. If the similarity is greater than the preset similarity threshold, the log to be detected is determined to be abnormal.
[0113] Alternatively: Manually configure a preset error set, and iterate through the logs according to the preset error set; for each log, compare the error items in the preset error set with the log. If the log contains an error item from the preset error set, then the log is considered an error log.
[0114] The aforementioned methods, which rely on manual configuration, are not only time-consuming and labor-intensive, but also prone to omissions or misconfigurations. Furthermore, in the case of large-scale log datasets, these log comparison techniques suffer from low processing efficiency, resulting in long processing times and high resource consumption.
[0115] To address the aforementioned problems, the inventors have creatively proposed the method, apparatus, and electronic device for automatic log comparison of the present invention. The following provides a detailed explanation and description of the method, apparatus, and electronic device for automatic log comparison proposed in this invention.
[0116] This invention proposes a method for automatic log comparison. Referring to the flowchart shown in Figure 1, the automatic log comparison method includes:
[0117] Step 101: Obtain the log to be compared and the log from the previous time point, and process the two logs.
[0118] To automatically compare logs, the first step is to obtain the log to be compared and the log from the previous time point, and then process the two logs.
[0119] In some possible embodiments, there are multiple ways to obtain and process the two logs mentioned above. A preferred method for obtaining the log to be compared and the log from the previous time point, and processing the two logs, includes the following steps:
[0120] Step S1: Utilize the system and application interfaces to periodically and automatically collect and upload log files to a preset storage location.
[0121] In this embodiment of the invention, the device's system and application may generate a large number of log files during testing. Therefore, the system and application interfaces can be used to obtain these log files. However, considering the large number of log files, processing all logs at once would place high demands on the processor's performance. To avoid this excessive performance requirement, this embodiment of the invention involves periodic automatic collection of logs, followed by uploading the collected logs to a preset storage location.
[0122] For example, logs can be transferred to storage via NAS, and then retrieved from storage sequentially for processing. This avoids the excessive processor load caused by processing the entire log at once and reduces the performance requirements on the processor. It should be noted that the preset storage location in this embodiment can be, for example, a cache system, hard disk storage, memory storage, etc.
[0123] Step S2: Based on timestamp technology, traverse the log files in the preset storage location to determine the log to be compared and the log at the previous time point.
[0124] Generally, log files sent to a preset storage location contain time information. Therefore, based on timestamp technology, it is only necessary to traverse the log files in the preset storage location to determine the logs to be compared and the logs from the previous time point. That is, it identifies the logs that have not yet been compared at the latest time point and the logs from the previous time point closest to the latest time point. These previous time point logs were used as the logs to be compared in the previous log comparison process.
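A minimal sketch of the timestamp-based traversal, assuming a hypothetical file naming scheme in which each log file name ends with `_YYYYMMDDTHHMMSS.log`, and that the names of already-compared logs are tracked in a set. In practice the names could come from listing the preset storage location, for example with `pathlib.Path.glob`. The function name and naming scheme are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical naming scheme: <anything>_YYYYMMDDTHHMMSS.log
STAMP = re.compile(r"_(\d{8}T\d{6})\.log$")


def pick_logs(file_names, compared):
    """Return (log_to_compare, previous_log), or None if nothing new to compare."""
    stamped = []
    for name in file_names:
        match = STAMP.search(name)
        if match:  # files without a timestamp in their name are skipped
            stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
            stamped.append((stamp, name))
    stamped.sort()  # oldest first
    if len(stamped) < 2:
        return None  # need a latest log and an earlier one
    latest, previous = stamped[-1][1], stamped[-2][1]
    if latest in compared:
        return None  # the latest log has already been compared
    return latest, previous


names = ["srv_20230905T080000.log", "srv_20230907T080000.log",
         "srv_20230906T080000.log", "notes.txt"]
print(pick_logs(names, compared=set()))
# ('srv_20230907T080000.log', 'srv_20230906T080000.log')
```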
[0125] Step S3: Convert the contents of the logs to be compared into the corresponding first data structure;
[0126] Step S4: Convert the contents of the log from the previous time point into the corresponding second data structure.
[0127] After identifying the logs to be compared and the log from the previous time point, to facilitate log comparison and data processing, the content of the logs to be compared can be converted into a corresponding first data structure, and the content of the logs from the previous time point can be converted into a corresponding second data structure. The data structures can be, for example, lists or dictionaries.
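The text names lists and dictionaries as example data structures. The sketch below shows one way to build each from raw log text: a list of non-empty lines, and a dictionary of parameter entries for lines written as `name = value`. That line format, the parameter names in the example, and the function names are hypothetical.

```python
def to_line_list(log_text):
    """List data structure: one element per non-empty log line."""
    return [line.rstrip() for line in log_text.splitlines() if line.strip()]


def to_entry_dict(log_text, separator="="):
    """Dictionary data structure: parameter name -> value for 'name = value' lines."""
    entries = {}
    for line in log_text.splitlines():
        name, found, value = line.partition(separator)
        if found:  # lines without the separator are not parameter entries
            entries[name.strip()] = value.strip()
    return entries


raw = "mb_temp_range = 30-45\nfan_speed = 3000\n\nboot ok\n"
print(to_line_list(raw))   # ['mb_temp_range = 30-45', 'fan_speed = 3000', 'boot ok']
print(to_entry_dict(raw))  # {'mb_temp_range': '30-45', 'fan_speed': '3000'}
```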
[0128] By acquiring and processing the logs to be compared and the logs from the previous time point in the above manner, we can not only quickly obtain the logs, identify the logs to be compared and the logs from the previous time point, and convert the logs into a data structure, but also improve the efficiency of subsequent log comparisons.
[0129] Step 102: Compare the processed log to be compared with the log from the previous time point to determine whether there are any differences between them.
[0130] After identifying the logs to be compared and the logs from the previous time point, and processing them separately, the processed logs to be compared and the logs from the previous time point can be compared to determine whether there are any differences between them.
[0131] In some possible embodiments, there are multiple methods for comparing two logs. A preferred method for comparing the processed log to be compared with the log from the previous time point includes the following steps:
[0132] Step V1: Based on the first data structure, use a preset comparison function to segment the log to be compared into log lines or log entries;
[0133] Step V2: Based on the second data structure, use a preset comparison function to divide the log from the previous time point into log lines or log entries.
[0134] In this embodiment of the invention, because the log files generated by the various components, systems, and applications in the device differ in content and format, a comparison function, namely the preset comparison function, can be defined to improve the accuracy and efficiency of log comparison. The preset comparison function divides the log to be compared and the log at the previous time point into log lines or log entries according to the rules defined in the function, instead of comparing all the data of each log at once. This not only improves the efficiency of log comparison but also increases the accuracy of the line-by-line or entry-by-entry comparison.
[0135] For example, if the comparison function defines a method to segment logs into log lines using tabs as markers, then based on the first data structure, the logs to be compared are segmented into log lines using tabs as markers, and based on the second data structure, the logs from the previous time point are segmented into log lines using tabs as markers.
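Following the tab-delimited example above, a preset comparison function might look like the sketch below: it splits both logs on a delimiter (a tab by default, as in the example) and compares the resulting log lines position by position. Pairing lines by position, padding the shorter log with empty lines, and the function names are assumptions.

```python
from itertools import zip_longest


def preset_split(log_text, delimiter="\t"):
    """Segment raw log text into log lines using the configured delimiter."""
    return [part.strip() for part in log_text.split(delimiter) if part.strip()]


def compare_line_by_line(new_text, prev_text, delimiter="\t"):
    """Return (position, previous_line, new_line) for every differing position."""
    new_lines = preset_split(new_text, delimiter)
    prev_lines = preset_split(prev_text, delimiter)
    differences = []
    pairs = zip_longest(new_lines, prev_lines, fillvalue="")
    for position, (new, prev) in enumerate(pairs):
        if new != prev:
            differences.append((position, prev, new))
    return differences


print(compare_line_by_line("boot ok\ttemp 45C\tfan ok", "boot ok\ttemp 40C\tfan ok"))
# [(1, 'temp 40C', 'temp 45C')]
```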
[0136] Step V3: Compare each log line segmented from the log to be compared with the log line segmented from the previous time point; or, compare each log entry segmented from the log to be compared with the log entry segmented from the previous time point.
[0137] Once the log lines or log entries are segmented, the log lines segmented from the log to be compared can be compared line by line with the log lines segmented from the log at the previous time point; or, the log entries segmented from the log to be compared can be compared entry by entry with the log entries segmented from the log at the previous time point.
[0138] It should be noted that, during the entry-by-entry comparison process, a preferred comparison method includes:
[0139] Compare the entire log data of two target log entries; or compare two target log entries based on a specific field. Two target log entries refer to two log entries: one is a log entry segmented from the log to be compared, representing a specific parameter, and the other is a log entry segmented from the log at a previous time point, representing the same parameter. For example, the log entry segmented from the log to be compared, representing the motherboard temperature variation range, and the log entry segmented from the log at a previous time point, representing the motherboard temperature variation range, are the two target log entries.
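Assuming each log entry has already been parsed into a dictionary of fields, the two entry-comparison options (entire entry versus specific fields) differ only in which keys are checked. The field names below, including the motherboard temperature parameter, and the function name are hypothetical.

```python
def compare_entries(new_entry, prev_entry, fields=None):
    """Compare two target log entries.

    fields=None compares the entire entry; otherwise only the named fields.
    Returns {field: (previous_value, new_value)} for every differing field.
    """
    if fields is None:
        keys = set(new_entry) | set(prev_entry)  # whole-entry comparison
    else:
        keys = set(fields)  # field-based comparison
    return {
        key: (prev_entry.get(key), new_entry.get(key))
        for key in sorted(keys)
        if prev_entry.get(key) != new_entry.get(key)
    }


prev = {"param": "mb_temp_range", "value": "30-45", "time": "08:00"}
new = {"param": "mb_temp_range", "value": "30-45", "time": "09:00"}
print(compare_entries(new, prev))                             # {'time': ('08:00', '09:00')}
print(compare_entries(new, prev, fields=["param", "value"]))  # {}
```

Restricting the comparison to `param` and `value` lets a changed timestamp pass, which illustrates why field-based comparison can suit entries that carry volatile fields.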
[0140] After the log lines or log entries are split, they are compared line by line or entry by entry to determine whether there are any differences between them.
[0141] Step 103: If there are differences, determine whether the differences contain any keyword from the preset keyword list.
[0142] During the line-by-line or entry-by-entry comparison, there are two possible outcomes: either there are differences between the two, or there are no differences between them.
[0143] Understandably, if there are no differences between the two, meaning there are no discrepancies between the two log lines or entries being compared, and the comparison is completely consistent, then the first comparison result is directly output. This first comparison result indicates that the comparison is consistent and passes. Generally, "pass" is used to indicate that the comparison passed, and "fail" is used to indicate that the comparison failed. Of course, any other method can be used to indicate whether the comparison passed or failed.
[0144] If there are differences between the two, meaning the two log lines or entries being compared are not completely consistent, a traditional log comparison method outputs an inconsistent (fail) result together with the differences. By contrast, this invention proposes that when differences exist, neither the inconsistent result nor the differences are output immediately. Instead, it first determines whether the differences contain any keyword from a preset keyword list.
[0145] The preset keyword list contains multiple keywords. A discrepancy caused by one of these keywords is not a fault that the operator needs to attend to, so the operator can disregard it. For example, if the keywords are "error" and "abnormal," then a discrepancy between two log lines or log entries that contains "error" or "abnormal" can be ignored, and the two log lines or log entries are considered consistent.
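A minimal Python sketch of the keyword check follows. It makes two illustrative assumptions that the description does not specify. First, the difference points between two log lines are taken to be the whitespace-separated tokens that `difflib.SequenceMatcher` reports as not equal. Second, "contains a keyword" is read as a case-insensitive substring match. The helper names are hypothetical.

```python
import difflib
from typing import Iterable, List


def difference_points(line_a: str, line_b: str) -> List[str]:
    """Tokens that differ between two log lines (whitespace-split)."""
    tokens_a, tokens_b = line_a.split(), line_b.split()
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    diffs: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete or insert
            diffs.extend(tokens_a[i1:i2])
            diffs.extend(tokens_b[j1:j2])
    return diffs


def contains_keyword(diffs: Iterable[str], keywords: Iterable[str]) -> bool:
    """True if any difference token contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return any(k in d.lower() for d in diffs for k in lowered)


diffs = difference_points("CPU0 status ok", "CPU0 status error")
print(diffs)                                           # ['ok', 'error']
print(contains_keyword(diffs, ["error", "abnormal"]))  # True: difference ignored
```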
[0146] In this embodiment of the invention, the preset keyword list needs to be generated before log comparison, and the generation method includes the following steps:
[0147] Step T1: Retrieve historical logs;
[0148] Step T2: Perform word segmentation on the historical logs to obtain all the words contained in the historical logs.
[0149] First, historical logs need to be acquired. The more historical logs there are, the more accurate the generated keywords will be. After a large number of historical logs have been acquired, each historical log is segmented into words to obtain all the words it contains. Existing word segmentation methods can be used, such as splitting the historical logs on tab characters.
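Word segmentation of tab-separated historical logs could look like the following Python sketch. Splitting on spaces and line breaks in addition to tabs is an assumption added here, and `segment_words` is a hypothetical name.

```python
import re
from typing import List


def segment_words(log_text: str) -> List[str]:
    """Split one historical log into words.

    Tabs are the separator named in the description; also splitting on
    spaces and line breaks is an assumption of this sketch.
    """
    return [word for word in re.split(r"[\t \r\n]+", log_text) if word]


print(segment_words("CPU\tcpu_info\tok\nmemory\tRAM_1\tok\n"))
# ['CPU', 'cpu_info', 'ok', 'memory', 'RAM_1', 'ok']
```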
[0150] Step T3: Calculate word frequency and document frequency for each word.
[0151] After all the words have been obtained, the word frequency value and document frequency value are calculated for each word. The word frequency value is calculated as follows:
[0152] Determine the number of times the target word appears in a historical log, and determine the total number of words in that historical log; divide the number of occurrences by the total number of words to obtain the word frequency value.
[0153] Select any word from all the words as the target word. For example, select CPU (Central Processing Unit) as the target word. Assume that CPU appears 600 times in a historical log and that the total number of words in the historical log is 1000. Then, divide the number of occurrences of 600 by the total number of words of 1000 to get the word frequency value of 600 ÷ 1000 = 0.6.
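The word frequency calculation and the CPU example can be reproduced with the Python sketch below. It assumes the historical log has already been segmented into a list of words, and `word_frequency` is a hypothetical name.

```python
from collections import Counter
from typing import List


def word_frequency(target: str, words: List[str]) -> float:
    """Occurrences of `target` in one historical log / total words in that log."""
    if not words:
        return 0.0  # an empty log contains no occurrences
    return Counter(words)[target] / len(words)


# 600 occurrences of "CPU" among 1000 words, as in the example.
log_words = ["CPU"] * 600 + ["temp"] * 400
print(word_frequency("CPU", log_words))  # 0.6
```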
[0154] The specific calculation method for document frequency values is as follows:
[0155] Determine the total number of historical logs and the corrected count of historical logs containing the target word, then divide the total number by the corrected count to obtain the document frequency value. The corrected count is the number of historical logs containing the target word plus 1.
[0156] Continuing with CPU as the target word, suppose the total number of historical logs is 10,000 and the corrected count for historical logs containing the target word is 9,000 (that is, 8,999 historical logs contain CPU). Dividing the total of 10,000 by the corrected count of 9,000 yields a document frequency value of 10,000 ÷ 9,000 ≈ 1.11.
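The document frequency value with the +1 correction can be sketched in Python as shown below. Each historical log is assumed to be represented as the set of its words, and the function name is hypothetical. The description divides the total directly by the corrected count without taking a logarithm, unlike the inverse document frequency commonly used in TF-IDF, so the sketch does the same.

```python
from typing import Iterable, Set


def document_frequency(target: str, logs: Iterable[Set[str]]) -> float:
    """Total number of historical logs / (logs containing `target` + 1)."""
    log_list = list(logs)
    containing = sum(1 for words in log_list if target in words)
    corrected_count = containing + 1  # the +1 correction from the description
    return len(log_list) / corrected_count


# 10,000 historical logs, 8,999 of which contain "CPU" (corrected count 9,000).
history = [{"CPU", "temp"}] * 8999 + [{"GPU"}] * 1001
print(round(document_frequency("CPU", history), 2))  # 1.11
```

The +1 also keeps the division defined for a word that appears in none of the historical logs.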
[0157] Step T4: Perform a comprehensive weighted calculation based on the word frequency value and document frequency value of each word to obtain the weighted value corresponding to each word.
[0158] After the word frequency and document frequency values for each word have been obtained, a comprehensive weighted calculation can be performed based on these values to obtain the weighted value for each word. Specifically:
[0159] The word frequency value and document frequency value of the target word are multiplied to obtain a product value. The weighting value corresponding to the target word is determined according to the target device model, i.e., the model of the device that generates the historical logs containing the target word. The product value is then weighted according to the weighting value to obtain the weighted value of the target word.
[0160] Continuing with CPU as the target word: its word frequency value is 0.6 and its document frequency value is 1.11, so their product is 0.6 × 1.11 = 0.666. The weighting value for the target word is then determined based on the target device model. Different device models have different functional focuses and may contain different components, and therefore assign different weighting values.
[0161] For example, a device dedicated to image processing relies mainly on its GPU, so the GPU's weighting value is higher than the CPU's. A storage device that does not need to perform image processing may have no GPU at all, and its weighting value for storage components (such as hard drives and flash memory) is higher than that of the CPU. A general-purpose device with a high demand for CPU computing power gives the CPU a higher weighting value than other components.
[0162] Based on these considerations, the weighting value corresponding to the target word must be determined according to the target device model. Assuming that the weighting value of CPU for a certain model is 1, the product value 0.666 is weighted by 1, and the weighted value of CPU is 0.666 × 1 = 0.666.
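The weighted value of a word can be sketched in Python as shown below. The per-model weighting tables in `MODEL_WEIGHTS` and their values are hypothetical. They were chosen only to mirror the image-processing, storage and general-purpose examples. Words without a model-specific entry are assumed to get a weighting value of 1, and "weighting" is read as multiplication, which matches the 0.666 × 1 example.

```python
from typing import Dict, Mapping

# Hypothetical weighting values per device model (illustrative only).
MODEL_WEIGHTS: Dict[str, Dict[str, float]] = {
    "image_processing": {"GPU": 2.0, "CPU": 1.0},
    "storage": {"HDD": 2.0, "flash": 2.0, "CPU": 1.0},
    "general_purpose": {"CPU": 1.5},
}


def weighted_value(tf: float, df: float, word: str,
                   weights: Mapping[str, float], default: float = 1.0) -> float:
    """(word frequency x document frequency) scaled by the model's weighting value."""
    product = tf * df
    return product * weights.get(word, default)


print(round(weighted_value(0.6, 1.11, "CPU", MODEL_WEIGHTS["image_processing"]), 3))  # 0.666
```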
[0163] Step T5: Sort all words in descending order according to their weighted values, select multiple words based on preset rules, and form a preset keyword list.
[0164] After obtaining the weighted value for each word, all words are sorted in descending order according to their weighted values, i.e., the word with the highest weighted value is placed first, and the word with the lowest weighted value is placed last. Then, multiple words are selected based on preset rules to form a preset keyword list. Specifically:
[0165] After sorting in descending order, words with a weighted value greater than a preset value can be selected to form the preset keyword list. For example, if the preset value is 0.2, all words with a weighted value greater than 0.2 form the preset keyword list.
[0166] Alternatively, after sorting in descending order, you can select the words with the highest weighted values at the top of the list to form a preset keyword list; for example, if the preset percentage is 50%, then select the top 50% of words with the highest weighted values to form the preset keyword list.
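Both selection rules can be sketched in Python as shown below. The scores in the usage lines are hypothetical. When the preset percentage does not give a whole number of words, this sketch rounds up, which is an assumption because the description does not say how to round.

```python
import math
from typing import Dict, List


def rank(weights: Dict[str, float]) -> List[str]:
    """Words sorted by weighted value, highest first."""
    return sorted(weights, key=lambda word: weights[word], reverse=True)


def select_above(weights: Dict[str, float], preset_value: float) -> List[str]:
    """Rule 1: keep words whose weighted value is greater than the preset value."""
    return [word for word in rank(weights) if weights[word] > preset_value]


def select_top_percent(weights: Dict[str, float], percent: float) -> List[str]:
    """Rule 2: keep the top `percent` % of ranked words (rounded up, an assumption)."""
    ranked = rank(weights)
    return ranked[:math.ceil(len(ranked) * percent / 100)]


scores = {"error": 0.9, "CPU": 0.666, "abnormal": 0.5, "temp": 0.1}
print(select_above(scores, 0.2))       # ['error', 'CPU', 'abnormal']
print(select_top_percent(scores, 50))  # ['error', 'CPU']
```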
[0167] The methods described above produce the preset keyword list. As historical logs accumulate, the preset keyword list can be regenerated periodically using steps T1 to T5 to keep the entire log comparison accurate.
[0168] Step 104: If any keyword is included, ignore the differences containing that keyword, and output the first comparison result and the preset keyword list.
[0169] There are two possible outcomes when determining whether a difference point contains any keyword: either the difference point contains any keyword, or the difference point does not contain any keyword.
[0170] A discrepancy that contains any keyword is caused by that keyword and is not a fault the operator needs to focus on, so it can be ignored directly. Instead of the discrepancy, the first comparison result is output, indicating that the two log lines or entries containing the discrepancy are consistent and the comparison passed. The preset keyword list is output at the same time; the reason for outputting it is explained below.
[0171] Step 105: If no keyword is included, output the differences where no keyword is included, the second comparison result, and the preset keyword list.
[0172] A discrepancy that contains no keyword represents a fault that operators need to monitor, so it must be output. It is output together with the second comparison result, indicating that the two log lines or entries containing it are inconsistent and the comparison failed, and with the preset keyword list. The reason for outputting the preset keyword list is explained below.
[0173] As explained above, since no differences containing any keywords are output, operators only need to focus on the output differences. The problems corresponding to these differences need to be solved, otherwise it will affect the normal operation of electronic equipment. Operators need to investigate these problems, quickly locate the problems, and perform fault diagnosis and troubleshooting, which improves the efficiency of fault handling.
[0174] In some possible embodiments, to further improve the accuracy of the comparison, after outputting the first or second comparison result and the preset keyword list, the comparison is reviewed again using the keywords to confirm that the differences are correctly output and the comparison result is correct. That is:
[0175] Using the preset keyword list, two log lines or entries with a consistent comparison result are re-verified to confirm that the differences between them contain a keyword. If the re-verification confirms that the differences still contain a keyword, the comparison of these two log lines or entries ends and no information is sent. If the re-verification finds that the differences do not contain any keyword, an exception message is sent and displayed to prompt the operator to conduct a manual review. This ensures that the difference output is correct and the comparison result is accurate.
[0176] Similarly, using the preset keyword list, two log lines or log entries with an inconsistent comparison result are re-verified to confirm that the differences between them do not contain any keyword. If the re-verification confirms that the differences between the two inconsistent log lines or log entries still contain no keyword, the comparison of these two log lines or log entries ends and no information is sent. If the re-verification finds that the differences between the two inconsistent log lines or log entries contain a keyword, an exception message is sent and displayed to prompt the operator to conduct a manual review. This ensures that the difference output is correct and the comparison result is accurate.
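The re-verification can be sketched in Python as a single check per compared pair. The sketch assumes each output result is recorded as `"pass"` or `"fail"` together with its difference tokens, and that a keyword match is a case-insensitive substring match. A `"pass"` with no differences at all (identical lines) is assumed to need no review. The function names and message texts are hypothetical.

```python
from typing import Iterable, List, Optional


def contains_keyword(diffs: Iterable[str], keywords: Iterable[str]) -> bool:
    """Case-insensitive substring match of any keyword in any difference token."""
    lowered = [k.lower() for k in keywords]
    return any(k in d.lower() for d in diffs for k in lowered)


def reverify(result: str, diffs: List[str], keywords: Iterable[str]) -> Optional[str]:
    """Return an exception message when the output result disagrees with a
    keyword re-check, or None when the comparison can simply end."""
    if result == "pass":
        if not diffs or contains_keyword(diffs, keywords):
            return None  # identical lines, or a difference legitimately ignored
        return "Exception: consistent result without keyword; manual review required"
    if contains_keyword(diffs, keywords):
        return "Exception: inconsistent result with keyword; manual review required"
    return None  # inconsistent result confirmed, nothing is sent
```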
[0177] The above-described method for automatic log comparison can be summarized with reference to the overall flowchart shown in Figure 2, as follows:
[0178] Read and parse the log to be compared and the log from the previous time point (i.e., obtain and process the two logs), then check whether they match (i.e., compare the processed logs to determine whether there are any differences between them). If they match (i.e., there are no differences), mark them as consistent. If they do not match (i.e., there are differences), continue to the next step, the keyword-ignore check (i.e., determine whether the differences contain any keyword from the preset keyword list).
[0179] If a keyword matches (i.e., a keyword is present, so the differences containing it are ignored), mark the pair as consistent. If no keyword matches (i.e., no keyword is present), proceed to the next step, which ends the keyword-ignore process.
[0180] Finally, output the comparison result (i.e., the first or second comparison result) and the preset keyword list. In addition, the number of inconsistent records from the preceding steps (i.e., the total number of discrepancies, whether or not they contain keywords) and the number of inconsistencies ignored in the preceding steps (i.e., the number of discrepancies containing keywords) can be displayed. The entire process then ends.
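The flow of Figure 2 can be sketched end to end in Python. This is a simplified illustration under stated assumptions, not the patented implementation. Lines are paired by position, and a missing line is treated as empty. Tokens present on only one side serve as the difference points, and a keyword match is a case-insensitive substring match. `compare_logs` and `ComparisonReport` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import zip_longest
from typing import List, Tuple


@dataclass
class ComparisonReport:
    results: List[str] = field(default_factory=list)  # "pass"/"fail" per line pair
    differences: List[Tuple[int, List[str]]] = field(default_factory=list)  # output differences
    inconsistent: int = 0  # all differing pairs, keyword or not
    ignored: int = 0       # differing pairs ignored because of a keyword
    keywords: List[str] = field(default_factory=list)  # preset keyword list, also output


def compare_logs(current: List[str], previous: List[str],
                 keywords: List[str]) -> ComparisonReport:
    report = ComparisonReport(keywords=list(keywords))
    lowered = [k.lower() for k in keywords]
    for index, (line_a, line_b) in enumerate(zip_longest(current, previous, fillvalue="")):
        tokens_a, tokens_b = line_a.split(), line_b.split()
        if tokens_a == tokens_b:
            report.results.append("pass")  # no difference: consistent
            continue
        report.inconsistent += 1
        # Tokens found on only one side serve as the difference points.
        diffs = ([t for t in tokens_a if t not in tokens_b]
                 + [t for t in tokens_b if t not in tokens_a])
        if any(k in d.lower() for d in diffs for k in lowered):
            report.ignored += 1
            report.results.append("pass")  # first comparison result
        else:
            report.results.append("fail")  # second comparison result
            report.differences.append((index, diffs))
    return report


report = compare_logs(
    ["CPU0 status ok", "temp 45", "fan 3000"],
    ["CPU0 status error", "temp 90", "fan 3000"],
    ["error", "abnormal"],
)
print(report.results)                       # ['pass', 'fail', 'pass']
print(report.differences)                   # [(1, ['45', '90'])]
print(report.inconsistent, report.ignored)  # 2 1
```

Here `inconsistent` and `ignored` correspond to the optional counts displayed at the end of the flow.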
[0181] Furthermore, in some possible embodiments, considering the convenience of managing the preset keyword list, a preferred approach is to manage the preset keyword list based on a feature data structure.
[0182] The feature data structure includes: a module list, an identifier or name, and attribute or status information; wherein, the module list is used to manage all modules in the log; the identifier or name is used to represent the unique identifier of each module in the log; and the attribute or status information is used to represent each keyword in the preset keyword list.
[0183] Refer to the following table for an example feature data structure:
[0184]

| Module list | Identifier or name | Attribute or status information |
|---|---|---|
| BASE | desc | result |
| memory | RAM_1 | bank_locator |
| storage | HD_1 | controller_info |
| CPU | cpu_info | error_info |
| system | system_info | architecture |
[0185] The module list gives five example modules: BASE (motherboard), memory, storage, CPU (central processing unit), and system. The identifiers or names are the unique identifiers of these five modules in the log file. For example, the appearance of "desc" in the log file indicates that the log line or log entry is related to BASE. These identifiers or names can of course be changed to others as needed.
[0186] The attribute or status information represents the attributes or status of these five modules. For example, `error_info` indicates that an error has occurred in the CPU module, and it is a keyword. This feature data structure allows the preset keyword list to be managed and allows keywords to be matched to specific modules in the logs for easy identification.
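The feature data structure can be sketched as a Python dataclass holding the example rows from paragraph [0184]. The class and function names are hypothetical. Matching a log line to a module by whole-word search for its identifier is an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModuleFeature:
    module: str      # entry in the module list
    identifier: str  # unique identifier or name of the module in the log
    attribute: str   # attribute or status information, i.e. a keyword


# Example rows from paragraph [0184].
FEATURES: List[ModuleFeature] = [
    ModuleFeature("BASE", "desc", "result"),
    ModuleFeature("memory", "RAM_1", "bank_locator"),
    ModuleFeature("storage", "HD_1", "controller_info"),
    ModuleFeature("CPU", "cpu_info", "error_info"),
    ModuleFeature("system", "system_info", "architecture"),
]


def keyword_list(features: List[ModuleFeature]) -> List[str]:
    """The preset keyword list held in the attribute/status column."""
    return [f.attribute for f in features]


def module_of_keyword(features: List[ModuleFeature]) -> Dict[str, str]:
    """Map each keyword back to the module it belongs to."""
    return {f.attribute: f.module for f in features}


def module_for_line(line: str, features: List[ModuleFeature]) -> Optional[str]:
    """Identify the module a log line relates to via its identifier (whole word)."""
    for f in features:
        if re.search(rf"\b{re.escape(f.identifier)}\b", line):
            return f.module
    return None


print(module_of_keyword(FEATURES)["error_info"])               # CPU
print(module_for_line("cpu_info: error_info=none", FEATURES))  # CPU
```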
[0187] Based on the above-described automatic log comparison method, the present invention also provides an apparatus for automatic log comparison, the apparatus comprising:
[0188] The log acquisition and processing module is used to acquire the log to be compared and the log from the previous time point, and to process the two logs.
[0189] The comparison and difference determination module is used to compare the processed log to be compared with the log at the previous time point to determine whether there are any differences between the two.
[0190] The keyword determination module is used to determine whether the difference point contains any keyword from a preset keyword list if the difference point exists.
[0191] The first output module is used to ignore the differences containing any keyword if any keyword is included, and output the first comparison result and the preset keyword list.
[0192] The second output module is used to output the differences that do not contain any keywords, the second comparison result, and the preset keyword list if no keywords are contained.
[0193] Optionally, the apparatus further includes:
[0194] The acquisition module is used to retrieve historical logs;
[0195] The word segmentation module is used to segment the historical logs to obtain all the words contained in the historical logs.
[0196] The calculation module is used to calculate word frequency and document frequency values based on each word.
[0197] The weighting module is used to perform a comprehensive weighted calculation based on the word frequency value and document frequency value of each word to obtain the weighted value corresponding to each word.
[0198] The sorting and selection module is used to sort all words in descending order according to the weighted value, select multiple words based on preset rules, and form the preset keyword list.
[0199] Optionally, the calculation module includes:
[0200] An occurrence and word-count unit, used to determine the number of times a target word appears in a historical log and the total number of words in that historical log;
[0201] The word frequency calculation unit is used to divide the number of occurrences by the total number of words to obtain the word frequency value;
[0202] A total number and corrected count unit, used to determine the total number of historical logs and the corrected count of historical logs containing the target word;
[0203] The document frequency value calculation unit is used to divide the total number by the corrected count to obtain the document frequency value;
[0204] The corrected count is the number of historical logs containing the target word plus 1.
[0205] Optionally, the weighting module includes:
[0206] The product unit is used to multiply the word frequency value of the target word with the document frequency value of the target word to obtain the product value;
[0207] The weighting value determination unit is used to determine the weighting value corresponding to the target word based on the target device model, wherein the target device model is the device model corresponding to the device that generates historical logs containing the target word.
[0208] A weighting unit is used to assign weights to the product value according to the weighting value to obtain the weighted value corresponding to the target word.
[0209] Optionally, the sorting and selection module is specifically used for:
All words are sorted in descending order according to their weighted values, and words with weighted values greater than a preset value are selected to form the preset keyword list; or
[0211] All words are sorted in descending order according to their weighted values, and a preset percentage of words with the highest weighted values are selected to form the preset keyword list.
[0212] Optionally, the log acquisition and processing module is specifically used for:
[0213] Using the system and application interfaces, log files are periodically and automatically collected and uploaded to a preset storage location;
[0214] Based on timestamp technology, log files in the preset storage location are traversed to determine the log to be compared and the log at the previous time point.
[0215] The content of the log to be compared is converted into the corresponding first data structure;
[0216] The contents of the log from the previous time point are converted into the corresponding second data structure.
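The timestamp-based traversal can be sketched in Python. The sketch assumes, hypothetically, that collected logs are stored in one directory and carry a `_YYYYmmdd-HHMMSS` timestamp in their file names. The newest file is taken as the log to be compared and the one before it as the log at the previous time point. Reading a file into a list of lines stands in for the unspecified first and second data structures. All names are illustrative.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical file-naming convention: <host>_YYYYmmdd-HHMMSS.log
STAMP = re.compile(r"_(\d{8}-\d{6})\.log$")


def pick_logs(storage_dir: Path) -> Optional[Tuple[Path, Path]]:
    """Return (log to be compared, log at the previous time point), or None."""
    stamped = []
    for path in storage_dir.glob("*.log"):
        match = STAMP.search(path.name)
        if match:
            stamped.append((datetime.strptime(match.group(1), "%Y%m%d-%H%M%S"), path))
    if len(stamped) < 2:
        return None  # nothing to compare against yet
    stamped.sort(key=lambda item: item[0])
    return stamped[-1][1], stamped[-2][1]


def load_lines(path: Path) -> List[str]:
    """Stand-in for converting a log's content into a data structure."""
    return path.read_text(encoding="utf-8").splitlines()
```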
[0217] Optionally, the comparison and difference determination module is specifically used for:
[0218] Based on the first data structure, the log to be compared is divided into log lines or log entries using a preset comparison function;
[0219] Based on the second data structure, the preset comparison function is used to divide the log from the previous time point into log lines or log entries;
[0220] For each log line segmented from the log to be compared, compare it line by line with the log line segmented from the previous time point; or,
[0221] Each log entry segmented from the log to be compared is compared with each log entry segmented from the log at the previous time point.
[0222] One method of comparing each entry individually includes: comparing the entire log data of the two target log entries; or,
[0223] Compare two target log entries based on specific fields.
[0224] Optionally, the apparatus further includes:
[0225] The verification module is used to re-verify two log lines or log entries with a consistent comparison result by using the preset keyword list, confirming again that the difference between them contains any of the keywords.
[0226] The sending and display module is used to send no information if, after re-verification, the difference between the two log lines or two log entries with a consistent comparison result still contains any of the keywords. It is also used to send and display exception information if, after re-verification, that difference does not contain any of the keywords, so as to prompt the operator to perform a manual review.
[0227] The verification module is also used to use the preset keyword list to verify two log lines or two log entries with inconsistent comparison results, and to confirm again that the difference between the two log lines or two log entries with inconsistent comparison results does not contain any of the keywords.
[0228] If, upon further review, it is determined that the differences between two log lines or two log entries with inconsistent comparison results still do not contain any of the aforementioned keywords, then no information will be sent.
[0229] The sending and display module is also used to send and display the abnormal information if, after re-verification, it is determined that the difference between two log lines or two log entries with inconsistent comparison results contains any of the keywords, so as to prompt the operator to conduct a manual review.
[0230] Based on the above-described automatic log comparison method, the present invention also provides an electronic device, which uses any of the above-described automatic log comparison methods to automatically compare the logs it generates.
[0231] In summary, the automatic log comparison method proposed in this invention first obtains the log to be compared and the log at the previous time point, and processes the two logs; then, it compares the processed log to be compared with the log at the previous time point to determine whether there are any differences between them.
[0232] If a difference exists, determine whether the difference contains any keyword from the preset keyword list; if it contains any keyword, ignore the difference containing that keyword and output the first comparison result and the preset keyword list; if it does not contain any keyword, output the difference that does not contain any keyword, the second comparison result, and the preset keyword list.
[0233] The automatic log comparison method proposed in this invention creatively ignores differences when keywords are present and only outputs differences when keywords are absent. Since differences caused by keywords are not faults that operators need to focus on, operators only need to check the output differences to quickly locate the problem and perform fault diagnosis and troubleshooting, improving the efficiency of fault handling. Furthermore, it eliminates the need for manual configuration of relevant rules or patterns for matching and comparison, avoiding errors and omissions in manual comparison, significantly improving the efficiency and accuracy of log comparison, and providing a convenient method for system monitoring and fault diagnosis, demonstrating high practicality.
[0234] Although preferred embodiments of the present invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of the embodiments of the present invention.
[0235] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes said element.
[0236] The embodiments of the present invention have been described above with reference to the accompanying drawings. However, the present invention is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of the present invention without departing from the spirit and scope of the claims. All of these forms are within the protection scope of the present invention.
Claims
1. A method for automatic log comparison, characterized in that the method comprises: obtaining historical logs; performing word segmentation on the historical logs to obtain all words contained in the historical logs; calculating a word frequency value and a document frequency value for each word; performing a comprehensive weighted calculation based on the word frequency value and document frequency value of each word to obtain a weighted value for each word; sorting all words in descending order according to the weighted values, and selecting multiple words based on preset rules to form a preset keyword list; obtaining the log to be compared and the log from the previous time point, and processing the two logs; comparing the processed log to be compared with the log from the previous time point to determine whether there are any difference points between them; if no difference point exists, directly outputting the result that the two log lines or log entries match; if a difference point exists and contains any keyword, outputting a first comparison result indicating that the two log lines or two log entries containing the difference point are consistent, and simultaneously outputting the preset keyword list; if a difference point exists and does not contain any of the keywords, outputting a second comparison result indicating that the two log lines or two log entries containing the difference point are inconsistent, together with the difference point and the preset keyword list.
2. The method according to claim 1, characterized in that calculating the word frequency value and document frequency value for each word includes: determining the number of times the target word appears in a historical log, and determining the total number of words in that historical log; dividing the number of occurrences by the total number of words to obtain the word frequency value; determining the total number of historical logs, and determining the corrected count of historical logs containing the target word; dividing the total number by the corrected count to obtain the document frequency value; wherein the corrected count is obtained by adding 1 to the number of historical logs containing the target word.
3. The method according to claim 1, characterized in that performing the comprehensive weighted calculation based on the word frequency value and document frequency value of each word to obtain the weighted value for each word includes: multiplying the word frequency value of the target word by the document frequency value of the target word to obtain a product value; determining, based on the target device model, the weighting value corresponding to the target word, where the target device model is the model of the device that generates the historical logs containing the target word; and weighting the product value according to the weighting value to obtain the weighted value corresponding to the target word.
4. The method according to claim 1, characterized in that sorting all words in descending order according to their weighted values and selecting multiple words based on preset rules to form the preset keyword list includes: sorting all words in descending order according to their weighted values, and selecting words with weighted values greater than a preset value to form the preset keyword list; or sorting all words in descending order according to their weighted values, and selecting a preset percentage of words with the highest weighted values to form the preset keyword list.
5. The method according to claim 1, characterized in that, Obtain the log to be compared and the log from the previous time point, and process the two logs, including: Using the system and application interfaces, log files are periodically and automatically collected and uploaded to a preset storage location; Based on timestamp technology, log files in the preset storage location are traversed to determine the log to be compared and the log at the previous time point. The content of the log to be compared is converted into the corresponding first data structure; The contents of the log from the previous time point are converted into the corresponding second data structure.
6. The method according to claim 5, characterized in that, The processed log to be compared is compared with the log from the previous time point, including: Based on the first data structure, the log to be compared is divided into log lines or log entries using a preset comparison function; Based on the second data structure, the preset comparison function is used to divide the log from the previous time point into log lines or log entries; For each log line segmented from the log to be compared, compare it line by line with the log line segmented from the previous time point; or, Each log entry segmented from the log to be compared is compared with each log entry segmented from the log at the previous time point. One method of comparing each entry individually includes: comparing the entire log data of the two target log entries; or, Compare two target log entries based on specific fields.
7. The method according to claim 6, characterized in that, after outputting the preset keyword list, the method further includes: using the preset keyword list, re-verifying two log lines or log entries with a consistent comparison result to confirm that the difference between them contains any of the keywords; if the re-verification confirms that the difference between the two log lines or two log entries with a consistent comparison result still contains any of the keywords, sending no information; if the re-verification finds that the difference between the two log lines or two log entries with a consistent comparison result does not contain any of the keywords, sending and displaying an exception message to prompt the operator to conduct a manual review; using the preset keyword list, re-verifying two log lines or log entries with an inconsistent comparison result to confirm that the difference between them does not contain any of the keywords; if the re-verification confirms that the difference between the two log lines or two log entries with an inconsistent comparison result still does not contain any of the keywords, sending no information; and if the re-verification finds that the difference between the two log lines or two log entries with an inconsistent comparison result contains any of the keywords, sending and displaying the exception message to prompt the operator to conduct a manual review.
8. The method according to claim 1, characterized in that, The preset keyword list is managed based on a feature data structure. The feature data structure includes: a module list, identifiers or names, and attribute or status information; The module list is used to manage all modules in the log; The identifier or name is used to uniquely identify each module in the log; The attribute or status information is used to characterize each keyword in the preset keyword list.
9. A device for automatic log comparison, characterized in that the device includes: an acquisition module, configured to acquire historical logs; a word segmentation module, configured to segment the historical logs to obtain all the words contained in them; a calculation module, configured to calculate a word frequency value and a document frequency value for each word; a weighting module, configured to perform a comprehensive weighted calculation on the word frequency value and document frequency value of each word to obtain a weighted value for each word; a sorting and selection module, configured to sort all words in descending order of weighted value and select multiple words according to preset rules to form a preset keyword list; a log acquisition and processing module, configured to acquire the log to be compared and the log from the previous time point and to process the two logs; a comparison and difference determination module, configured to compare the two processed logs to determine whether a difference point exists between them; a keyword determination module, configured to determine, if a difference point exists, whether the difference point contains any keyword in the preset keyword list, and to directly output a result indicating that the two log lines or two log entries match if no difference point exists; a first output module, configured to, if the difference point exists and contains any of the keywords, output a first comparison result indicating that the two log lines or two log entries containing the difference point match, and simultaneously output the preset keyword list; and a second output module, configured to, if the difference point exists and does not contain any of the keywords, output a second comparison result indicating that the two log lines or two log entries containing the difference point are inconsistent, and simultaneously output the difference point and the preset keyword list.
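Claim 9 leaves the segmentation rule, the weighting formula and the selection rule unspecified, so the Python sketch below is one possible reading of the module chain, not the patented implementation. It assumes segmentation into lowercase alphanumeric runs; a weighted value of α·(word count / total words) + (1 − α)·(share of historical logs containing the word); a top-k selection rule; processing that trims whitespace and drops blank lines; line-by-line comparison; and a difference point that "contains" a keyword when either differing line contains it. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import zip_longest


def segment(text: str) -> list[str]:
    """Word segmentation (assumption: lowercase alphanumeric runs)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_keyword_list(history: list[str], top_k: int, alpha: float = 0.5) -> list[str]:
    """Rank words by an assumed weighted sum of word frequency and document frequency."""
    docs = [segment(log) for log in history]
    tf = Counter(w for doc in docs for w in doc)       # word frequency over all history
    df = Counter(w for doc in docs for w in set(doc))  # number of logs containing the word
    total, n_docs = sum(tf.values()), len(docs)
    weight = {w: alpha * tf[w] / total + (1 - alpha) * df[w] / n_docs for w in tf}
    # Descending weight; ties broken alphabetically so the result is deterministic.
    return sorted(weight, key=lambda w: (-weight[w], w))[:top_k]


def process(log: str) -> list[str]:
    """Assumed processing step: trim whitespace and drop blank lines."""
    return [line.strip() for line in log.splitlines() if line.strip()]


def compare_logs(current: str, previous: str, keywords: list[str]) -> list[dict]:
    results = []
    for cur, prev in zip_longest(process(current), process(previous), fillvalue=""):
        if cur == prev:
            results.append({"result": "match"})  # no difference point
            continue
        cur_words, prev_words = set(segment(cur)), set(segment(prev))
        if (cur_words | prev_words) & set(keywords):
            # First comparison result: the difference involves a keyword, so report a match.
            results.append({"result": "match", "keywords": keywords})
        else:
            # Second comparison result: report the differing words and the keyword list.
            diff = sorted(cur_words ^ prev_words)
            results.append({"result": "inconsistent", "difference": diff, "keywords": keywords})
    return results
```

With historical logs `"timestamp 1001 fan ok"`, `"timestamp 1002 disk ok"` and `"timestamp 1003 psu fail"`, `build_keyword_list(history, top_k=1)` selects `["timestamp"]`, because that word occurs in every log. A changed timestamp line then yields the first comparison result, while `"psu fail"` against `"psu ok"` yields the second, with `["fail", "ok"]` reported as the difference. Weighting document frequency upward, rather than downward as TF-IDF would, favors fields that recur in every log and tend to vary harmlessly between runs.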
10. An electronic device, characterized in that the electronic device uses the automatic log comparison method according to any one of claims 1 to 8 to automatically compare the logs it generates.