A method, apparatus and device for generating a regular expression

By generating formatted information from raw logs and using AI agents for multiple rounds of optimization, the problem of relying on human experience for regular expression generation in existing technologies has been solved. This achieves efficient and accurate regular expression generation, improving the automation and quality of log analysis.

CN122196244A · Pending · Publication Date: 2026-06-12 · FENGLING CHUANGJING (BEIJING) TECH CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
FENGLING CHUANGJING (BEIJING) TECH CO LTD
Filing Date
2026-03-20
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing regular expression generation methods rely on human experience, have low automation, low efficiency and insufficient accuracy, cannot adapt to new log formats, and require a lot of manual proofreading of the generated results, making it impossible to achieve end-to-end automation and efficient and accurate log analysis.

Method used

Formatted information is generated from the raw logs, and an initial regular expression is generated from that formatted information. An AI agent then drives multiple rounds of generation and optimization that progressively improve accuracy and yield the final regular expression. The process includes structured processing, semantic recognition, and multi-round matching optimization.

Benefits of technology

It achieves end-to-end automated generation of regular expressions, reducing labor costs, improving the accuracy and timeliness of log analysis, and ensuring the quality of generated regular expressions.

✦ Generated by Eureka AI based on patent content.

Figure CN122196244A_ABST

Abstract

Embodiments of the present application provide a regular expression generation method, device and equipment, and relate to the technical field of log analysis, to improve the quality and generation efficiency of the generated regular expression. In the method, an initial regular expression corresponding to an original log is generated based on the formatted information of the original log according to a preset regular expression generation mode; the initial regular expression is taken as a first-round regular expression, and a plurality of rounds of generation optimization operations are performed to obtain a final regular expression corresponding to the original log. One round of generation optimization operation includes: matching the current round information extraction result with the original log to obtain a current round verification result of the current round regular expression; when the current round verification result is passed, the current round regular expression is taken as the final regular expression; and when the current round verification result is not passed, a next round regular expression corresponding to the original log is generated based on the current round matching failure field information, the original log and the current round regular expression.

Description

Technical Field

[0001] This application relates to the field of log analysis technology, and in particular to a method, apparatus and device for generating regular expressions.

Background Technology

[0002] With the deepening of enterprise digital transformation and the rapid development of cloud computing technology, various software systems generate massive amounts of log data during daily operation. As a key carrier for recording system operating status, user behavior, and security events, log analysis and processing have become a core aspect of intelligent operations and maintenance. By automating log parsing and monitoring, enterprises can grasp the system's health status in real time, quickly locate the root cause of failures, and effectively prevent potential risks, thereby ensuring business continuity and stability. In log parsing and monitoring scenarios, regular expressions, as a core tool for pattern matching, are widely used to extract key information (such as error codes, service IP addresses, service latency, etc.) from unstructured raw logs.

[0003] In existing technologies, regular expressions are generated in the following ways: (1) Based on human experience: Engineers analyze the original log samples line by line, identify log patterns based on their personal experience, and manually write regular expressions. This process requires multiple iterations of "writing, testing, and debugging" until the regular expression can accurately match the target log format; (2) Using a predefined pattern library: Some log collection tools provide a predefined regular expression pattern library, which includes regular expression patterns for common log elements (such as IP address, timestamp, etc.). Engineers can construct regular expressions by combining these regular expression patterns; (3) Cluster-based automated solutions: Automated solutions (such as Drain algorithm and Spell algorithm, etc.) automatically discover structural patterns from a large amount of log data through text clustering and template induction, forming a general log template.

[0004] However, the above methods have obvious limitations: (1) The method based on human experience is highly dependent on the professional skills and experience of engineers, and the degree of automation is limited, resulting in high cost and low efficiency; (2) Although the method of using a predefined pattern library can improve efficiency, it lacks adaptability to new log formats, resulting in insufficient matching accuracy of the generated regular expressions; (3) The automation scheme based on clustering generates rough results, with insufficient template accuracy, lack of precise definition of variable field boundaries, and no ability to name capture groups.

Summary of the Invention

[0005] This application provides a method, apparatus, and device for generating regular expressions to improve the quality and efficiency of generated regular expressions, thereby improving the accuracy and timeliness of log analysis.

[0006] In a first aspect, embodiments of this application provide a method for generating regular expressions, the method comprising: generating formatted information corresponding to an original log, and generating an initial regular expression corresponding to the original log based on the formatted information according to a preset regular expression generation method, wherein the formatted information includes generation feature information for each field, and the generation feature information includes a capture group expression; and taking the initial regular expression as the first-round regular expression and performing multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log. One round of generation optimization operations includes: extracting information from the original log using the current-round regular expression to obtain the current-round information extraction result, and matching that result against the original log to obtain the current-round verification result of the current-round regular expression; when the current-round verification result is a pass, taking the current-round regular expression as the final regular expression; and when the current-round verification result is a fail, collecting the current-round matching-failure field information from the original log, and generating the next-round regular expression corresponding to the original log based on the current-round matching-failure field information, the original log, and the current-round regular expression.
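The round structure of [0006] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `verify`, `refine_loop`, `regenerate` and the `max_rounds` cap are hypothetical, and verification is reduced to comparing each extracted value with an expected substring of the log, since the source does not specify how the comparison is carried out. The AI agent is replaced by a stub.

```python
import re
from typing import Callable, Dict, List, Optional


def verify(pattern: str, log_line: str, expected: Dict[str, str]) -> List[str]:
    """Return the names of fields whose extracted value does not match the log.

    `expected` (field name -> substring of the log) is an assumption of this
    sketch; the source does not say how the comparison is carried out.
    """
    match = re.search(pattern, log_line)
    extracted = match.groupdict() if match else {}
    return [name for name, value in expected.items() if extracted.get(name) != value]


def refine_loop(
    initial_pattern: str,
    log_line: str,
    expected: Dict[str, str],
    regenerate: Callable[[List[str], str, str], str],
    max_rounds: int = 5,  # hypothetical cap so the loop always terminates
) -> Optional[str]:
    """Run verify/regenerate rounds until a pattern passes or rounds run out."""
    pattern = initial_pattern  # first-round regular expression
    for _ in range(max_rounds):
        failed = verify(pattern, log_line, expected)
        if not failed:
            return pattern  # verification passed: final regular expression
        # Verification failed: build the next-round expression from the failed
        # fields, the original log, and the current-round expression.
        pattern = regenerate(failed, log_line, pattern)
    return None


# Usage with a stub standing in for the AI agent.
LOG = "2026-03-20 ERROR code=504"
EXPECTED = {"date": "2026-03-20", "level": "ERROR", "code": "504"}
FIRST = r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[a-z]+) code=(?P<code>\d+)"
FIXED = r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) code=(?P<code>\d+)"


def stub_regenerate(failed: List[str], log_line: str, current: str) -> str:
    return FIXED


final = refine_loop(FIRST, LOG, EXPECTED, stub_regenerate)
```

In this run the first-round expression fails because `[a-z]+` cannot match `ERROR`, so every field is reported as failed and the stub supplies a corrected expression that passes in the second round.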

[0007] In one optional embodiment, generating the next-round regular expression corresponding to the original log based on the current-round matching-failure field information, the original log, and the current-round regular expression includes: generating an optimization prompt based on the current-round matching-failure field information, and inputting the optimization prompt, the original log, and the current-round regular expression into the AI agent to obtain next-round formatted information corresponding to the original log; and generating the next-round regular expression corresponding to the original log based on the next-round formatted information according to the regular expression generation method.
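One possible way to turn the failed-field information of [0007] into the text given to the agent is sketched below. The prompt wording, the function name `build_optimization_prompt`, and the `name` / `expected` / `actual` keys are all hypothetical; the source only states that a prompt is generated from the failed-field information and passed to the agent together with the original log and the current-round regular expression.

```python
from typing import Dict, List


def build_optimization_prompt(
    failed_fields: List[Dict[str, str]], log_line: str, current_pattern: str
) -> str:
    """Compose the text handed to the agent for the next round (illustrative)."""
    lines = [
        "The regular expression below failed verification against the log.",
        f"Log: {log_line}",
        f"Current regular expression: {current_pattern}",
        "Fields that failed to match:",
    ]
    for item in failed_fields:
        lines.append(
            f"- {item['name']}: expected {item['expected']!r}, got {item['actual']!r}"
        )
    lines.append("Return corrected formatted information for every field.")
    return "\n".join(lines)


prompt = build_optimization_prompt(
    [{"name": "level", "expected": "ERROR", "actual": ""}],
    "2026-03-20 ERROR code=504",
    r"(?P<level>[a-z]+)",
)
```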

[0008] In one optional embodiment, generating formatted information corresponding to the original log includes: performing structuring processing on the original log to obtain a structured log; and inputting the structured log and a preset prompt into the AI agent, which performs semantic recognition and structural analysis on the structured log to obtain the formatted information.

[0009] In one optional embodiment, the AI agent has a built-in preset knowledge base. The preset knowledge base includes a first knowledge base and a second knowledge base. The first knowledge base stores the mapping relationship between historical log samples and regular expression samples, and the second knowledge base stores the capture group expression and semantic mapping rules of each standard field.
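The two knowledge bases of [0009] can be pictured as two keyed stores. The sketch below is only a data-structure illustration: the class names, the exact-match `lookup_sample` method, and the sample entries are hypothetical, and the source does not say how the agent retrieves entries (a real system might use similarity search instead of exact lookup).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StandardField:
    capture_expression: str  # capture group expression for the standard field
    semantic_label: str      # semantic mapping rule, reduced here to a label


@dataclass
class PresetKnowledgeBase:
    # First knowledge base: historical log sample -> regular expression sample.
    samples: Dict[str, str] = field(default_factory=dict)
    # Second knowledge base: standard field name -> expression and semantics.
    standard_fields: Dict[str, StandardField] = field(default_factory=dict)

    def lookup_sample(self, log_sample: str) -> Optional[str]:
        """Exact-match lookup; an assumption made to keep the sketch short."""
        return self.samples.get(log_sample)


kb = PresetKnowledgeBase(
    samples={"ERROR code=504": r"(?P<level>[A-Z]+) code=(?P<code>\d+)"},
    standard_fields={
        "ipv4": StandardField(r"\d{1,3}(?:\.\d{1,3}){3}", "service IP address"),
    },
)
```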

[0010] In one optional embodiment, performing structuring processing on the original log to obtain a structured log includes: obtaining the original log and preprocessing it to obtain a preprocessed log; and marking key fields in the preprocessed log with placeholders while preserving the context information of each field in the preprocessed log, to obtain the structured log.
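The structuring step of [0010] can be sketched as whitespace normalization followed by placeholder substitution. The placeholder syntax (`<TIMESTAMP>`, `<IP>`, `<NUM>`), the three detection rules, and the function names are assumptions chosen for illustration; the source does not define which fields count as key fields or how they are detected. The literal text around each placeholder is left in place, which is how this sketch preserves field context.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules, applied in order so that a timestamp is marked as a
# whole before the generic number rule can touch its digits.
PLACEHOLDER_RULES = [
    ("<TIMESTAMP>", re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")),
    ("<IP>", re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")),
    ("<NUM>", re.compile(r"\b\d+\b")),
]


def preprocess(raw: str) -> str:
    """Trim the line and collapse runs of whitespace."""
    return " ".join(raw.split())


def to_structured(raw: str) -> Tuple[str, List[Dict[str, str]]]:
    """Replace key fields with placeholders and record the original values."""
    text = preprocess(raw)
    fields: List[Dict[str, str]] = []
    for placeholder, rule in PLACEHOLDER_RULES:
        def mark(match: "re.Match[str]", placeholder: str = placeholder) -> str:
            fields.append({"placeholder": placeholder, "value": match.group(0)})
            return placeholder
        text = rule.sub(mark, text)
    return text, fields


structured, marked = to_structured(
    "  2026-03-20 10:15:30   ERROR 10.0.0.5 timeout after 3000 ms "
)
```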

[0011] In one optional embodiment, the generation feature information further includes: field name, field type, field order, semantic label, prefix, suffix, and field separator information.

[0012] In one optional embodiment, generating the initial regular expression corresponding to the original log based on the formatted information according to a preset regular expression generation method includes: wrapping the capture group expression of each field in a named capture group, and concatenating the named capture groups in field order to obtain a regular expression to be improved; and inserting characters into the regular expression to be improved based on the prefix, suffix, and field separator information of each field to obtain the initial regular expression.
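The assembly described in [0012] can be sketched as follows. The dictionary keys (`name`, `order`, `capture`, `prefix`, `suffix`, `separator`) and the function name `assemble` are hypothetical, and the two steps of the text (concatenating named groups, then inserting the literal characters) are folded into a single pass for brevity. Escaping the literals with `re.escape` is a choice of this sketch, not something the source states.

```python
import re
from typing import Dict, List


def assemble(fields: List[Dict]) -> str:
    """Build a regular expression from per-field generation feature information."""
    ordered = sorted(fields, key=lambda f: f["order"])
    parts = []
    for index, f in enumerate(ordered):
        # Wrap the field's capture expression in a named capture group.
        group = f"(?P<{f['name']}>{f['capture']})"
        # Literal prefix and suffix are escaped so they match themselves.
        piece = re.escape(f.get("prefix", "")) + group + re.escape(f.get("suffix", ""))
        # The separator goes between fields, not after the last one.
        if index < len(ordered) - 1:
            piece += re.escape(f.get("separator", ""))
        parts.append(piece)
    return "".join(parts)


FIELDS = [
    {"name": "code", "order": 3, "capture": r"\d+", "prefix": "code="},
    {"name": "timestamp", "order": 1, "capture": r"\d{4}-\d{2}-\d{2}",
     "prefix": "[", "suffix": "]", "separator": " "},
    {"name": "level", "order": 2, "capture": r"[A-Z]+", "separator": " "},
]

pattern = assemble(FIELDS)
match = re.fullmatch(pattern, "[2026-03-20] ERROR code=504")
```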

[0013] In one optional embodiment, after taking the initial regular expression as the first-round regular expression and performing multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log, the method further includes: displaying the final regular expression to a target object; when a confirmation instruction for the final regular expression is received from the target object, constructing a first mapping relationship between the original log and the final regular expression, and storing the first mapping relationship in the first knowledge base; and when a modification instruction for the final regular expression is received from the target object, constructing a second mapping relationship between the original log and the final regular expression as modified by the target object, and storing the second mapping relationship in the first knowledge base.
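The feedback step of [0013] amounts to choosing which expression to store against the original log. In the sketch below the first knowledge base is modeled as a plain dictionary, and the function name `record_feedback` and the instruction strings `"confirm"` and `"modify"` are hypothetical.

```python
from typing import Dict, Optional


def record_feedback(
    first_kb: Dict[str, str],
    raw_log: str,
    final_regex: str,
    instruction: str,
    modified_regex: Optional[str] = None,
) -> str:
    """Store the log -> regex mapping chosen by the reviewer and return it."""
    if instruction == "confirm":
        chosen = final_regex       # first mapping relationship
    elif instruction == "modify":
        if modified_regex is None:
            raise ValueError("a modify instruction needs the modified regex")
        chosen = modified_regex    # second mapping relationship
    else:
        raise ValueError(f"unknown instruction: {instruction}")
    first_kb[raw_log] = chosen
    return chosen


store: Dict[str, str] = {}
record_feedback(store, "ERROR code=504", r"(?P<code>\d+)", "confirm")
record_feedback(store, "WARN disk=91%", r"(?P<pct>\d+)", "modify", r"disk=(?P<pct>\d+)%")
```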

[0014] In a second aspect, embodiments of this application also provide a regular expression generation apparatus, the apparatus comprising: an initial generation module, used to generate formatted information corresponding to an original log, and to generate an initial regular expression corresponding to the original log based on the formatted information according to a preset regular expression generation method, wherein the formatted information includes generation feature information for each field, and the generation feature information includes a capture group expression; and an optimization generation module, used to take the initial regular expression as the first-round regular expression and perform multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log. One round of generation optimization operations includes: extracting information from the original log using the current-round regular expression to obtain the current-round information extraction result, and matching that result against the original log to obtain the current-round verification result of the current-round regular expression; when the current-round verification result is a pass, taking the current-round regular expression as the final regular expression; and when the current-round verification result is a fail, collecting the current-round matching-failure field information from the original log, and generating the next-round regular expression corresponding to the original log based on the current-round matching-failure field information, the original log, and the current-round regular expression.

[0015] In one optional embodiment, when generating the next-round regular expression corresponding to the original log based on the current-round matching-failure field information, the original log, and the current-round regular expression, the optimization generation module is further configured to: generate an optimization prompt based on the current-round matching-failure field information, and input the optimization prompt, the original log, and the current-round regular expression into the AI agent to obtain next-round formatted information corresponding to the original log; and generate the next-round regular expression corresponding to the original log based on the next-round formatted information according to the regular expression generation method.

[0016] In one optional embodiment, when generating the formatted information corresponding to the original log, the initial generation module is further configured to: perform structuring processing on the original log to obtain a structured log; and input the structured log and a preset prompt into the AI agent, which performs semantic recognition and structural analysis on the structured log to obtain the formatted information.

[0017] In one optional embodiment, the AI agent has a built-in preset knowledge base. The preset knowledge base includes a first knowledge base and a second knowledge base. The first knowledge base stores the mapping relationship between historical log samples and regular expression samples, and the second knowledge base stores the capture group expression and semantic mapping rules of each standard field.

[0018] In one optional embodiment, when performing structuring processing on the original log to obtain a structured log, the initial generation module is further configured to: obtain the original log and preprocess it to obtain a preprocessed log; and mark key fields in the preprocessed log with placeholders while preserving the context information of each field in the preprocessed log, to obtain the structured log.

[0019] In one optional embodiment, the generation feature information further includes: field name, field type, field order, semantic label, prefix, suffix, and field separator information.

[0020] In an optional embodiment, when generating the initial regular expression corresponding to the original log based on the formatting information according to a preset regular expression generation method, the initial generation module is further configured to: The capture group expression for each field is encapsulated into a named capture group format, and the capture group expression for each field is concatenated in the order of the fields to obtain the regular expression to be improved. Based on the prefix, suffix, and field separator information of each field, characters are inserted into the regular expression to be improved to obtain the initial regular expression.

[0021] In one optional embodiment, the apparatus further includes a feedback module, the feedback module being used to: display the final regular expression to a target object; when a confirmation instruction for the final regular expression is received from the target object, construct a first mapping relationship between the original log and the final regular expression, and store the first mapping relationship in the first knowledge base; and when a modification instruction for the final regular expression is received from the target object, construct a second mapping relationship between the original log and the final regular expression as modified by the target object, and store the second mapping relationship in the first knowledge base.

[0022] In a third aspect, embodiments of this application also provide an electronic device, including: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the method for generating regular expressions as described in the first aspect.

[0023] In a fourth aspect, embodiments of this application also provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the regular expression generation method as described in the first aspect.

[0024] In a fifth aspect, this application provides a computer program product that, when invoked by a computer, causes the computer to execute the steps of the regular expression generation method as described in the first aspect.

[0025] The beneficial effects of this application are as follows: In the regular expression generation method provided in the embodiments of this application, formatted information corresponding to the original log is first generated. Then, based on the formatted information and a preset regular expression generation method, an initial regular expression corresponding to the original log is generated. The formatted information includes generation feature information for each field, and the generation feature information includes a capture group expression. Next, the initial regular expression is taken as the first-round regular expression, and multiple rounds of generation optimization operations are performed to obtain the final regular expression corresponding to the original log. Each round of generation optimization includes: extracting information from the original log using the current-round regular expression to obtain the current-round information extraction result, and matching the current-round information extraction result against the original log to obtain the current-round verification result of the current-round regular expression. If the current-round verification result is a pass, the current-round regular expression is taken as the final regular expression. If the current-round verification result is a fail, the current-round matching-failure field information is collected from the original log, and the next-round regular expression corresponding to the original log is generated based on that information, the original log, and the current-round regular expression. This approach overcomes the heavy reliance on human expertise in traditional log rule writing. Because the formatted information is generated from the original log and the initial regular expression is generated from the formatted information, engineers do not need to write complex regular expressions by hand. This achieves end-to-end automation of log regular expression matching rule generation, significantly reducing labor costs and technical barriers, and improving the efficiency and stability of log monitoring rule generation. At the same time, the multiple rounds of generation optimization operations ensure that the accuracy of the regular expression is gradually improved, avoiding the inaccurate matching that one-time generation may produce. This guarantees the quality of the final regular expression, thereby improving the accuracy and timeliness of log analysis.

[0026] Furthermore, other features and advantages of this application will be set forth in the following description and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.

Description of the Drawings

[0027] To illustrate the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The accompanying drawings described herein provide a further understanding of this application, constitute a part of this application, and do not constitute an improper limitation of this application. In the accompanying drawings: Figure 1 is a schematic diagram of an optional system architecture applicable to the embodiments of this application.

[0028] Figure 2 is a schematic diagram of the implementation flow of a regular expression generation method provided in an embodiment of this application.

[0029] Figure 3 is a schematic diagram of another implementation flow of a regular expression generation method provided in an embodiment of this application.

[0030] Figure 4 is a schematic structural diagram of a regular expression generation apparatus provided in an embodiment of this application.

[0031] Figure 5 is a schematic structural diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0032] Embodiments of this application will now be described in more detail with reference to the accompanying drawings. While some embodiments of this application are shown in the drawings, it should be understood that this application can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this application. It should be understood that the drawings and embodiments of this application are for illustrative purposes only and are not intended to limit the scope of protection of this application.

[0033] It should be understood that the steps described in the method embodiments of this application may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of this application is not limited in this respect.

[0034] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the following description. It should be noted that the concepts of "first", "second", etc., mentioned in this application are used only to distinguish different devices, modules, or units, and are not intended to limit the order of functions performed by these devices, modules, or units or their interdependencies.

[0035] It should be noted that the terms "a" and "a plurality of" used in this application are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0036] The names of the messages or information exchanged between multiple devices in the embodiments of this application are for illustrative purposes only and are not intended to limit the scope of these messages or information.

[0037] The following explanations of some terms used in the embodiments of this application are provided to facilitate understanding by those skilled in the art.

[0038] (1) Regular expression: A regular expression is a string composed of specific characters and sequences used to describe and match a series of texts that conform to a certain syntax rule. In log analysis scenarios, it is used to match and extract key information with common patterns (such as error codes, service IP addresses, service time, etc.) from massive, unstructured raw log data. It is the core tool for realizing log structuring and information filtering.

[0039] (2) AI Agent: This refers to an intelligent agent based on a large language model. Through a built-in knowledge base, task workflow, and prompt templates, it provides targeted guidance for the reasoning and generation capabilities of the large language model. The AI Agent has semantic understanding, multi-step reasoning, and automated execution functions. It can perform semantic recognition and structured analysis on input log samples, generate a standardized JSON-formatted schema, and provide support for the system to automatically assemble log regular expressions.
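To make the JSON-formatted schema of [0039] concrete, the sketch below parses and sanity-checks one possible agent output. The layout (a top-level `fields` list with `name`, `type`, `order`, and `capture` keys) and the function name `parse_formatted_info` are assumptions; the source does not publish the actual schema.

```python
import json
import re
from typing import Dict, List

REQUIRED_KEYS = {"name", "type", "order", "capture"}  # hypothetical key set


def parse_formatted_info(agent_output: str) -> List[Dict]:
    """Parse the agent's JSON and check that each field is usable."""
    fields = json.loads(agent_output)["fields"]
    for item in fields:
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"field is missing keys: {sorted(missing)}")
        re.compile(item["capture"])  # raises re.error on an invalid expression
    return sorted(fields, key=lambda item: item["order"])


# Raw string: each "\\" below is two characters, which JSON decodes to one "\".
SAMPLE_OUTPUT = r'''
{"fields": [
  {"name": "code",  "type": "int",    "order": 2, "capture": "\\d+"},
  {"name": "level", "type": "string", "order": 1, "capture": "[A-Z]+"}
]}
'''

parsed = parse_formatted_info(SAMPLE_OUTPUT)
```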

[0040] (3) Capture Group: This is a syntax structure in regular expressions, formed by enclosing a subpattern in parentheses (). In log processing, the core function of capture groups is to precisely extract the specific values of variable fields (such as username, transaction ID, and response time) from matched log lines. These extracted values can be referenced and stored individually, providing a structured data foundation for subsequent log fieldization, indicator calculation, and statistical analysis.
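As a brief illustration of the definition in [0040], the following Python snippet uses named capture groups to extract a username, a transaction ID, and a response time. The log line and the group names are invented for the example.

```python
import re

line = "user=alice txn=TX-1001 response_time=42ms"

# Each (?P<name>...) is a named capture group around one variable field.
pattern = re.compile(
    r"user=(?P<username>\w+) "
    r"txn=(?P<transaction_id>[A-Z]+-\d+) "
    r"response_time=(?P<response_ms>\d+)ms"
)

match = pattern.search(line)
fields = match.groupdict() if match else {}
```

Each extracted value can then be read by name, for example `fields["response_ms"]`, and converted or aggregated downstream.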

[0041] Based on the above explanations of terms and related terminology, the design concept of the embodiments of this application will be briefly introduced below: With the deepening of enterprise digital transformation and the rapid development of cloud computing technology, various software systems generate massive amounts of log data during daily operation. As a key carrier for recording system operating status, user behavior, and security events, log analysis and processing have become a core aspect of intelligent operations and maintenance. By automating log parsing and monitoring, enterprises can grasp the system's health status in real time, quickly locate the root cause of failures, and effectively prevent potential risks, thereby ensuring business continuity and stability. In log parsing and monitoring scenarios, regular expressions, as a core tool for pattern matching, are widely used to extract key information (such as error codes, service IP addresses, service latency, etc.) from unstructured raw logs.

[0042] In the existing technology, the generation of regular expressions includes the following methods: (1) Based on human experience: Engineers analyze the original log samples line by line, identify log patterns based on personal experience, and manually write regular expressions. This process requires multiple iterations of "writing, testing, and debugging" until the regular expression can accurately match the target log format; (2) Use a predefined pattern library: Some log collection tools provide a predefined regular expression pattern library, which includes regular expression patterns for common log elements (such as IP address, timestamp, etc.). Engineers do not need to write complex regular expressions from scratch, but can construct regular expressions by referencing and combining these pattern blocks; (3) Cluster-based automation schemes: Automation schemes (such as Drain algorithm and Spell algorithm, etc.) automatically discover structural patterns from a large amount of log data through text clustering and template induction, forming a general log template. Its basic principle is as follows: By analyzing the similarity between log texts, logs with similar formats are divided into the same category, and the common parts with higher frequency of occurrence are used as "fixed fields", and wildcards are used to represent the variable parts, thus forming a general log template. Although a certain degree of automation is achieved, the generated results are usually rough and cannot be directly used in the production environment.
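The template-induction principle described in item (3) can be illustrated with a deliberately simplified sketch. It is not the Drain or Spell algorithm: it assumes the logs have already been grouped and tokenize to the same length, keeps a token where all lines agree, and writes the wildcard `<*>` elsewhere. The function name and wildcard symbol are choices made here.

```python
from typing import List


def induce_template(lines: List[str]) -> str:
    """Keep tokens shared by every line; replace varying positions with <*>."""
    rows = [line.split() for line in lines]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("this sketch only handles lines with equal token counts")
    return " ".join(
        column[0] if len(set(column)) == 1 else "<*>" for column in zip(*rows)
    )


template = induce_template([
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.2 closed after 120 ms",
])
```

The result shows the limitation noted above: the wildcard marks where a variable sits, but carries no field name, type, or precise boundary.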

[0043] The above methods have obvious limitations: (1) The method based on human experience is highly dependent on the professional skills and experience of engineers, with limited automation, resulting in high cost and low efficiency; (2) Although the method using a predefined pattern library can improve efficiency, it lacks adaptability to new log formats, resulting in insufficient matching accuracy of the generated regular expressions, and it cannot achieve one-click generation from log samples to directly usable regular expressions and capture groups, with insufficient automation; (3) The automation scheme based on clustering generates coarse results, with insufficient template accuracy, lack of precise definition of variable field boundaries, and no ability to name capture groups. In actual application, the generated results still need to be tediously proofread, converted into standard regular expressions, and have meaningful field names manually added, so the efficiency improvement brought by automation is limited. None of the above methods can achieve one-click generation from log samples to directly usable regular expressions and capture groups, and the generated results often require a lot of manual post-processing, which is difficult to meet the dual requirements of log parsing efficiency and quality in modern operation and maintenance scenarios.

[0044] In view of this, this application provides a method for generating regular expressions, which may specifically include: generating formatted information corresponding to the original log, and generating an initial regular expression corresponding to the original log based on the formatted information according to a preset regular expression generation method, wherein the formatted information includes generation feature information for each field, and the generation feature information includes a capture group expression; then taking the initial regular expression as the first-round regular expression and performing multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log, wherein each round of generation optimization operations includes: extracting information from the original log using the current-round regular expression to obtain the current-round information extraction result, and matching the current-round information extraction result against the original log to obtain the current-round verification result of the current-round regular expression; when the current-round verification result is a pass, taking the current-round regular expression as the final regular expression; and when the current-round verification result is a fail, collecting the current-round matching-failure field information from the original log, and generating the next-round regular expression corresponding to the original log based on the current-round matching-failure field information, the original log, and the current-round regular expression. This approach overcomes the heavy reliance on human expertise in traditional log rule writing. Because the formatted information is generated from the original log and the initial regular expression is generated from the formatted information, engineers do not need to write complex regular expressions by hand. This achieves end-to-end automation of log regular expression matching rule generation, significantly reducing labor costs and technical barriers, and improving the efficiency and stability of log monitoring rule generation. At the same time, the multiple rounds of generation optimization operations ensure that the accuracy of the regular expression is gradually improved, avoiding the inaccurate matching that one-time generation may produce. This guarantees the quality of the final regular expression, thereby improving the accuracy and timeliness of log analysis.

[0045] In particular, the preferred embodiments of this application will be described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are for illustration and explanation only and are not intended to limit this application. Furthermore, the embodiments of this application and the features in the embodiments can be combined with each other without conflict.

[0046] See Figure 1, which illustrates an optional system architecture applicable to an embodiment of this application. This system architecture may include: terminal devices (101a, 101b) and server 102. The terminal devices (101a, 101b) and server 102 can interact via a communication network. The communication network may employ wireless or wired communication. For example, the terminal devices (101a, 101b) can access the network and communicate with server 102 via cellular mobile communication technology, which may include, for example, 5G (5th generation mobile networks) or next-generation mobile communication technology. Optionally, the terminal devices (101a, 101b) can access the network and communicate with server 102 via short-range wireless communication, which may include, for example, wireless fidelity (Wi-Fi) technology.

[0047] This embodiment of the application does not limit the number of communication devices involved in the above system architecture. For example, the system architecture may include more or fewer terminal devices, or may also include other network devices. Figure 1 shows only the terminal devices (101a, 101b) and server 102 as an example. The following is a brief introduction to these communication devices and their respective functions.

[0048] A terminal device (101a, 101b) is a device that can provide voice and / or data connectivity to a user, and may be a device that supports wired and / or wireless connections.

[0049] For example, terminal devices (101a, 101b) may include, but are not limited to: mobile phones, tablets, laptops, handheld computers, mobile internet devices (MID), wearable devices, virtual reality (VR) devices, augmented reality (AR) devices, wireless terminal devices in industrial control, wireless terminal devices in autonomous driving, wireless terminal devices in smart grids, wireless terminal devices in transportation safety, wireless terminal devices in smart cities, or wireless terminal devices in smart homes, etc.

[0050] In addition, the terminal devices (101a, 101b) may have related clients installed. The client may be software, such as an application (APP), browser, short video software, etc., or a webpage, mini-program, etc. It should be noted that the terminal devices (101a, 101b) in this application embodiment may be clients related to the generation of regular expressions.

[0051] Server 102 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0052] Optionally, an AI agent can be deployed on server 102. When generating the initial regular expression, server 102 obtains the structured log corresponding to the original log, and then inputs the structured log and preset prompts into the AI agent to obtain the formatted information of the original log. During the multi-round generation optimization operations, after obtaining the current round matching failure field information, server 102 generates an optimization prompt based on this information, and then inputs the optimization prompt, the original log, and the current round regular expression into the AI agent to obtain the next round formatted information corresponding to the original log. The AI agent can be an intelligent agent built on any of various large language models (e.g., GPT-4, CodeLlama), which this application does not limit.

[0053] The following describes the method for generating regular expressions provided by exemplary embodiments of this application in conjunction with the above-described system architecture and with reference to the accompanying drawings. It should be noted that the above-described system architecture is only shown to facilitate understanding of the spirit and principles of this application, and the embodiments of this application are not limited in any way in this respect.

[0054] See Figure 2, which illustrates the implementation flow of a regular expression generation method provided in this embodiment of the application. Taking a server as the executing entity, the specific implementation flow of this method is as follows: S20: Generate formatted information corresponding to the original log, and generate the initial regular expression corresponding to the original log based on the formatted information according to the preset regular expression generation method.

[0055] The formatted information includes generated feature information for each field (also referred to above as generation element information), which includes: capture group expression, field name, field type, field order, semantic label, prefix, suffix, and field delimiter information. The generated feature information is the information used to generate the regular expression.

[0056] In this embodiment of the application, semantic recognition and structuring processing are performed on the original log to obtain the formatted information corresponding to the original log.

[0057] Optionally, in this application embodiment, a possible implementation is provided for generating formatted information corresponding to the original log, specifically by performing the following operations: S200: Perform structured processing on the raw logs to obtain structured logs.

[0058] Optionally, in this embodiment of the application, a possible implementation is provided for structuring the original logs to obtain structured logs, specifically by performing the following operations: S2000: Obtain the raw log and preprocess it to obtain the preprocessed log.

[0059] In this embodiment of the application, when obtaining the original log, the log path and target service instance information input by the user are first obtained, and the validity of the log path and the health status of the target service instance are verified. When the log path is valid and the target service instance is healthy, the log file is collected from the target service instance. Then, it is determined whether the log file does not exist or is empty. If the log file does not exist or is empty, the user is prompted to manually input the original log. If the log file exists and is not empty, a preset number of lines (e.g., 20 lines) of log are sampled from the end of the file by default as the original log. At this time, if the total number of lines in the log file is less than the preset number of lines, the entire log file is fully collected to obtain the original log.
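The sampling rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the function name `sample_raw_log` is hypothetical, the health check of the target service instance is omitted, and an empty return value is used to signal that the user should be prompted for manual input.

```python
from collections import deque
from pathlib import Path


def sample_raw_log(log_path: str, max_lines: int = 20) -> list[str]:
    """Return up to max_lines lines taken from the end of the log file.

    An empty list means the file is missing or empty, in which case the
    caller is expected to prompt the user to enter the original log manually.
    """
    path = Path(log_path)
    if not path.is_file() or path.stat().st_size == 0:
        return []
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the last max_lines lines; a file with
        # fewer lines is therefore collected in full.
        tail = deque(handle, maxlen=max_lines)
    return [line.rstrip("\n") for line in tail]
```

Reading through a bounded `deque` keeps memory use proportional to the sample size rather than to the size of the log file.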

[0060] The process of validating a log path involves performing permission and format verification. If both verifications pass, the log path is considered valid. Specifically, permission verification checks file permission bits or attempts to open the file handle. If permissions are insufficient, the verification fails, prompting the user to adjust permissions. Format verification checks if the log path's syntax conforms to operating system specifications. If it does not, the format verification fails, prompting the user to re-enter a new log path.
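As an illustration only, the two verifications could be sketched in Python as below. The concrete format rules (non-empty, no NUL byte, absolute path) are assumptions, since this application only requires that the path syntax conform to operating system specifications; the function name `validate_log_path` and the returned messages are likewise hypothetical.

```python
import os


def validate_log_path(log_path: str) -> tuple[bool, str]:
    """Return (is_valid, reason); reason is an empty string for a valid path."""
    # Format verification (assumed rules): non-empty, no NUL byte, absolute.
    if not log_path or "\x00" in log_path or not os.path.isabs(log_path):
        return False, "format: please re-enter the log path"
    # Permission verification: the file must be readable by this process.
    # os.access also returns False for a missing file, which this sketch
    # does not distinguish from insufficient permissions.
    if not os.access(log_path, os.R_OK):
        return False, "permission: please adjust the file permissions"
    return True, ""
```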

[0061] In this embodiment of the application, the preprocessing operations include: log segmentation, redundant space removal, and preliminary extraction of key fields (such as timestamp, log level, IP address, number, etc.).

[0062] S2001: Placeholders are used to mark key fields in the preprocessed log, and the context information of each field in the preprocessed log is retained to obtain a structured log.

[0063] In this embodiment of the application, placeholders are used to mark key fields in the preprocessed log: for example, timestamps are replaced with <time>, numbers with <num>, and IP addresses with <ip>. At the same time, the context information of each field contained in the preprocessed log (such as user=, cost=, etc.) is preserved.

[0064] For example, assuming the preprocessed log is: 2025-10-29 10:00:00 INFO user=001 ip=192.168.1.1 action=login cost=10ms status=success, then the structured log would be: <time> INFO user=<num> ip=<ip> action=login cost=<num>ms status=success.
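The placeholder marking of S2001 can be sketched with ordinary regular expression substitution. The sketch below is illustrative only: the three patterns, their order, and the name `structure_log_line` are assumptions chosen to reproduce the example above, and a real deployment would likely need more timestamp and address formats.

```python
import re

# Order matters: timestamps and IP addresses are replaced before bare
# numbers, otherwise their digits would be turned into <num> first.
PLACEHOLDER_RULES = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "<time>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\d+"), "<num>"),
]


def structure_log_line(line: str) -> str:
    """Collapse redundant whitespace, then mark key fields with placeholders."""
    cleaned = re.sub(r"\s+", " ", line).strip()
    for pattern, placeholder in PLACEHOLDER_RULES:
        cleaned = pattern.sub(placeholder, cleaned)
    return cleaned


raw = ("2025-10-29 10:00:00 INFO user=001 ip=192.168.1.1 "
       "action=login cost=10ms status=success")
print(structure_log_line(raw))
```

Context such as user= and cost= is left untouched because only the values match the patterns.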

[0065] In this way, (1) noise is reduced: the original log may contain redundant information (such as irregular spaces and irrelevant text), which will increase the difficulty of semantic recognition. The preprocessing step cleans up this noise and makes the sample more regular. (2) semantic focus is enhanced: the structured log highlights the pattern structure of the log through placeholders, which helps the AI-Agent focus on field type and relationship recognition, rather than specific values. (3) efficiency is improved: the structured sample simplifies the workflow of the AI-Agent and accelerates the generation of formatted information.

[0066] S201: Input the structured log and preset prompts into the AI agent, which performs semantic recognition and structural analysis on the structured log to obtain the formatted information.

[0067] In this embodiment, the formatted information is a standardized schema in JSON format. In the formatted information, each field is defined as an independent object containing the generated feature information for that field. The generated feature information includes: field name (field_name), field type (field_type), capture group expression (capture_pattern), field order (field_order), semantic label (semantic_label), prefix (prefix), suffix (suffix), and field delimiter information (delimiter).

[0068] The following is an example of the generated feature information for one field in the formatted information: {"field_name": "timestamp", "field_type": "datetime", "capture_pattern": "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", "field_order": 1, "semantic_label": "timestamp", "prefix": "", "suffix": " ", "delimiter": " "}.
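A consumer of this schema might load and check it as sketched below. This is an assumption-laden illustration: the class name `FieldSpec`, the function `parse_formatted_info`, and the choice of a top-level JSON array of field objects are not specified by this application.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Generated feature information for one field (keys as in the schema)."""
    field_name: str
    field_type: str
    capture_pattern: str
    field_order: int
    semantic_label: str
    prefix: str = ""
    suffix: str = ""
    delimiter: str = ""


def parse_formatted_info(payload: str) -> list[FieldSpec]:
    """Parse a JSON array of field objects and return them in field order."""
    specs = [FieldSpec(**item) for item in json.loads(payload)]
    for spec in specs:
        # Fail early (re.error) if the agent returned an invalid expression.
        re.compile(spec.capture_pattern)
    return sorted(specs, key=lambda spec: spec.field_order)
```

Note that the doubled backslashes in the JSON text decode to single backslashes, so the stored capture_pattern is an ordinary regular expression such as \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.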

[0069] In this embodiment, the AI agent has a built-in preset knowledge base, which includes a first knowledge base and a second knowledge base. The first knowledge base stores the mapping relationship between historical log samples and regular expression samples, providing a reference for the AI agent when parsing logs to generate formatted information. The second knowledge base stores the capture group expressions and semantic mapping rules for each standard field, providing a standardized basis for field-level expression generation. The standard fields include timestamps, log levels, IP addresses, module names, response times, etc. The AI agent also has a built-in formatted information generation workflow, which defines the end-to-end logic of the AI agent from input to output when parsing logs, including steps such as prompt understanding, field recognition, semantic aggregation, capture group selection, and formatting.

[0070] In this embodiment, the structured log and prompts are input into the AI agent by calling an API. Based on its built-in preset knowledge bases and workflow, the AI agent performs semantic recognition and field mapping on the input structured log, comprehensively judges the logical hierarchy of the log structure and the relationships between fields, and automatically generates the formatted information.

[0071] In this way, the raw logs are first structured: the non-standardized raw logs are converted into a uniform format and noise is cleaned up, making them easier for the AI agent to recognize. This provides the AI agent with "clean" input, reduces the complexity of subsequent processing, and improves the accuracy and efficiency of formatted information generation. At the same time, by leveraging the semantic recognition capabilities of the AI agent, formatted information can be generated accurately and efficiently.

[0072] Optionally, in this embodiment of the application, a possible implementation is provided for generating an initial regular expression corresponding to the original log based on formatting information according to a preset regular expression generation method, specifically by performing the following operations: S202: Encapsulate the capture group expression of each field into a named capture group format, and concatenate the capture group expressions of each field in the order of the fields to obtain the regular expression to be improved.

[0073] In this embodiment, the capture group expressions of the fields are concatenated sequentially according to the field order in the formatted information, and the capture group expression of each field is encapsulated in a named capture group format. The named capture group format is (?P<name>pattern).

[0074] For example, the encapsulated capture group expression for a timestamp field is: (?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).

[0075] S203: Based on the prefix, suffix, and field separator information of each field, insert characters into the regular expression to be improved to obtain the initial regular expression.

[0076] In this embodiment, the following operations are performed for each field: if the generated feature information of the field contains a prefix or suffix, the corresponding prefix or suffix is inserted before or after the field in the regular expression to be improved; then, if the generated feature information of the field contains field separator information, the corresponding field separator (for example, a wildcard such as .*? or an explicit separator such as |, -, or =) is inserted after the field in the regular expression to be improved. Finally, anchor symbols ("^" and "$") are added to the beginning and end of each line of the regular expression to be improved, to limit the matching range and prevent mismatches across multiple lines.

[0077] In this way, the precise conversion from formatted information to regular expressions is achieved. Concatenating fields in order ensures that the regular expressions are consistent with the actual log structure, and the insertion of prefixes, suffixes, delimiters, and anchor symbols ensures the integrity of the regular expressions.
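Steps S202 and S203 can be sketched together in Python. The sketch makes several assumptions that this application does not fix: prefixes, suffixes, and delimiters are treated as literal text and escaped with `re.escape`, no delimiter is appended after the last field, and the function name `build_initial_regex` and the sample field list are hypothetical.

```python
import re


def build_initial_regex(fields: list[dict]) -> str:
    """Concatenate named capture groups in field order and anchor the result."""
    ordered = sorted(fields, key=lambda item: item["field_order"])
    parts = []
    for index, item in enumerate(ordered):
        # S202: wrap the capture group expression as (?P<name>pattern).
        group = f"(?P<{item['field_name']}>{item['capture_pattern']})"
        # S203: literal prefix before the group, literal suffix after it.
        piece = (re.escape(item.get("prefix", "")) + group
                 + re.escape(item.get("suffix", "")))
        # Assumption: the delimiter separates a field from the next one.
        if index < len(ordered) - 1:
            piece += re.escape(item.get("delimiter", ""))
        parts.append(piece)
    return "^" + "".join(parts) + "$"


fields = [
    {"field_name": "timestamp", "field_order": 1, "delimiter": " ",
     "capture_pattern": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"},
    {"field_name": "level", "field_order": 2, "delimiter": " ",
     "capture_pattern": r"[A-Z]+"},
    {"field_name": "user", "field_order": 3, "delimiter": " ",
     "capture_pattern": r"\d+", "prefix": "user="},
    {"field_name": "cost", "field_order": 4,
     "capture_pattern": r"\d+", "prefix": "cost=", "suffix": "ms"},
]
pattern = build_initial_regex(fields)
match = re.match(pattern, "2025-10-29 10:00:00 INFO user=001 cost=10ms")
print(match.groupdict() if match else None)
```

Wildcard separators such as .*? would need to be inserted unescaped, so a fuller version would have to distinguish literal delimiters from pattern delimiters.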

[0078] S21: Use the initial regular expression as the first-round regular expression and perform multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log. One round of generation optimization operation includes S210 to S212, as follows.

[0079] The loop stops when the current round regular expression passes the current round verification.

[0080] S210: Use the current round regular expression to extract information from the original log, obtain the current round information extraction result, and match the current round information extraction result with the original log to obtain the current round verification result of the current round regular expression.

[0081] In this embodiment of the application, the current round regular expression is used to extract information from the original log to obtain the current round information extraction result. Then, based on the current round information extraction result and the original log, the current round verification result of the current round regular expression is determined.

[0082] Optionally, in this embodiment of the application, when determining the current round verification result of the current round regular expression based on the current round information extraction result and the original log, the current round row matching rate is determined based on the row information contained in the current round information extraction result and the row information contained in the original log. When the current round row matching rate is greater than the row matching rate threshold, the current round field matching rate is determined based on the field information contained in the current round information extraction result and the field information contained in the original log. When the current round field matching rate is greater than the field matching rate threshold, the current round regular expression verification is determined to be successful.

[0083] The row matching rate threshold can be 75%, and the field matching rate threshold can be 90%, but this embodiment does not impose any restrictions on these. When determining the row matching rate based on the row information in the current round of information extraction results and the row information in the original log, the row information in the current round of information extraction results and the row information in the original log are compared to obtain the number of matched rows. The ratio of the number of matched rows to the total number of rows in the original log is then calculated to obtain the current round row matching rate. Similarly, when determining the field matching rate based on the field information in the current round of information extraction results and the field information in the original log, the consistency of the field information in the current round of information extraction results and the field information in the original log is compared to obtain the number of matched fields. The ratio of the number of matched fields to the total number of fields in the original log is then calculated to obtain the current round field matching rate.
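The two-stage check can be illustrated with the Python sketch below. How a field is judged to be "matched" is not spelled out here, so the sketch assumes that a field counts as matched when its named group captured a non-empty value in a matching row, and that the total number of fields is the number of rows multiplied by the number of named groups. The function names are hypothetical; the default thresholds are the 75% and 90% examples given above.

```python
import re


def match_rates(pattern: str, lines: list[str]) -> tuple[float, float]:
    """Return (row matching rate, field matching rate) for the given lines."""
    compiled = re.compile(pattern)
    group_names = list(compiled.groupindex)
    matched_rows = 0
    matched_fields = 0
    for line in lines:
        match = compiled.match(line)
        if match is None:
            continue
        matched_rows += 1
        # Assumption: a non-empty capture counts as a matched field.
        matched_fields += sum(1 for name in group_names if match.group(name))
    total_fields = len(lines) * len(group_names)
    row_rate = matched_rows / len(lines) if lines else 0.0
    field_rate = matched_fields / total_fields if total_fields else 0.0
    return row_rate, field_rate


def passes_two_stage(row_rate: float, field_rate: float,
                     row_threshold: float = 0.75,
                     field_threshold: float = 0.90) -> bool:
    """The field rate is only consulted once the row rate exceeds its threshold."""
    return row_rate > row_threshold and field_rate > field_threshold
```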

[0084] Optionally, in this embodiment, when determining the current round verification result of the current round regular expression based on the current round information extraction result and the original log, the current round row matching rate can also be determined based on the row information contained in the current round information extraction result and the row information contained in the original log, and the current round field matching rate can be determined based on the field information contained in the current round information extraction result and the field information contained in the original log. Then, the weighted average of the current round row matching rate and the current round field matching rate is calculated to obtain the current round comprehensive matching rate. Finally, it is determined whether the current round comprehensive matching rate is greater than the comprehensive matching rate threshold. If it is, the current round regular expression verification is determined to be passed; otherwise, the current round regular expression verification is determined to be failed. The weight of the field matching rate is greater than the weight of the row matching rate.
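A minimal sketch of the weighted variant follows. The weights 0.25 and 0.75 and the threshold 0.85 are placeholder assumptions; this application only requires that the field matching rate carry the larger weight. The function names are hypothetical.

```python
def composite_match_rate(row_rate: float, field_rate: float,
                         row_weight: float = 0.25,
                         field_weight: float = 0.75) -> float:
    """Weighted average of the row matching rate and the field matching rate."""
    if field_weight <= row_weight:
        raise ValueError("the field weight must exceed the row weight")
    total_weight = row_weight + field_weight
    return (row_weight * row_rate + field_weight * field_rate) / total_weight


def passes_composite(row_rate: float, field_rate: float,
                     threshold: float = 0.85) -> bool:
    """Verification passes when the composite rate exceeds the threshold."""
    return composite_match_rate(row_rate, field_rate) > threshold
```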

[0085] S211: When the current round of validation results in a pass, the current round's regular expression is used as the final regular expression.

[0086] In this embodiment of the application, it is determined whether the current round of verification result is passed. If so, the current round regular expression is used as the final regular expression, and the loop ends.

[0087] S212: When the current round of validation results in failure, collect the current round matching failure field information from the original log, and generate the next round regular expression corresponding to the original log based on the current round matching failure field information, the original log, and the current round regular expression.

[0088] In this embodiment of the application, it is determined whether the current round of verification result is passed. If the current round of verification result is failed, the current round matching failure field information in the original log is collected, and based on the current round matching failure field information, the original log and the current round regular expression, the next round regular expression corresponding to the original log is generated, and then the next round of loop is performed.

[0089] Here, the current round matching failure field information refers to the fields contained in the original log that do not match the current round information extraction result.
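This application does not prescribe how the failing fields are located. One possible heuristic, shown here purely as an assumption, is to rebuild the expression one field at a time and report the first field at which the cumulative prefix stops matching a log line. The function name `first_failing_field` and the fragment layout are hypothetical.

```python
import re
from typing import Optional


def first_failing_field(pieces: list[tuple[str, str]], line: str) -> Optional[str]:
    """Return the name of the first field whose cumulative prefix fails.

    pieces is an ordered list of (field_name, regex_fragment), where each
    fragment already contains the field's prefix, named group, suffix, and
    delimiter. None means every prefix matched the line.
    """
    prefix = ""
    for name, fragment in pieces:
        prefix += fragment
        if re.match(prefix, line) is None:
            return name
    return None


pieces = [
    ("timestamp", r"(?P<timestamp>\d{4}-\d{2}-\d{2}) "),
    ("level", r"(?P<level>[A-Z]+) "),
    ("user", r"user=(?P<user>\d+)"),
]
print(first_failing_field(pieces, "2025-10-29 info user=1"))
```

The heuristic only reports the earliest failure per line and ignores the end anchor, so trailing unmatched text would need a separate check.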

[0090] Optionally, in this embodiment of the application, a possible implementation is provided for generating the next round regular expression corresponding to the original log based on the current round matching failure field information, the original log, and the current round regular expression. Specifically, the following operations are performed: S2120: Based on the current round matching failure field information, generate an optimization prompt, and input the optimization prompt, the original log, and the current round regular expression into the AI agent to obtain the next round formatted information corresponding to the original log.

[0091] In this embodiment, an optimization prompt is generated based on the current round matching failure field information. Then, an API is called to input the optimization prompt, the original log, and the current round regular expression into the AI agent. Based on its built-in preset knowledge bases and workflow, the AI agent performs semantic recognition and field mapping on the input log, comprehensively judges the logical hierarchy of the log structure and the relationships between fields, and outputs the next round formatted information corresponding to the original log. For example, if the current round matching failure field information includes the first field, the optimization prompt would be: "The first field was incorrectly identified. Please adjust the formatting information. The original log and the current round regular expression are shown below."
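Assembling such a prompt is plain string construction, as in the sketch below. The wording follows the example above, while the function name `build_optimization_prompt`, the use of field names rather than ordinals, and the layout of the appended sections are assumptions. The call to the AI agent itself is omitted because its API is not specified here.

```python
def build_optimization_prompt(failed_fields: list[str],
                              raw_log_lines: list[str],
                              current_regex: str) -> str:
    """Build the optimization prompt from the fields that failed to match."""
    if not failed_fields:
        raise ValueError("no failed fields: the current expression already passed")
    names = ", ".join(failed_fields)
    return (
        f"The following fields were incorrectly identified: {names}. "
        "Please adjust the formatting information. "
        "The original log and the current round regular expression "
        "are shown below.\n"
        f"Current round regular expression:\n{current_regex}\n"
        "Original log:\n" + "\n".join(raw_log_lines)
    )
```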

[0092] S2121: Based on the formatting information of the next round, generate the next round of regular expression corresponding to the original log according to the regular expression generation method.

[0093] In this embodiment of the application, the step of generating the next round regular expression corresponding to the original log based on the next round of formatting information according to the regular expression generation method is the same as the step of generating the initial regular expression corresponding to the original log based on the formatting information according to the preset regular expression generation method, and will not be repeated here.

[0094] In this way, each round of optimization builds on the results of the previous round, forming a cumulative improvement effect. Through multiple rounds of generation optimization operations, the regular expression is refined iteratively until it passes verification, avoiding the errors that a single-pass generation may introduce. Furthermore, because the optimization prompt is generated from the matching failure field information, the AI agent can target the specific fields that failed when generating the new regular expression.
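The control flow of S210 to S212 can be sketched as a bounded loop. In this illustration the AI agent and the regular expression generation step are represented by a caller-supplied `regenerate` callable, verification is reduced to the row matching rate alone, and the `max_rounds` cap is an added safeguard; all three are assumptions rather than features stated in this application.

```python
import re
from typing import Callable


def refine_regex(initial_regex: str, lines: list[str],
                 regenerate: Callable[[str, list[str]], str],
                 max_rounds: int = 5,
                 row_threshold: float = 0.75) -> tuple[str, bool]:
    """Verify and regenerate until verification passes or rounds run out.

    regenerate(current_regex, failed_lines) stands in for S212: prompting
    the agent and rebuilding the expression from the next round formatted
    information. Returns (final_regex, passed).
    """
    if not lines:
        raise ValueError("at least one log line is required")
    current = initial_regex
    for round_index in range(max_rounds):
        compiled = re.compile(current)
        failed = [line for line in lines if compiled.match(line) is None]
        row_rate = 1 - len(failed) / len(lines)
        if row_rate > row_threshold:
            return current, True  # S211: verification passed
        if round_index < max_rounds - 1:
            current = regenerate(current, failed)  # S212: next round
    return current, False
```

A caller would pass a function that wraps the agent call; for testing, a stub that returns a fixed expression is enough to exercise the loop.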

[0095] Furthermore, after using the initial regular expression as the first-round regular expression and performing multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log, in order to improve the accuracy of subsequent generation and achieve automated, reusable, and high-precision regular expression generation, the knowledge base can also be optimized. The specific steps are as follows: S22: Display the final regular expression to the target object.

[0096] In this embodiment, the final regular expression is displayed to the target object on a visual interface. When the target object is a log analyst, the final regular expression is displayed for their use; when the target object is a regular expression coder, a confirmation notification is sent to the target object to confirm whether the final regular expression is accurate.

[0097] S23: When a confirmation instruction for the final regular expression is received from the target object, a first mapping relationship is constructed between the original log and the final regular expression, and the first mapping relationship is stored in the first knowledge base.

[0098] In this embodiment of the application, when a confirmation instruction for the final regular expression is received from the target object, the original log is associated with the final regular expression to obtain a first mapping relationship, and the first mapping relationship is stored in the first knowledge base.

[0099] Specifically, the receipt of a confirmation instruction from the target object regarding the final regular expression is categorized into different scenarios depending on the target object. Scenario 1: When the target object is a log analyst, if the target object directly uses the final regular expression, a confirmation instruction for the final regular expression is triggered. Scenario 2: When the target object is a regular expression coder, a confirmation instruction for the final regular expression is received upon receiving confirmation feedback from the target object.

[0100] S24: When a modification instruction for the final regular expression of the target object is received, a second mapping relationship is constructed between the original log and the final regular expression modified by the target object, and the second mapping relationship is stored in the first knowledge base.

[0101] In this embodiment of the application, when a modification instruction for the final regular expression of the target object is received, the original log is associated with the final regular expression modified by the target object to obtain a second mapping relationship, and the second mapping relationship is stored in the first knowledge base.

[0102] The modification instruction carries the final regular expression modified by the target object. Upon receiving a modification instruction from the target object regarding the final regular expression, different reception scenarios occur depending on the target object. Scenario 1: When the target object is a log analyst, if the target object modifies and uses the final regular expression, a modification instruction for the final regular expression is triggered. Scenario 2: When the target object is a regular expression coder, a modification instruction for the final regular expression is received upon receiving modification feedback from the target object.

[0103] In this way, a virtuous cycle of continuous system optimization is established. By storing successful mapping relationships, the system continuously enriches its parsing experience, and as the number of uses increases, the system's ability to parse different types of logs is continuously enhanced.
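The feedback handling of S23 and S24 amounts to storing whichever expression the target object accepted. The in-memory class below is a hypothetical sketch: the name `FirstKnowledgeBase`, the dictionary storage, and the compile check are assumptions, and a real first knowledge base would presumably be persistent.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FirstKnowledgeBase:
    """Maps an original log sample to the regular expression accepted for it."""
    mappings: dict[str, str] = field(default_factory=dict)

    def record(self, raw_log: str, final_regex: str,
               modified_regex: Optional[str] = None) -> str:
        """Store the first mapping (confirmation) or second mapping (modification)."""
        accepted = modified_regex if modified_regex is not None else final_regex
        re.compile(accepted)  # raises re.error for an invalid expression
        self.mappings[raw_log] = accepted
        return accepted
```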

[0104] Based on the above embodiments, see Figure 3, which illustrates another implementation flow of a regular expression generation method according to an embodiment of this application, including: S30: Obtain the raw log and preprocess it to obtain the preprocessed log.

[0105] S31: Placeholders are used to mark the key fields in the preprocessed log, and the context information of each field in the preprocessed log is retained to obtain a structured log.

[0106] S32: Input the structured log and preset prompts into the AI agent, which performs semantic recognition and structural analysis on the structured log to obtain the formatted information.

[0107] S33: Generate the initial regular expression corresponding to the original log based on the formatted information, according to the preset regular expression generation method.

[0108] S34: Use the initial regular expression as the first round of regular expression, perform multiple rounds of generation and optimization operations to obtain the final regular expression corresponding to the original log.

[0109] One round of generation optimization operations includes steps S340 to S346, as follows: S340: Use the current round's regular expression to extract information from the original log and obtain the current round's information extraction results.

[0110] S341: Match the current round information extraction result with the original log to obtain the current round validation result of the current round regular expression.

[0111] S342: Determine whether the current round of verification results is passed. If yes, execute S343; otherwise, execute S344.

[0112] S343: Use the current regular expression as the final regular expression.

[0113] S344: Collect information on fields that failed to match in the current round from the raw log.

[0114] S345: Based on the current round matching failure field information, generate an optimization prompt, and input the optimization prompt, the original log, and the current round regular expression into the AI agent to obtain the next round formatted information corresponding to the original log.

[0115] S346: Based on the formatting information of the next round, generate the next round of regular expression corresponding to the original log according to the regular expression generation method.

[0116] S35: Display the final regular expression to the target object.

[0117] S36: When a confirmation instruction for the final regular expression is received from the target object, a first mapping relationship is constructed between the original log and the final regular expression, and the first mapping relationship is stored in the first knowledge base.

[0118] S37: When a modification instruction for the final regular expression of the target object is received, a second mapping relationship is constructed between the original log and the final regular expression modified by the target object, and the second mapping relationship is stored in the first knowledge base.

[0119] In this way, by leveraging the semantic recognition and structural reasoning capabilities of AI agents, the entire process from raw log collection to log parsing and regular expression generation can be automated. Combined with a closed-loop verification mechanism, regular expressions are automatically optimized, effectively reducing labor costs, improving the quality of regular expressions, and efficiently and automatically completing the generation of regular expressions.

[0120] Furthermore, based on the same technical concept, embodiments of this application provide a regular expression generation apparatus, which is used to implement the above-described method flow of the embodiments of this application. For example, as shown in Figure 4, the regular expression generation apparatus 400 may include: an initial generation module 401, an optimization generation module 402, and a feedback module 403.

[0121] The initial generation module 401 is used to generate formatted information corresponding to the original log and to generate an initial regular expression corresponding to the original log based on the formatted information according to a preset regular expression generation method, wherein the formatted information includes the generated feature information of each field, and the generated feature information includes a capture group expression. The optimization generation module 402 is used to take the initial regular expression as the first-round regular expression and perform multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log. One round of generation optimization operation includes: using the current round regular expression to extract information from the original log to obtain the current round information extraction result, and matching the current round information extraction result with the original log to obtain the current round verification result of the current round regular expression; when the current round verification result is passed, taking the current round regular expression as the final regular expression; and when the current round verification result is not passed, collecting the current round matching failure field information from the original log, and generating the next round regular expression corresponding to the original log based on the current round matching failure field information, the original log, and the current round regular expression.

[0122] In an optional embodiment, when generating the next-round regular expression corresponding to the original log based on the current-round match-failure field information, the original log, and the current-round regular expression, the optimization generation module 402 is further configured to: generate an optimization prompt based on the current-round match-failure field information; input the optimization prompt, the original log, and the current-round regular expression into the AI agent to obtain next-round formatted information corresponding to the original log; and, based on the next-round formatted information, generate the next-round regular expression corresponding to the original log according to the regular expression generation method.
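
As a rough sketch of this regeneration step, the Python below turns match-failure field information into an optimization prompt and passes it, together with the original log and the current-round regular expression, to an agent. Everything here is assumed for illustration: the shape of the failure records (`name` and `context` keys), the prompt wording, and the `agent` and `generate_regex` callables, which are hypothetical stand-ins for the AI agent and the preset generation method.

```python
def build_optimization_prompt(failed):
    """Turn match-failure field information into a textual hint for the agent."""
    lines = ["The current regular expression failed on the following fields:"]
    for item in failed:
        lines.append(f"- field '{item['name']}' near log text '{item['context']}'")
    lines.append("Revise the formatted information so these fields are captured.")
    return "\n".join(lines)


def next_round_regex(agent, generate_regex, failed, log, current_regex):
    """Produce the next-round regex from failure info, the log, and the current regex.

    `agent` is a hypothetical callable (prompt, log, regex) -> formatted information;
    `generate_regex` is a hypothetical callable that applies the same generation
    method used for the initial regular expression.
    """
    prompt = build_optimization_prompt(failed)
    formatted_info = agent(prompt, log, current_regex)
    return generate_regex(formatted_info)
```

Passing the agent and the generation method in as parameters keeps the sketch independent of any particular model or API.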

[0123] In an optional embodiment, when generating the formatted information corresponding to the original log, the initial generation module 401 is further configured to: perform structuring processing on the original log to obtain a structured log; and input the structured log and a preset prompt into the AI agent, which performs semantic recognition and structural analysis on the structured log to obtain the formatted information.

[0124] In an optional embodiment, the AI agent has a built-in preset knowledge base. The preset knowledge base includes a first knowledge base and a second knowledge base. The first knowledge base stores the mapping relationships between historical log samples and regular expression samples, and the second knowledge base stores the capture group expression and semantic mapping rules of each standard field.
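
One possible in-memory layout for the two knowledge bases is sketched below. The class names, attribute names, and sample entries are hypothetical; the source does not specify how the knowledge bases are stored or queried.

```python
from dataclasses import dataclass, field


@dataclass
class StandardField:
    capture_expression: str  # capture group expression for the standard field
    semantic_label: str      # what the field means (semantic mapping)


@dataclass
class PresetKnowledgeBase:
    # First knowledge base: historical log sample -> regular expression sample.
    log_to_regex: dict = field(default_factory=dict)
    # Second knowledge base: standard field name -> expression and semantics.
    standard_fields: dict = field(default_factory=dict)

    def expression_for(self, field_name):
        """Look up the capture group expression of a standard field, if known."""
        entry = self.standard_fields.get(field_name)
        return entry.capture_expression if entry else None


# Hypothetical sample entries.
kb = PresetKnowledgeBase()
kb.standard_fields["src_ip"] = StandardField(r"\d{1,3}(?:\.\d{1,3}){3}", "source IP address")
kb.log_to_regex["GET /index.html 200"] = r"(?P<method>\w+) (?P<path>\S+) (?P<status>\d{3})"
```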

[0125] In an optional embodiment, when performing structuring processing on the original log to obtain the structured log, the initial generation module 401 is further configured to: obtain the original log and preprocess it to obtain a preprocessed log; and mark the key fields in the preprocessed log with placeholders while preserving the context information of each field in the preprocessed log, to obtain the structured log.
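
The structuring step might look like the following Python sketch. The preprocessing rule (trimming and collapsing whitespace), the placeholder tags, and the detection patterns are all assumptions chosen for the example; the source does not list which key fields are marked or how they are detected.

```python
import re

# Hypothetical placeholder rules; more specific patterns come first so that
# a timestamp or IP address is not broken up by the generic number rule.
PLACEHOLDER_RULES = [
    ("<time>", re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")),
    ("<ip>", re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")),
    ("<num>", re.compile(r"\b\d+\b")),
]


def preprocess(raw):
    """Assumed preprocessing: trim the line and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", raw.strip())


def structure(raw):
    """Replace key field values with placeholders, keeping surrounding text as context."""
    text = preprocess(raw)
    values = []  # (placeholder, original value) pairs, grouped by rule
    for tag, pattern in PLACEHOLDER_RULES:
        def mark(match, tag=tag):
            values.append((tag, match.group(0)))
            return tag
        text = pattern.sub(mark, text)
    return text, values


structured, marked = structure("  2026-03-20 10:15:32   user=alice src=192.168.1.10 port=443 \n")
# structured == "<time> user=alice src=<ip> port=<num>"
```

The literal text around each placeholder (`user=`, `src=`, `port=`) is left untouched, which is what allows later steps to derive prefixes and separators from the context.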

[0126] In an optional embodiment, the generation feature information further includes: field name, field type, field order, semantic label, prefix, suffix, and field separator information.

[0127] In an optional embodiment, when generating the initial regular expression corresponding to the original log based on the formatted information according to the preset regular expression generation method, the initial generation module 401 is further configured to: wrap the capture group expression of each field in a named capture group, and concatenate the resulting groups in field order to obtain a regular expression to be refined; and, based on the prefix, suffix, and field separator information of each field, insert the corresponding characters into the regular expression to be refined to obtain the initial regular expression.
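
The assembly described above can be illustrated with a short Python sketch. The `FieldInfo` record, the function name, the use of `re.escape` for literal characters, and the `^`/`$` anchors are assumptions for the example. For brevity the sketch wraps, concatenates, and inserts literals in a single pass, whereas the text describes concatenation followed by a separate insertion step.

```python
import re
from dataclasses import dataclass


@dataclass
class FieldInfo:
    name: str            # field name, used as the capture group name
    pattern: str         # capture group expression for the field
    order: int           # position of the field in the log line
    prefix: str = ""     # literal text immediately before the value
    suffix: str = ""     # literal text immediately after the value
    separator: str = ""  # literal text between this field and the next


def build_initial_regex(fields):
    """Wrap each field in a named group, join in field order, insert literals."""
    ordered = sorted(fields, key=lambda f: f.order)
    parts = []
    for i, f in enumerate(ordered):
        group = f"(?P<{f.name}>{f.pattern})"
        piece = re.escape(f.prefix) + group + re.escape(f.suffix)
        if i < len(ordered) - 1:  # no separator after the last field
            piece += re.escape(f.separator)
        parts.append(piece)
    return "^" + "".join(parts) + "$"


# Hypothetical formatted information for one log line.
fields = [
    FieldInfo("level", r"[A-Z]+", 1, prefix="[", suffix="]", separator=" "),
    FieldInfo("timestamp", r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", 0, separator=" "),
    FieldInfo("message", r".*", 3),
    FieldInfo("ip", r"\d{1,3}(?:\.\d{1,3}){3}", 2, separator=" "),
]
initial_regex = build_initial_regex(fields)
match = re.match(initial_regex, "2026-03-20 10:15:32 [ERROR] 192.168.1.10 login failed")
```

Escaping the prefix, suffix, and separator keeps characters such as `[` and `]` from being interpreted as regular expression syntax.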

[0128] In an optional embodiment, the feedback module 403 is configured to: display the final regular expression to the target object; when a confirmation instruction for the final regular expression is received from the target object, construct a first mapping relationship between the original log and the final regular expression, and store the first mapping relationship in the first knowledge base; and when a modification instruction for the final regular expression is received from the target object, construct a second mapping relationship between the original log and the final regular expression as modified by the target object, and store the second mapping relationship in the first knowledge base.
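
A minimal sketch of this feedback handling is given below, assuming the first knowledge base can be modeled as a plain dictionary keyed by the original log. The function name, the `"confirm"`/`"modify"` instruction strings, and the compile check on the modified expression are assumptions added for the example, not details from the source.

```python
import re


def record_feedback(first_kb, log, final_regex, instruction, modified_regex=None):
    """Store a log-to-regex mapping in the first knowledge base based on user feedback."""
    if instruction == "confirm":
        # First mapping relationship: original log -> final regex as generated.
        first_kb[log] = final_regex
    elif instruction == "modify":
        if modified_regex is None:
            raise ValueError("a modify instruction needs the modified regex")
        re.compile(modified_regex)  # assumed safeguard: raises re.error if invalid
        # Second mapping relationship: original log -> final regex as modified.
        first_kb[log] = modified_regex
    else:
        raise ValueError(f"unknown instruction: {instruction!r}")
    return first_kb[log]
```

Either way, the stored pair becomes a historical sample that later generation runs can draw on.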

[0129] Based on the description of the method and apparatus embodiments above, an exemplary embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program executable by the at least one processor, which, when executed by the at least one processor, causes the electronic device to perform the method according to an embodiment of the present invention.

[0130] This application also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computer's processor, is used to cause the computer to perform a method according to an embodiment of this application.

[0131] This application also provides a computer program product, including a computer program, wherein the computer program, when executed by a computer's processor, is used to cause the computer to perform a method according to an embodiment of this application.

[0132] Referring to Figure 5, the structure of an electronic device 500 that can serve as a server or client in this application will now be described; it is an example of a hardware device that can be applied to various aspects of this application. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the application described and/or claimed herein.

[0133] As shown in Figure 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes based on a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the electronic device 500. The computing unit 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

[0134] Multiple components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 can be any type of device capable of inputting information to the electronic device 500. The input unit 506 can receive input digital or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 507 can be any type of device capable of presenting information and may include, but is not limited to, a display, speaker, video/audio output terminal, vibrator, and/or printer. The storage unit 508 may include, but is not limited to, magnetic disks and optical discs. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, WiFi devices, Worldwide Interoperability for Microwave Access (WiMax) devices, cellular communication devices, and/or the like.

[0135] The computing unit 501 can be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above. For example, in some embodiments, the above-described regular expression generation method can be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program can be loaded and/or installed on the electronic device 500 via the ROM 502 and/or the communication unit 509. In some embodiments, the computing unit 501 can be configured to perform the above-described regular expression generation method by any other suitable means (e.g., by means of firmware).

[0136] The program code used to implement the methods of this application may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0137] In the context of this application, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, RAM, ROM, erasable programmable read-only memory (EPROM) or flash memory, optical fibers, compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0138] As used in this application, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (e.g., magnetic disk, optical disk, memory, programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

[0139] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0140] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or computing systems that include middleware components (e.g., application servers), or computing systems that include frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0141] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.

[0142] Furthermore, it should be understood that the above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of the invention. Therefore, any equivalent variations made in accordance with the claims of this invention are still within the scope of this application.

Claims

1. A method for generating a regular expression, characterized in that it comprises: generating formatted information corresponding to an original log, and generating, according to a preset regular expression generation method, an initial regular expression corresponding to the original log based on the formatted information, wherein the formatted information includes generation feature information for each field, and the generation feature information includes a capture group expression; and taking the initial regular expression as a first-round regular expression and performing multiple rounds of generation optimization operations to obtain a final regular expression corresponding to the original log; wherein one round of the generation optimization operation includes: extracting information from the original log using the current-round regular expression to obtain a current-round information extraction result, and matching the current-round information extraction result against the original log to obtain a current-round verification result for the current-round regular expression; when the current-round verification result is a pass, taking the current-round regular expression as the final regular expression; and when the current-round verification result is a fail, collecting current-round match-failure field information from the original log, and generating a next-round regular expression corresponding to the original log based on the current-round match-failure field information, the original log, and the current-round regular expression.

2. The method as described in claim 1, characterized in that the step of generating the next-round regular expression corresponding to the original log based on the current-round match-failure field information, the original log, and the current-round regular expression includes: generating an optimization prompt based on the current-round match-failure field information, and inputting the optimization prompt, the original log, and the current-round regular expression into an AI agent to obtain next-round formatted information corresponding to the original log; and, based on the next-round formatted information, generating the next-round regular expression corresponding to the original log according to the regular expression generation method.

3. The method as described in claim 1, characterized in that the step of generating the formatted information corresponding to the original log includes: performing structuring processing on the original log to obtain a structured log; and inputting the structured log and a preset prompt into an AI agent, which performs semantic recognition and structural analysis on the structured log to obtain the formatted information.

4. The method as described in claim 2 or 3, characterized in that the AI agent has a built-in preset knowledge base; the preset knowledge base includes a first knowledge base and a second knowledge base; the first knowledge base is used to store the mapping relationships between historical log samples and regular expression samples, and the second knowledge base is used to store the capture group expression and semantic mapping rules of each standard field.

5. The method as described in claim 3, characterized in that the step of performing structuring processing on the original log to obtain the structured log includes: obtaining the original log and preprocessing the original log to obtain a preprocessed log; and marking the key fields contained in the preprocessed log with placeholders while retaining the context information of each field contained in the preprocessed log, to obtain the structured log.

6. The method as described in claim 1, characterized in that the generation feature information further includes: field name, field type, field order, semantic label, prefix, suffix, and field separator information.

7. The method as described in claim 6, characterized in that the step of generating the initial regular expression corresponding to the original log based on the formatted information according to the preset regular expression generation method includes: wrapping the capture group expression of each field in a named capture group, and concatenating the resulting groups in field order to obtain a regular expression to be refined; and, based on the prefix, suffix, and field separator information of each field, inserting the corresponding characters into the regular expression to be refined to obtain the initial regular expression.

8. The method as described in claim 4, characterized in that, after taking the initial regular expression as the first-round regular expression and performing multiple rounds of generation optimization operations to obtain the final regular expression corresponding to the original log, the method further includes: displaying the final regular expression to a target object; when a confirmation instruction for the final regular expression is received from the target object, constructing a first mapping relationship between the original log and the final regular expression, and storing the first mapping relationship in the first knowledge base; and when a modification instruction for the final regular expression is received from the target object, constructing a second mapping relationship between the original log and the final regular expression as modified by the target object, and storing the second mapping relationship in the first knowledge base.

9. A regular expression generation apparatus, characterized in that it comprises: an initial generation module, configured to generate formatted information corresponding to an original log, and to generate, according to a preset regular expression generation method, an initial regular expression corresponding to the original log based on the formatted information, wherein the formatted information includes generation feature information for each field, and the generation feature information includes a capture group expression; and an optimization generation module, configured to take the initial regular expression as a first-round regular expression and perform multiple rounds of generation optimization operations to obtain a final regular expression corresponding to the original log; wherein one round of the generation optimization operation includes: extracting information from the original log using the current-round regular expression to obtain a current-round information extraction result, and matching the current-round information extraction result against the original log to obtain a current-round verification result for the current-round regular expression; when the current-round verification result is a pass, taking the current-round regular expression as the final regular expression; and when the current-round verification result is a fail, collecting current-round match-failure field information from the original log, and generating a next-round regular expression corresponding to the original log based on the current-round match-failure field information, the original log, and the current-round regular expression.

10. An electronic device, comprising: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the method as described in any one of claims 1-8.