A log analysis method for a railway system, an electronic device, and a storage medium
By performing data cleaning and context enhancement on railway system log data, and combining a large language model with a template library of railway professional knowledge, accurate parsing of railway system log data has been achieved. This solves the problem of insufficient parsing accuracy in existing technologies and improves the accuracy and efficiency of information extraction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA RAILWAY (BEIJING) INFORMATION TECHNOLOGY SERVICES CO LTD
- Filing Date
- 2026-02-08
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies lack sufficient accuracy in parsing log files in railway systems, making it impossible to effectively extract information.
A log parsing method for railway systems is adopted. By acquiring the log data to be parsed from the railway system, data cleaning and context enhancement processing are performed. A large language model is used in combination with railway professional knowledge in the template library to perform event classification, field extraction and semantic parsing to obtain accurate semantic parsing results.
It enables precise analysis of railway system log data, improving the accuracy and efficiency of information extraction.
Figure CN122197857A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of railway information technology, and in particular to a log parsing method, electronic device and storage medium for railway systems.
Background Technology
[0002] Log files must be parsed to extract information and assess the operational status of the systems that produce them. Many current methods parse log files directly with rules, but this approach is not accurate enough in practical railway scenarios. A new log parsing method is therefore needed to extract information from the log files of railway system infrastructure effectively and accurately.
Summary of the Invention
[0003] The purpose of this application is to provide a log parsing method, electronic device, and storage medium for railway systems to solve the above-mentioned technical problems.
[0004] In one aspect, a log parsing method for railway systems is provided, including:
[0005] Obtain the log data to be parsed from the railway system;
[0006] Based on the log data to be parsed, hierarchical enhancement prompts are determined; the hierarchical enhancement prompts include event classification prompts, field extraction prompts, and semantic parsing prompts.
[0007] Input the hierarchical enhancement prompts and the log data to be parsed into the large language model;
[0008] The large language model calls the event classification template corresponding to the event classification prompt word from the pre-set template library to perform event classification verification. After the verification is passed, it calls the field extraction template corresponding to the field extraction prompt word to extract the corresponding key fields from the log data to be parsed, and calls the semantic parsing template corresponding to the semantic parsing prompt word to perform semantic parsing on the key fields to obtain the corresponding semantic parsing results.
[0009] The semantic parsing results are then transmitted to downstream systems.
[0010] In one embodiment, each template in the template library incorporates corresponding railway professional knowledge, including role knowledge, task instruction knowledge, railway business knowledge, and sample example knowledge. The role knowledge defines the professional identity of the template when it is invoked to perform the corresponding analysis task. The task instruction knowledge defines the output requirements after the template is invoked to perform the corresponding analysis task. The railway business knowledge defines a railway business knowledge graph for retrieval when the template is invoked to perform the corresponding analysis task. The sample example knowledge defines a mapping relationship from input sample logs to normalized output results, so that the template can output corresponding semantic parsing results based on the mapping relationship when it is invoked to perform the corresponding analysis task.
[0011] In one embodiment, the large language model invokes a semantic parsing template corresponding to the semantic parsing prompt word to perform semantic parsing on the key field, including:
[0012] The large language model retrieves relevant knowledge fragments by searching the railway business knowledge graph corresponding to the semantic parsing template based on the key fields.
[0013] The relevant knowledge fragments are transformed into background constraint information that participates in semantic parsing;
[0014] The key fields are semantically parsed based on the background constraint information.
[0015] In one embodiment, obtaining the log data to be parsed from the railway system includes:
[0016] Obtain raw log data from the railway system;
[0017] Perform data cleaning on the raw log data;
[0018] Context enhancement processing is performed on the raw log data after data cleaning to obtain the log data to be parsed.
[0019] In one embodiment, determining event classification prompts based on the log data to be parsed includes:
[0020] Calculate the comprehensive matching score between the log data to be parsed and each of the preset candidate event types;
[0021] The preset event classification prompt word corresponding to the candidate event type with the highest comprehensive matching score is determined as the event classification prompt word of the log data.
[0022] In one embodiment, calculating the comprehensive matching score between the log data to be parsed and preset different candidate event types includes:
[0023] For each candidate event type, a multi-dimensional matching score is calculated between the log data to be parsed and the candidate event type. The multi-dimensional matching score includes at least two of the following: keyword matching score, structured field matching score, domain rule constraint matching score, and semantic consistency matching score.
[0024] The comprehensive matching score between the log data to be parsed and the candidate event type is calculated based on the multi-dimensional matching scores and the first weight corresponding to each multi-dimensional matching score.
[0025] In one embodiment, after obtaining the corresponding semantic parsing result, the method further includes:
[0026] Determine the confidence score of the semantic parsing results;
[0027] When the confidence score is greater than or equal to a preset confidence score threshold, the log data and the corresponding semantic parsing results are converted into sample example knowledge and stored.
[0028] In one embodiment, determining the confidence score of the semantic parsing result includes:
[0029] Obtain a multi-dimensional score for the semantic parsing result; the multi-dimensional score includes at least two of the following: the model confidence score of the large language model for the semantic parsing result, the self-consistency score of the semantic parsing result, the rule conformity score, and the keyword matching score for railway business knowledge;
[0030] The confidence score of the semantic parsing result is determined based on each of the multi-dimensional scores and the second weight corresponding to each of the multi-dimensional scores.
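The confidence scoring and sample storage described above can be sketched as follows. This is a minimal illustration only: the function names, the dimension names, the normalization by the sum of weights, and the in-memory list used as the sample store are assumptions and are not specified in this application.

```python
from typing import Dict, List


def confidence_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine the available dimension scores (each assumed to lie in [0, 1])
    with their second weights. Dividing by the weight sum is an assumption
    that keeps the result in [0, 1] when only some dimensions are present."""
    if len(scores) < 2:
        raise ValueError("at least two dimension scores are required")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def maybe_store_sample(log_line: str, result: dict, score: float,
                       threshold: float, store: List[dict]) -> bool:
    """Store the (log, result) pair as sample example knowledge when the
    confidence score reaches the preset threshold."""
    if score >= threshold:
        store.append({"input": log_line, "output": result})
        return True
    return False
```

With hypothetical weights of 0.5 each for a model confidence score of 0.9 and a self-consistency score of 0.8, the sketch yields a confidence score of about 0.85, which would be stored under a threshold of 0.8.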
[0031] In another aspect, an electronic device is provided, including a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement any of the methods described above.
[0032] In another aspect, a computer-readable storage medium is provided, characterized in that the computer-readable storage medium stores a computer program, which, when executed by at least one processor, implements any of the methods described above.
[0033] The log parsing method, electronic device, and storage medium provided in this application for railway systems determine hierarchical enhancement prompts based on the acquired log data to be parsed from the railway system. The hierarchical enhancement prompts and the log data to be parsed are input into a large language model. The large language model calls the event classification template corresponding to the event classification prompts from a pre-set template library to perform event classification verification. After successful verification, it calls the field extraction template corresponding to the field extraction prompts to extract the corresponding key fields from the log data to be parsed, and calls the semantic parsing template corresponding to the semantic parsing prompts to perform semantic parsing on the key fields, obtaining the corresponding semantic parsing results. The semantic parsing results are then transmitted to downstream systems, achieving accurate parsing of railway system log data.
[0034] Other features and advantages of this application will be set forth in the following description and will be apparent in part from the description or may be learned by practicing the application. The purposes and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings. It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not intended to limit this disclosure.
Attached Figure Description
[0035] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0036] Figure 1 is a flowchart of the log parsing method for railway systems provided in an embodiment of this application;
[0037] Figure 2 is a schematic flowchart of obtaining the log data to be parsed from a railway system, provided in an embodiment of this application;
[0038] Figure 3 is a flowchart of determining event classification prompts based on the log data to be parsed, provided in an embodiment of this application;
[0039] Figure 4 is a schematic flowchart of semantic parsing of key fields by the large language model, provided in an embodiment of this application;
[0040] Figure 5 is a schematic diagram of the structure of the electronic device provided in an embodiment of this application;
[0041] Figure 6 is a diagram of the overall architecture of the log parsing system provided in an embodiment of this application;
[0042] Figure 7 is a schematic diagram of the hierarchical enhancement prompt framework provided in an embodiment of this application.
Detailed Implementation
[0043] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0044] This application provides a log parsing method for railway systems. As shown in Figure 1, the method includes:
[0045] S11: Obtain the log data to be parsed from the railway system;
[0046] S12: Determine the hierarchical enhancement prompts based on the log data to be parsed; the hierarchical enhancement prompts include event classification prompts, field extraction prompts, and semantic parsing prompts.
[0047] S13: Input the hierarchical enhancement prompts and log data to be parsed into the large language model.
[0048] S14: The large language model calls the event classification template corresponding to the event classification prompt from the pre-set template library to perform event classification verification. After the verification is passed, it calls the field extraction template corresponding to the field extraction prompt to extract the corresponding key fields from the log data to be parsed, and calls the semantic parsing template corresponding to the semantic parsing prompt to perform semantic parsing on the key fields to obtain the corresponding semantic parsing results.
[0049] S15: Transmit the semantic parsing results to the downstream system.
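The gating order of step S14 (classification verification first, then field extraction, then semantic parsing) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `llm` callable, the template placeholders, the "pass" verdict string, and the JSON reply format are all hypothetical and are not defined in this application.

```python
import json
from typing import Callable, Dict, Optional


def parse_log(log_line: str,
              prompts: Dict[str, str],
              templates: Dict[str, str],
              llm: Callable[[str], str]) -> Optional[dict]:
    """Run the three-stage flow of S14 with an injected model callable."""
    # Stage 1: event classification verification.
    verdict = llm(templates["event_classification"].format(
        prompt=prompts["event_classification"], log=log_line))
    if verdict.strip().lower() != "pass":
        return None  # verification failed; later stages are not invoked

    # Stage 2: field extraction (the model is assumed to reply with JSON).
    fields = json.loads(llm(templates["field_extraction"].format(
        prompt=prompts["field_extraction"], log=log_line)))

    # Stage 3: semantic parsing of the extracted key fields.
    semantics = llm(templates["semantic_parsing"].format(
        prompt=prompts["semantic_parsing"], fields=json.dumps(fields)))

    # The returned dictionary is what S15 would transmit downstream.
    return {"fields": fields, "semantics": semantics}
```

Injecting the model as a plain callable keeps the sketch independent of any particular large language model interface.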
[0050] The following is a detailed explanation of each of the above steps.
[0051] The log data to be parsed in this embodiment can come from different business systems of the railway system, including but not limited to ticketing systems, passenger service systems, and transportation scheduling systems.
[0052] In one embodiment, as shown in Figure 2, step S11 may include the following sub-steps:
[0053] S111: Obtain raw log data from the railway system.
[0054] S112: Perform data cleaning on the raw log data.
[0055] S113: Perform context enhancement processing on the raw log data after data cleaning to obtain the log data to be parsed.
[0056] In step S111, data can be collected directly from the raw log files of the railway system's infrastructure to obtain the raw log data.
[0057] In step S112, character encoding correction and removal of invalid or redundant log lines can be performed on the raw log data. For example, debugging data and duplicate log data can be removed, and each log line can be organized into a regular text unit. For cross-system heterogeneity, such as the different log formats of signaling, communication, scheduling, and ticketing equipment, format recognition and normalization can be performed. Rules and lightweight models can be used to identify binary dumps and convert them into hexadecimal strings. For logs with mixed Chinese and English, field alignment can be performed.
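A minimal sketch of the cleaning in step S112 is given below. It assumes that raw lines arrive as bytes, that UTF-8 is the target encoding, that any line which fails to decode is treated as a binary dump, and that debugging lines carry a `[DEBUG]` or `[TRACE]` tag; these choices and all names are illustrative assumptions rather than details of this application.

```python
import re
from typing import Iterable, List

# Assumed marker for debugging lines; real systems may use other tags.
DEBUG_PATTERN = re.compile(r"\[(DEBUG|TRACE)\]")


def clean_logs(raw_lines: Iterable[bytes]) -> List[str]:
    """Decode, normalize, and de-duplicate raw log lines."""
    cleaned: List[str] = []
    seen = set()
    for raw in raw_lines:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Simple rule standing in for binary dump detection:
            # undecodable bytes are converted to a hexadecimal string.
            text = raw.hex()
        # Collapse whitespace so each line becomes a regular text unit.
        text = " ".join(text.split())
        if not text or DEBUG_PATTERN.search(text) or text in seen:
            continue  # drop empty, debugging, and duplicate lines
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

The rule-based binary check stands in for the "rules and lightweight models" mentioned above; a trained classifier could replace the `except` branch without changing the rest of the function.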
[0058] In step S113, context enhancement can be performed as follows: The source of the cleaned raw log data is automatically identified, such as the dispatching central system, the train automatic protection system, the ticket database, the signaling system, etc., and relevant context information is appended to each cleaned raw log data entry. This information may include:
[0059] System metadata: Source information of the corresponding raw log data, including but not limited to log source system identifier, device ID, IP address, timestamp, log level, source identifier, etc.
[0060] Domain knowledge fragments: Based on keywords in the log content (such as device ID or error code), the system retrieves the most relevant knowledge fragments (terminology explanations, normal status descriptions, common error code explanations, etc.) from the pre-built railway knowledge base in real time. The retrieved fragments are sorted and merged, and the merged knowledge is formatted into narrative text that is easy for the large language model (LLM) to understand.
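The context enhancement of step S113 can be sketched as follows. The sketch assumes a knowledge base held as a plain keyword-to-description dictionary, substring matching as the retrieval method, and ordering by first appearance in the log as the sort rule; the `EnhancedLog` type and the `enhance` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EnhancedLog:
    text: str                 # cleaned log line
    metadata: Dict[str, str]  # system metadata such as source system or device ID
    knowledge: str            # merged narrative text built from knowledge fragments


def enhance(text: str, metadata: Dict[str, str],
            knowledge_base: Dict[str, str]) -> EnhancedLog:
    """Attach system metadata and retrieved knowledge fragments to a log line."""
    # Retrieve fragments whose keyword occurs in the log content.
    hits = [(text.index(keyword), description)
            for keyword, description in knowledge_base.items()
            if keyword in text]
    # Sort by position of the keyword in the log, then merge into one narrative.
    hits.sort()
    narrative = " ".join(description for _, description in hits)
    return EnhancedLog(text=text, metadata=dict(metadata), knowledge=narrative)
```

A production retrieval step would more likely rank fragments by relevance than by position; the positional sort is used here only to keep the example deterministic.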
[0061] In this embodiment, a layered enhancement prompt module can determine layered enhancement prompts based on the log data to be parsed. This module maintains a layered, dynamically assemblable template library related to log data from the railway system. This template library is divided according to task granularity, with each layer providing in-depth parsing.
[0062] Event classification prompts: These are used to determine the event type of the log data to be parsed in the railway system. For example: "Please determine the event type described in the following log lines. Event type options include: user authentication, inventory deduction, payment processing, database error, network communication, and system heartbeat."
[0063] Field extraction prompts: Dedicated prompts are designed for specific event types parsed from the log data to be parsed in the railway system to accurately extract key fields. For example: "Please extract the specified fields from the following logs regarding 'Inventory Deduction Anomaly,' strictly following the JSON format. Fields include: timestamp, log_level, service_module, order_id, error_reason, device_id."
[0064] Semantic parsing prompts: used for summarization, attribution analysis, or correlation judgment on the log data to be parsed in the railway system. For example: "Analyze the following logs to summarize the root cause of the failure, the scope of business that may be affected (such as 'ticketing', 'dispatch') and the severity level."
[0065] In this embodiment, each template in the template library incorporates corresponding railway professional knowledge, including role knowledge, task instruction knowledge, railway business knowledge, and sample example knowledge. These different levels of knowledge constitute the core of the dynamic prompt word template, guiding the large language model to parse the log data to be parsed from the railway system.
[0066] It should be noted that the role knowledge definition includes the professional identity of the template when it is called to execute the corresponding analysis task; the task instruction knowledge definition includes the output requirements after the template is called to execute the corresponding analysis task; the railway business knowledge definition includes a railway business knowledge graph, which can be searched when the template is called to execute the corresponding analysis task; and the sample example knowledge definition includes the mapping relationship from the input sample log to the normalized output result, so that the template can output the corresponding semantic parsing result according to the mapping relationship when it is called to execute the corresponding analysis task.
[0067] The following section will further introduce the specific content and implementation methods of the above knowledge.
[0068] A role knowledge segment can be understood as a role definition layer, which sets a clear professional identity for parsing railway system log data for the large language model. Multiple role containers can process different log data segments in parallel, enabling them to parse log data in real time while possessing the perspective and mindset of railway system log data parsing experts, thereby improving the ability to understand the professional terminology and business logic of railway system log data.
[0069] Regarding role knowledge, the following mechanisms can be pre-designed:
[0070] Role Definition: The professional identity description for parsing railway system log data needs to be specific, and a blacklist within the railway field can be built in. For example, "Railway Passenger Ticketing System Expert" is more accurate than "Railway System Expert" because the passenger ticketing system involves specific sub-domains such as user authentication, ticket inventory, and payment.
[0071] Role and Responsibilities Emphasis: A description of the professional responsibilities for analyzing railway system log data can be attached, such as "Responsible for analyzing passenger ticket transaction logs and identifying abnormal events".
[0072] Role domain boundaries: Limit the scope of parsing for railway system log data sources. For example, when processing passenger ticket system logs, ignore equipment monitoring logs, signal equipment logs, etc. This can solve the problem that traditional solutions cannot automatically filter non-ticketing logs (such as server CPU alarms), which leads to parsing errors.
[0073] Built-in railway safety regulations: prohibit parsing log lines containing ID card numbers, automatically block unauthorized access logs (IPs not on the whitelist), etc.
[0074] Fault tolerance mechanism: Design safety mechanisms and fault isolation to allow each role to run in an independent sandbox to avoid the spread of single point of failure. When a role fails continuously for more than a threshold (e.g., 5 times / second), it will automatically switch to a backup role.
[0075] Fault recovery: Real-time monitoring dashboard. When the failure rate of a certain role exceeds the preset failure rate threshold, the faulty role is automatically isolated, the latest rules are loaded from the knowledge layer for retry, and the failure logs are replayed.
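The switch to a backup role described in the fault tolerance mechanism can be sketched with a sliding one-second window of failures. The class name, the role identifiers, and the decision to pass timestamps in explicitly are assumptions made for illustration; the example threshold of 5 failures per second is taken from the text above.

```python
from collections import deque


class RoleFailover:
    """Switch to a backup role when failures within a time window exceed a threshold."""

    def __init__(self, primary: str, backup: str,
                 max_failures: int = 5, window_seconds: float = 1.0) -> None:
        self.active = primary
        self.backup = backup
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps of recent failures

    def record_failure(self, now: float) -> str:
        """Record one failure at time `now` (seconds) and return the active role."""
        self._failures.append(now)
        # Discard failures that fell out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) > self.max_failures:
            self.active = self.backup
        return self.active
```

Sandboxing of roles and replay of failed logs are outside the scope of this sketch.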
[0076] Task instruction knowledge:
[0077] A task instruction can be understood as a task instruction layer, which atomically adapts to multiple log formats to accommodate log throughput of tens of thousands per second. It clearly and structurally defines the railway system log data parsing task, clarifies the output requirements of the large language model with structured instructions, and incorporates railway business conflict rules to ensure that the parsing results of railway system log data conform to preset specifications.
[0078] For knowledge of task instructions, the following mechanisms can be pre-designed:
[0079] Design atomic operation streaming compiler: dynamically switch instruction sets for different subsystems, compile instructions into parallel execution units to adapt to tens of thousands of log throughput per second and cross-system, multi-domain data integration and correlation.
[0080] Design a structured prompting framework based on event chains and causal reasoning: In the semantic parsing prompt words, an "event chain analysis" task is introduced, allowing the system to combine multiple relevant logs from different systems into a "micro-batch" input to a large language model within a short period. At the few-sample instance layer, inference examples of cross-system failures can be provided.
[0081] Output format constraints: Specify a JSON structured format and define required fields based on the selected prompt word template library (e.g., event_type, failure_code, etc. in event category prompt words). For example, "Logs may contain mixed Chinese and English content or binary escape characters; please pay special attention to structured patterns such as [INFO], IP=..., ErrorCode=...." The output structure can include confidence_score and key_evidence fields. For example, append the instruction to the end of the field extraction prompt words: "Also, please evaluate the confidence level of this parsing (0.0-1.0) and cite 1-2 key evidence fragments from the original log that best support the judgment."
[0082] Classification standardization: Provides a limited list of options for parsing railway system log data (e.g., event types can only be selected from predefined event classification prompts).
[0083] Processing Rules: Clearly defined anomaly handling logic is implemented during the parsing of railway system log data. For example, "if a log cannot be categorized, the event_type field returns unknown, and the reason is noted in the details." Abnormal logs are marked and submitted for manual review. If a log simultaneously meets multiple candidate event types under various conditions, the system does not subjectively determine the log's event type using a large model. Instead, it independently evaluates the log against each candidate event type using an event type matching algorithm. For each candidate event type, matching is performed based on four dimensions: keyword rules, structured field features, domain knowledge constraints, and the large model's semantic consistency score. The steps can be as follows:
[0084] For any log entry L and candidate event type Ei, the system performs the following checks in sequence:
[0085] Keyword and pattern matching judgment: Determine whether the log contains the key terms, regular expression patterns or error code prefixes corresponding to the candidate event type Ei, and calculate the keyword matching score accordingly;
[0086] Structured field consistency assessment: Determine whether the log contains fields that are typically required for Ei events (such as whether user authentication events contain fields like user_id, login, and auth), and calculate the structured field matching score accordingly;
[0087] Domain rule constraint judgment: Based on the railway business rule base, determine whether the log meets the business preconditions or time sequence relationship of the event type, and calculate the domain rule constraint matching score accordingly;
[0088] Semantic consistency assessment: Input the log text and event type description into the large language model and calculate their semantic consistency matching score.
[0089] The system calculates a comprehensive matching score Score(Ei) for each event type Ei, and selects the candidate event type with the highest Score(Ei) as the final classification result.
[0090] The comprehensive matching score Score(Ei) is the weighted sum of the following factors:
[0091] Score(Ei) = w1 · S_rule + w2 · S_field + w3 · S_knowledge + w4 · S_semantic, where S_rule: keyword / regular expression rule matching score (whether the key pattern is hit, representing the keyword matching score), S_field: structured field completeness score (whether the key fields are complete, representing the structured field matching score), S_knowledge: whether the event constraint rules in the railway business knowledge base are satisfied (representing the domain rule constraint matching score), and S_semantic: semantic consistency score of the large model output (0~1, representing the semantic consistency matching score). Each weight wi can be dynamically adjusted according to the system running status or log source.
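The weighted sum above, followed by selection of the highest-scoring candidate, can be sketched as follows. The dictionary keys, the function names, and the numeric values in the usage note are illustrative assumptions; only the formula itself comes from the text.

```python
from typing import Dict

DIMENSIONS = ("rule", "field", "knowledge", "semantic")


def comprehensive_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Score(Ei) = w1*S_rule + w2*S_field + w3*S_knowledge + w4*S_semantic."""
    return sum(weights[d] * scores[d] for d in DIMENSIONS)


def classify(candidates: Dict[str, Dict[str, float]],
             weights: Dict[str, float]) -> str:
    """Return the candidate event type with the highest comprehensive score."""
    return max(candidates, key=lambda e: comprehensive_score(candidates[e], weights))
```

For example, with hypothetical equal weights of 0.25, a candidate "database error" scoring (1.0, 1.0, 1.0, 0.8) obtains about 0.95, while "network communication" scoring (0.0, 0.5, 0.0, 0.6) obtains about 0.275, so "database error" is selected.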
[0092] Fault tolerance mechanism: Retry instructions (maximum 3 times), atomically group instructions, split log parsing into atomic operation groups, and support partial success.
[0093] Fault recovery: Build a checkpoint recovery system that saves a checkpoint after every 1,000 log entries processed. When the system load is high, the interval can be shortened automatically (for example, to every 100 log entries) to ensure fast recovery and reduce replay time. When a fault occurs, the system locates the last valid checkpoint, resumes from it, replays unacknowledged logs, and then enters the state compensation mechanism.
[0094] Security rules: Enforce mandatory constraints on security-critical fields. For example, in authentication rules, if logs contain keywords such as 'login' or 'auth', the user_id field must be verified, and anonymous accounts are marked as 'SECURITY_ALERT'. Categorize and enforce different scenarios; for example, if a log level is ERROR and contains 'password', 'token', or 'credential', it must be categorized as "data breach".
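The two security rules above can be sketched as a post-processing function over a parsed record. The record layout (`message`, `log_level`, `user_id`, `event_type`), the `security_flag` key, and the treatment of a missing or literally "anonymous" `user_id` as an anonymous account are assumptions for illustration.

```python
def apply_security_rules(record: dict) -> dict:
    """Apply mandatory constraints to security-critical fields of one record."""
    message = record.get("message", "").lower()
    result = dict(record)  # leave the input record unchanged

    # Authentication rule: login/auth logs must carry a verifiable user_id.
    if any(keyword in message for keyword in ("login", "auth")):
        user_id = record.get("user_id")
        if not user_id or user_id == "anonymous":
            result["security_flag"] = "SECURITY_ALERT"

    # Forced classification: ERROR logs mentioning secrets are data breaches.
    if record.get("log_level") == "ERROR" and any(
            keyword in message for keyword in ("password", "token", "credential")):
        result["event_type"] = "data breach"

    return result
```

Applying the rules after model output, rather than inside the prompt alone, means the constraints hold even when the model's classification disagrees.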
[0095] Railway business knowledge:
[0096] The railway business knowledge in this application embodiment can be deeply integrated with railway business to solve complex business scenarios. A dynamic and searchable domain knowledge deep injection mechanism is constructed to perform domain knowledge fusion and rule base sharing. A railway business conflict rule engine, a fault root cause rapid location engine, and a time-sensitive rule engine are designed to supplement the knowledge blind spots of the large language model in the railway business domain. The accuracy of parsing is improved through key terminology explanations and business logic descriptions, providing a basis for decision-making.
[0097] For railway business knowledge, the following mechanisms can be designed in advance:
[0098] Construct a refined railway business knowledge graph, which could include a railway safety knowledge graph or an entity-relationship graph. The entity-relationship graph serves as the "constraint and reasoning foundation" for log parsing, event association, and response recommendation. Its three specific uses are: assisting in event classification and ambiguity resolution, supporting cross-system log association and root cause localization, and driving automatic recommendation of response measures.
[0099] "Entity" includes, but is not limited to, at least one of CTC (Centralized Dispatch System), TCC (Train Control Center), GYK (Railway Vehicle Operation Control Equipment), ERR_DB_Conn_Timeout, first class seat, and waiting queue.
[0100] "Relationship" reflects the connection relationship between corresponding entities, such as CTC --generates-->(log type: operation log), TCC --monitors-->(line segment: XY), ERR_DB_Conn_Timeout --requires-->(remediation: restart database connection service).
[0101] To facilitate understanding, a specific example will be used for illustration here.
[0102] Log input: The system received the following log (example):
[0103] [2024-xx-xx xx:xx:xx] CTC module error: ERR_DB_Conn_Timeout during timetable update.
[0104] Entity recognition and alignment (Purpose of the graph 1: Assisting event classification and ambiguity resolution, preventing large models from misclassifying the error as a network communication or business logic error):
[0105] The system first identifies entities from the logs:
[0106] System Entity: CTC;
[0107] Error entity: ERR_DB_Conn_Timeout;
[0108] Then align them in the entity-relationship graph:
[0109] Confirmation: CTC ∈ Centralized Scheduling System;
[0110] Confirmation: ERR_DB_Conn_Timeout ∈ Database connection exception class error;
[0111] Using relational constraints to perform semantic constraints on events (key point);
[0112] Based on the graph relationship: ERR_DB_Conn_Timeout — requires — restart the database connection service;
[0113] Constraints are applied to the output of large models: if the suggested action by the model is not the same as "restarting the database connection service", the credibility of the suggestion is reduced or a secondary verification is triggered.
[0114] Cross-system correlation reasoning (Purpose of the graph 2: Support cross-system log correlation and root cause localization, inferring that current database connection anomalies may affect the train control system's scheduling decisions for segment X–Y):
[0115] The system continues to search for entities in the graph associated with CTC:
[0116] CTC generates operation logs;
[0117] TCC — monitors — line segment X–Y;
[0118] The system detected that within the same time window, TCC generated monitoring anomaly logs in the X–Y segment, indicating a scheduling dependency between the two.
[0119] Recommended handling measures (Purpose of the graph 3: Driving automatic recommendation of handling measures):
[0120] Based on the handling relationship in the graph: ERR_DB_Conn_Timeout → Restart the database connection service, the system generates the following handling suggestions: i. Prioritize restarting the database connection service; ii. Verify the recovery status of the CTC operation log; iii. Verify whether the monitoring status of the TCC for the X–Y segment has been synchronously restored.
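The worked example above can be sketched with the entity-relationship graph held as a list of triples. The triple representation, the function names, and the penalty factor of 0.5 applied to a non-matching suggestion are illustrative assumptions; the three relationships are those given in the example.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject entity, relation, object)

GRAPH: List[Triple] = [
    ("CTC", "generates", "operation log"),
    ("TCC", "monitors", "line segment XY"),
    ("ERR_DB_Conn_Timeout", "requires", "restart database connection service"),
]


def objects(graph: List[Triple], subject: str, relation: str) -> List[str]:
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in graph if s == subject and r == relation]


def check_suggestion(graph: List[Triple], error_code: str, suggestion: str,
                     confidence: float, penalty: float = 0.5) -> Tuple[float, bool]:
    """Constrain a model suggestion with the graph's 'requires' relation.

    Returns the adjusted confidence and whether secondary verification is needed.
    """
    required = objects(graph, error_code, "requires")
    if required and suggestion not in required:
        return confidence * penalty, True
    return confidence, False
```

A suggestion that matches the graph keeps its confidence, while any other suggestion for `ERR_DB_Conn_Timeout` has its confidence reduced and is flagged for secondary verification.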
[0121] In this embodiment, the most relevant knowledge fragments retrieved and associated with the current log can be transformed into background constraint information for semantic parsing. Semantic parsing of key fields is then performed based on this background constraint information. The relevant knowledge fragments originate from the entity-relationship graph of the railway domain, providing information on system entities, error types, business objects, and their relationships related to the current log. After completing knowledge retrieval, the system further performs domain knowledge fusion processing to ensure that the knowledge fragments can effectively participate in the model reasoning process.
[0122] The domain knowledge fusion includes:
[0123] Terminology-level fusion: Semantic alignment and concise definition of retrieved railway industry professional terms, system abbreviations, equipment codes and procedure numbers are performed to eliminate ambiguity and unify model understanding. For example, "inventory deduction" is clarified as "the business operation of reducing ticket inventory after a user purchases a ticket".
[0124] Process-level integration: Introduce the interaction relationship between railway business processes and systems, and provide the model with the order of key business calls as background constraint information, such as "the ticketing system sends the ticket checking result to the parsing engine, and the parsing engine sends the timetable update instruction (XML) to the dispatching center after processing".
[0125] Rule-level fusion: Combining the handling rules and business constraints in domain knowledge, the model reasoning results are guided and restricted to avoid generating analytical conclusions or handling suggestions that do not conform to the railway business logic.
[0126] By using the aforementioned domain knowledge fusion method, the system will transform static knowledge retrieved from the knowledge graph into background constraint information that can be used for reasoning, thereby improving the interpretability and engineering feasibility of the parsing results while ensuring the accuracy of log parsing.
[0127] Railway business rules: describe the logical relationships between events (e.g., "Inventory deduction is triggered only after payment processing is successful," "Waiting list business rules: Waiting list inventory is independent of regular inventory, and after release, it must be prioritized for allocation to the waiting list queue"). The rule base is shared, and the system can identify cross-domain related events (e.g., "Ticket payment failure" + "Train delay" triggers a joint compensation rule).
[0128] Railway Business Conflict Rule Engine: This engine analyzes railway business conflict rules in depth to address the multi-business cross-conflict issues (including business security conflicts) specific to the railway system. For example, in scenarios with an abnormal business operation time sequence, the rule is "Payment completion time > Inventory lock timeout time → Force marking 'ERR_TIMEOUT'". In scenarios with inconsistent states across multiple systems, the rule is "Payment status = Success && Inventory status = Not deducted → Trigger inventory compensation mechanism".
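The two example conflict rules can be expressed as plain predicates over an order record. The sketch below is illustrative only: the record keys, the action label `TRIGGER_INVENTORY_COMPENSATION`, and the sample timestamps are assumptions rather than identifiers from the application.

```python
from datetime import datetime

def check_conflicts(order):
    """Apply the two example conflict rules and return the triggered actions."""
    actions = []
    # Rule 1: payment completed after the inventory lock timed out.
    if order["payment_completed_at"] > order["inventory_lock_expires_at"]:
        actions.append("ERR_TIMEOUT")
    # Rule 2: payment succeeded but inventory was never deducted.
    if order["payment_status"] == "Success" and order["inventory_status"] == "Not deducted":
        actions.append("TRIGGER_INVENTORY_COMPENSATION")  # hypothetical action label
    return actions

# Hypothetical order record that violates both rules.
order = {
    "payment_completed_at": datetime(2023, 11, 20, 14, 32, 17),
    "inventory_lock_expires_at": datetime(2023, 11, 20, 14, 30, 0),
    "payment_status": "Success",
    "inventory_status": "Not deducted",
}
result = check_conflicts(order)
```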
[0129] Fault root cause localization engine: Based on the railway error code knowledge base, it can quickly locate faults (such as "ERR_DB_CONN=database connection failed" and "ERR_STOCK_008 → [root cause] remaining ticket calculation service not synchronized with section inventory").
[0130] Time-Sensitive Rules Engine: In specific scenarios, a "rollback insurance" is set after a critical operation succeeds to prevent data inconsistency issues caused by subsequent operation failures. For example, in distributed transaction or inventory management scenarios, a 15-minute countdown is started after the inventory deduction operation has been successfully completed.
[0131] Building a self-contained offline knowledge package: Given the offline requirements, a compressed but comprehensive railway operations and maintenance knowledge package must be pre-embedded in the domain knowledge layer and the few-sample example layer. This knowledge package is extracted and refined from historical logs, operations and maintenance manuals, and expert experience in advance, serving as a built-in offline wiki to ensure sufficient context is provided even without a network connection.
[0132] Security threshold knowledge base: Performance security thresholds (CPU utilization, network latency, memory usage), business security thresholds (login failure, transaction amount, inventory changes), and automatic response rules (security level, risk type, violation type). Rule execution priorities are set as follows: Personal safety related rules (signal system anomalies) > Data security rules (PII leakage risk) > Business security rules (abnormal inventory changes) > Performance security rules (system resource alarms). Higher-level rules override lower-level rules, and rules of the same level are executed according to the most recent time.
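The priority ordering described above can be sketched as a sort key. This sketch reads "executed according to the most recent time" as "the most recently triggered rule wins among rules of the same level", which is an interpretation rather than something the text states; the level names, rule names, and numeric ranks are hypothetical.

```python
from dataclasses import dataclass

# Higher number = higher priority, following the order given in the text.
PRIORITY = {
    "personal_safety": 4,
    "data_security": 3,
    "business_security": 2,
    "performance_security": 1,
}

@dataclass
class TriggeredRule:
    name: str
    level: str
    triggered_at: float  # seconds since epoch (assumed representation)

def select_rule(rules):
    """Pick the rule with the highest level; break ties by the latest trigger time."""
    return max(rules, key=lambda r: (PRIORITY[r.level], r.triggered_at))

rules = [
    TriggeredRule("cpu_alarm", "performance_security", 300.0),
    TriggeredRule("signal_anomaly_old", "personal_safety", 100.0),
    TriggeredRule("signal_anomaly", "personal_safety", 200.0),
    TriggeredRule("pii_leak_risk", "data_security", 400.0),
]
winner = select_rule(rules)
```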
[0133] Fault Tolerance and Rule Evolution Control Mechanism: The system manages existing basic rules and dynamically generated candidate rules separately within the railway business domain. Basic rules, derived from existing railway business procedures and expert experience, constrain the fundamental logic of log parsing and decision-making, characterized by high stability and long change cycles. For candidate rules or newly added rules formed during domain knowledge fusion and log parsing, the system performs rule health monitoring before formal loading, including consistency checks, conflict detection, and historical verification. When anomalies are detected in candidate rules or they fail to meet preset stability conditions, the system automatically rolls back to the previous stable rule version. Simultaneously, the system employs a multi-copy redundant storage mechanism for the knowledge base and rule base, supporting copy consistency checks and automatic repair to improve the reliability and fault tolerance of rule management and knowledge services.
[0134] Fault recovery: Retains knowledge versions within the most recent preset time period (e.g., 30 days), allowing one-click rollback to any historical version. Automatically captures rule anomalies (e.g., student ticket verification failure), triggers few-sample learning to generate new rules, and deploys them after testing and verification.
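Version retention and rollback can be sketched with a small in-memory store. This is only an illustration under stated assumptions: the class name `RuleVersionStore`, its methods, and the dictionary representation of a rule set are hypothetical, and a real system would persist versions and run the health checks described above before saving.

```python
from datetime import datetime, timedelta

class RuleVersionStore:
    """Keeps rule-set versions saved within a retention period (default 30 days)."""

    def __init__(self, retention=timedelta(days=30)):
        self.retention = retention
        self.versions = []  # list of (saved_at, rules), oldest first

    def save(self, rules, now):
        """Store a copy of `rules` and drop versions older than the retention period."""
        self.versions.append((now, dict(rules)))
        cutoff = now - self.retention
        self.versions = [(t, r) for t, r in self.versions if t >= cutoff]

    def rollback(self, index):
        """Return a copy of the retained version at `index` (0 = oldest retained)."""
        return dict(self.versions[index][1])

# Assumed dates and rule contents, for illustration only.
t0 = datetime(2026, 1, 1)
store = RuleVersionStore()
store.save({"version": 1}, t0)
store.save({"version": 2}, t0 + timedelta(days=20))
store.save({"version": 3}, t0 + timedelta(days=40))  # version 1 is now out of retention
restored = store.rollback(0)
```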
[0135] Sample knowledge:
[0136] Sample example knowledge is used to provide carefully designed examples to the large language model during the inference phase of log parsing. These examples demonstrate the mapping relationship from input logs to normalized output results in the edge scenario of railway IT infrastructure. This sample example knowledge may not participate in model training or parameter updates. Instead, it is dynamically introduced into the model as contextual input along with the current log data to be parsed during the practical application phase after model training. This guides the model to understand the input format, output structure, and classification logic of the parsing task. In specific implementation, the system selects similar examples from the example library based on the source, format characteristics, or error type of the current log, and inserts these examples into prompt words in the form of example input-example output pairs. This generates structured parsing results that meet railway business requirements without changing the model parameters.
[0137] For knowledge derived from sample examples, the following mechanisms can be pre-designed:
[0138] Sample representativeness: Covering marginal cases in railway system log data (such as logs containing error information, student ticket discount amount analysis (including document type), and dynamic allocation logs of standby inventory during the Spring Festival travel season).
[0139] Input-output alignment: Clearly shows the mapping relationship between railway system log data and structured output.
[0140] Few-shot learning integrates multi-domain data: The sample layer collects log samples from different domains (such as station gate logs and carriage sensor logs), and improves the model's generalization ability through transfer learning.
[0141] Fault tolerance mechanism: AI is used to automatically detect and repair problematic samples and mark them for manual review. Samples with low confidence are automatically isolated.
[0142] Fault Recovery: After log parsing is complete, the system performs consistency checks on the parsing results. It quantifies the deviation between the parsing results and the reference results by comparing metrics such as event type matching, key field completeness, and structured output consistency. When the parsing deviation exceeds a preset threshold, and removing or replacing the few-sample example layer significantly improves the parsing results, the system determines that the parsing error is caused by sample mismatch. In response, the system activates a sample repair mechanism, automatically constructing corrected samples covering the relevant edge scenarios through a generative model and updating the sample library.
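One way to quantify the deviation described above is to average the three named metrics. The equal weighting, the function names, and the default thresholds below are assumptions chosen for illustration; the application does not specify them.

```python
def parsing_deviation(parsed, reference, required_fields):
    """Return a deviation in [0, 1]; 0 means full agreement with the reference."""
    # Event type matching.
    type_match = 1.0 if parsed.get("event_type") == reference.get("event_type") else 0.0
    # Key field completeness: share of required fields that are present and non-empty.
    present = [f for f in required_fields if parsed.get(f) not in (None, "")]
    completeness = len(present) / len(required_fields)
    # Structured output consistency: same set of keys as the reference.
    same_keys = 1.0 if set(parsed) == set(reference) else 0.0
    # Equal weights are an assumption.
    return 1.0 - (type_match + completeness + same_keys) / 3

def is_sample_mismatch(dev_with_examples, dev_without_examples, threshold=0.3, min_gain=0.2):
    """Attribute the error to the examples when dropping them clearly helps."""
    gain = dev_with_examples - dev_without_examples
    return dev_with_examples > threshold and gain >= min_gain

reference = {"event_type": "Database error", "device_id": "Server-102", "severity": "ERROR"}
wrong_type = dict(reference, event_type="Payment processing")
deviation = parsing_deviation(wrong_type, reference, ["event_type", "device_id", "severity"])
```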
[0143] For example, the sample knowledge corresponding to user authentication may include:
[0144] Input: "2023-11-20 14:30:22 [INFO] Server-101: User 'Alice' authenticated via LDAP."
[0145] Output:
[0146] {
[0147] "timestamp": "2023-11-20T14:30:22",
[0148] "device_id": "Server-101",
[0149] "event_type": "User authentication",
[0150] "severity": "INFO",
[0151] "details": "User 'Alice' authenticated via LDAP."
[0152] }
[0153] For example, sample knowledge corresponding to database errors may include:
[0154] Input: "2023-11-20 14:31:05 [ERROR] Server-102: DB_DEADLOCK detected in ticket_orders table."
[0155] Output:
[0156] {
[0157] "timestamp": "2023-11-20T14:31:05",
[0158] "device_id": "Server-102",
[0159] "event_type": "Database error",
[0160] "severity": "ERROR",
[0161] "details": "DB_DEADLOCK detected in ticket_orders table."
[0162] }
[0163] For example, sample knowledge corresponding to an edge case with an event type conflict may include:
[0164] Input: "2023-11-20 14:32:17 [WARN] Server-101: Payment timeout but inventory already deducted."
[0165] Output:
[0166] {
[0167] "timestamp": "2023-11-20T14:32:17",
[0168] "device_id": "Server-101",
[0169] "event_type": "Payment Processing", // Rule: Payment process exceptions are still classified as payment processing.
[0170] "severity": "WARN",
[0171] "details": "Payment timeout but inventory already deducted."
[0172] }
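The example input-example output pairs above are meant to be placed into the prompt at inference time. A minimal sketch of that assembly follows; the instruction sentence, the `Input:`/`Output:` labels, the function name, and the new log line are assumptions, and only the first example pair is reproduced.

```python
import json

# One example pair taken from the user authentication sample above.
EXAMPLES = [
    {
        "input": "2023-11-20 14:30:22 [INFO] Server-101: User 'Alice' authenticated via LDAP.",
        "output": {
            "timestamp": "2023-11-20T14:30:22",
            "device_id": "Server-101",
            "event_type": "User authentication",
            "severity": "INFO",
            "details": "User 'Alice' authenticated via LDAP.",
        },
    },
]

def build_few_shot_prompt(examples, log_line):
    """Concatenate example pairs and the new log into one prompt string."""
    parts = ["Parse the railway system log into JSON, following the examples."]  # assumed wording
    for ex in examples:
        parts.append("Input: " + ex["input"])
        parts.append("Output: " + json.dumps(ex["output"], ensure_ascii=False))
    parts.append("Input: " + log_line)
    parts.append("Output:")
    return "\n".join(parts)

# Hypothetical new log line.
prompt = build_few_shot_prompt(
    EXAMPLES, "2023-11-20 14:35:00 [INFO] Server-103: User 'Bob' authenticated via LDAP."
)
```

Because the examples are passed only as context, the model parameters stay unchanged, which matches the inference-time use described above.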
[0173] In this embodiment, multiple different candidate event types can be preset, and a corresponding preset event classification prompt can be set for each candidate event type. In one embodiment, as shown in Figure 3, determining the event classification prompts based on the log data to be parsed includes:
[0174] S121: Calculate the comprehensive matching score between the log data to be parsed and each of the preset candidate event types.
[0175] For example, candidate event types may include user authentication, inventory deduction, payment processing, database errors, network communication, and system heartbeats. If the log matches multiple candidate event types, the most directly relevant type can be selected.
[0176] S122: Determine the preset event classification prompt words corresponding to the candidate event type with the highest comprehensive matching score as the event classification prompt words for the log data to be parsed.
[0177] In one embodiment, calculating the comprehensive matching score between the log data to be parsed and preset different candidate event types includes:
[0178] For each candidate event type, calculate the multi-dimensional matching score between the log data to be parsed and the candidate event type. The multi-dimensional matching score includes at least two of the following: keyword matching score, structured field matching score, domain rule constraint matching score, and semantic consistency matching score. Calculate the comprehensive matching score between the log data to be parsed and the candidate event type based on each multi-dimensional matching score and the first weight corresponding to each multi-dimensional matching score.
[0179] The matching score of each dimension can be calculated using existing calculation methods, which are not repeated here.
[0180] Specifically, the multi-dimensional matching scores and the first weights corresponding to each multi-dimensional matching score can be multiplied together, and the sum of the products can be used as the corresponding comprehensive matching score.
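The weighted sum described above, followed by the selection in S122, can be sketched as follows. The weight values and the per-dimension scores are made-up numbers for illustration; the dimension keys are shorthand for the four score types named in the text.

```python
# Hypothetical first weights, one per matching dimension.
FIRST_WEIGHTS = {"keyword": 0.4, "structured_field": 0.2, "domain_rule": 0.2, "semantic": 0.2}

def comprehensive_score(dim_scores, weights=FIRST_WEIGHTS):
    """Sum of (dimension score x first weight) over the dimensions provided."""
    return sum(weights[dim] * score for dim, score in dim_scores.items())

def pick_event_type(scores_by_type):
    """Return the candidate event type with the highest comprehensive score (S122)."""
    return max(scores_by_type, key=lambda t: comprehensive_score(scores_by_type[t]))

# Assumed per-dimension scores for one log line.
scores_by_type = {
    "inventory deduction": {"keyword": 0.9, "structured_field": 0.8, "domain_rule": 0.7, "semantic": 0.8},
    "payment processing": {"keyword": 0.5, "structured_field": 0.6, "domain_rule": 0.4, "semantic": 0.5},
}
best = pick_event_type(scores_by_type)
best_score = comprehensive_score(scores_by_type[best])
```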
[0181] In one embodiment, a relationship can be established between preset event classification prompts, preset field extraction prompts, and preset semantic parsing prompts. Each preset event classification prompt has a corresponding event classification template, each preset field extraction prompt has a corresponding field extraction template, and each preset semantic parsing prompt has a corresponding semantic parsing template. Thus, after determining the event classification prompts in the above manner, the corresponding field extraction prompts and semantic parsing prompts can be determined based on the relationship, and then the corresponding field extraction templates and semantic parsing templates can be determined.
[0182] For example, the field extraction template corresponding to the field extraction prompt for the inventory deduction event may include:
[0183] Please parse the logs into JSON format, including the following fields:
[0184] - timestamp: Log timestamp (converted to ISO 8601 format)
[0185] - device_id: Device ID (extraction format such as "Server-XXX")
[0186] - event_type: Event type (must be one of the following options: user authentication, inventory deduction, payment processing, database error, network communication)
[0187] - operation_result: Operation result (determine whether the inventory deduction operation was successful or failed from the logs. If the log level is "INFO" and contains keywords such as "success" or "deducted", the result is "success"; if the log level is "ERROR" or contains keywords such as "fail" or "insufficient", the result is "failure".)
[0188] - order_id: Extract the order number (look for the format "OrderID=XXXX", leave it blank if it is not found).
[0189] - inventory_type:
[0190] - quantity: Log level (extracted from raw logs, such as INFO / WARN / ERROR)
[0191] - failure_code: Reason for failure (If the operation fails, extract the reason description from the log, such as "Insufficient stock"; otherwise leave blank)
[0192] - details: Event details (original summary, redundant information removed)
[0193] Rules:
[0194] 1. If the railway log device identifier is missing, fill in "UNKNOWN";
[0195] 2. If the timestamp format is incorrect, retain the original value and mark it as "INVALID".
[0196] 3. If the inventory type and quantity are not explicitly mentioned, the inventory type is "UNKNOWN" and the quantity is 0. ……
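The three fallback rules in the template can also be applied deterministically, for example to validate or backfill the model's output. The sketch below assumes the log layout used in the earlier samples (`YYYY-MM-DD HH:MM:SS [LEVEL] Server-XXX: message`); the `timestamp_status` key used to carry the "INVALID" mark and the sample log lines are assumptions.

```python
import re
from datetime import datetime

def extract_fields(log_line):
    """Extract a few template fields and apply fallback rules 1 to 3."""
    fields = {}
    # Rule 2: keep the raw timestamp and flag it when it cannot be parsed.
    raw_ts = log_line[:19]
    try:
        fields["timestamp"] = datetime.strptime(raw_ts, "%Y-%m-%d %H:%M:%S").isoformat()
    except ValueError:
        fields["timestamp"] = raw_ts
        fields["timestamp_status"] = "INVALID"  # assumed way of marking the value
    # Rule 1: fall back to "UNKNOWN" when no device identifier is present.
    device = re.search(r"\bServer-\w+", log_line)
    fields["device_id"] = device.group(0) if device else "UNKNOWN"
    # Order number in the form "OrderID=XXXX"; blank if absent.
    order = re.search(r"OrderID=(\w+)", log_line)
    fields["order_id"] = order.group(1) if order else ""
    # Rule 3: defaults when inventory type and quantity are not mentioned.
    fields.setdefault("inventory_type", "UNKNOWN")
    fields.setdefault("quantity", 0)
    return fields

# Hypothetical log lines.
good = extract_fields("2023-11-20 14:33:00 [INFO] Server-101: Inventory deducted, OrderID=TK20231120001")
bad = extract_fields("bad-time [ERROR] deduct failed")
```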
[0197] In one embodiment, as shown in Figure 4, the large language model calling the semantic parsing template corresponding to the semantic parsing prompt words to perform semantic parsing on key fields includes:
[0198] S141: The large language model retrieves relevant knowledge fragments by searching the railway business knowledge graph corresponding to the semantic parsing template based on key fields.
[0199] S142: Transform the relevant knowledge fragments into background constraint information for semantic parsing.
S143: Perform semantic parsing on the key fields based on the background constraint information.
[0200] In one embodiment, after obtaining the corresponding semantic parsing result, the method may further include:
[0201] Determine the confidence score of the semantic parsing results;
[0202] When the confidence score is greater than or equal to a preset confidence score threshold, the log data and the corresponding semantic parsing results are converted into sample example knowledge and stored.
[0203] In one embodiment, determining the confidence score of the semantic parsing result includes:
[0204] Obtain a multi-dimensional score for the semantic parsing result; the multi-dimensional score includes at least two of the following: the model confidence score of the large language model for the semantic parsing result, the self-consistency score of the semantic parsing result, the rule conformity score, and the keyword matching score for railway business knowledge;
[0205] The confidence score of the semantic parsing result is determined based on each of the multi-dimensional scores and the second weight corresponding to each of the multi-dimensional scores.
[0206] Model confidence score: obtained from the confidence information or probability score output by the large language model when generating the parsing results; it reflects the model's subjective certainty about the current parsing results.
[0207] Self-consistency score: obtained by checking the internal consistency of the parsing results. The consistency check includes, but is not limited to, whether there are semantic conflicts or logical contradictions between the event type, the key field extraction results, and the generated conclusion. When the parsing results are consistent across multiple inferences or multi-perspective verifications, the self-consistency score is high.
[0208] Rule compliance score: evaluated from the degree of conformity between the parsing results and the basic rules of railway operations. The system determines whether the parsing results violate the system entity responsibility constraints, business process sequence rules, and standard handling measure rules. If the parsing results satisfy the rule constraints, the rule compliance score is high.
[0209] Railway business knowledge keyword matching score: calculated by checking whether the parsing results reasonably cover the railway business professional terms, system abbreviations, equipment codes, or error type keywords related to the current log, combined with the importance weights of the keywords.
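[0210] The system generates a final confidence score from the above scoring factors using a weighted fusion method. The confidence score can be expressed as: $C_{score} = w_1 \cdot S_{model} + w_2 \cdot S_{consist} + w_3 \cdot S_{rule} + w_4 \cdot S_{keyword}$
[0211] where $S_{model}$, $S_{consist}$, $S_{rule}$, and $S_{keyword}$ denote the model confidence score, the self-consistency score, the rule compliance score, and the railway business knowledge keyword matching score, respectively, and $w_1$, $w_2$, $w_3$, $w_4$ are preset or adjustable weight parameters, namely the second weights mentioned above, which satisfy the normalization constraint $w_1 + w_2 + w_3 + w_4 = 1$.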
[0212] Through the aforementioned confidence assessment mechanism, the system can quantitatively assess the credibility of log parsing results without introducing complex computational overhead, providing a basis for subsequent manual review, automatic handling decisions, or abnormal re-parsing.
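The weighted fusion and the storage decision described above can be sketched as follows. The second-weight values and the four sample scores are made up for illustration, and the storage threshold of 0.92 is borrowed from the example figure used later in this description; all names are hypothetical.

```python
# Hypothetical second weights; they must satisfy the normalization constraint.
SECOND_WEIGHTS = {"model": 0.4, "self_consistency": 0.2, "rule": 0.3, "keyword": 0.1}

def confidence_score(scores, weights=SECOND_WEIGHTS):
    """Weighted fusion of the per-dimension scores into one confidence score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)

def should_store_as_sample(c_score, threshold=0.92):
    """Store the log and its parsing result as sample knowledge at or above the threshold."""
    return c_score >= threshold

# Assumed scores for one parsing result.
scores = {"model": 0.95, "self_consistency": 0.9, "rule": 1.0, "keyword": 0.8}
c_score = confidence_score(scores)
store = should_store_as_sample(c_score)
```

The computation is a handful of multiplications and additions per log entry, which is consistent with the low overhead claimed above.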
[0213] In this embodiment, the constructed hierarchical enhancement prompts are combined with the railway log data entries to be parsed and sent to the large language model (LLM) API, and prompt selection proceeds interactively with the LLM. The system first uses an event classification template; after verification and evaluation, the LLM returns the event type. The system then dynamically selects the corresponding field extraction template; after verification and evaluation, the LLM returns the key field results. Finally, the system dynamically selects the corresponding semantic parsing template; after verification and evaluation, the LLM returns the semantic parsing results. When the model's results are received, tools such as a JSON parser are used to convert the model output into structured data objects. The instructions strictly require JSON-formatted output, and the parsing results are pushed to downstream systems (such as scheduling and monitoring systems) in real time via a message queue to ensure real-time alarms and responses.
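The three-stage interaction above can be sketched with the model call injected as a function, so that the flow can be exercised without a real LLM API. Everything here is illustrative: `parse_log`, the template strings, and the stub `fake_llm` with its canned replies are assumptions, and the verification, evaluation, and message queue steps are omitted.

```python
import json

# Hypothetical templates keyed by stage and event type.
TEMPLATES = {
    "classify": "CLASSIFY the event type of this log: {log}",
    "extract": {"Inventory deduction": "EXTRACT fields as JSON from: {log}"},
    "semantic": {"Inventory deduction": "EXPLAIN these fields as JSON: {fields}"},
}

def parse_log(log_line, call_llm, templates=TEMPLATES):
    """Run event classification, field extraction, and semantic parsing in sequence."""
    event_type = call_llm(templates["classify"].format(log=log_line)).strip()
    # The event type chosen in stage 1 selects the templates for stages 2 and 3.
    fields_raw = call_llm(templates["extract"][event_type].format(log=log_line))
    fields = json.loads(fields_raw)  # the model is instructed to answer in JSON
    semantic_raw = call_llm(templates["semantic"][event_type].format(fields=json.dumps(fields)))
    return {"event_type": event_type, "fields": fields, "semantics": json.loads(semantic_raw)}

def fake_llm(prompt):
    """Stand-in for a real model call; returns canned answers for the demo."""
    if prompt.startswith("CLASSIFY"):
        return "Inventory deduction"
    if prompt.startswith("EXTRACT"):
        return '{"order_id": "TK20231120001", "quantity": 2}'
    return '{"summary": "Two seats deducted for order TK20231120001"}'

result = parse_log("2023-11-20 14:33:00 [INFO] Server-101: deducted OrderID=TK20231120001", fake_llm)
```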
[0214] Example of the output JSON structure for the inventory deduction event type:
[0215] Success case:
[0216] {
[0217] "event_type": "Inventory deduction",
[0218] "operation_status": "Success",
[0219] "order_id": "TK20231120001",
[0220] "inventory_type": "seat",
[0221] "quantity": 2,
[0222] "amount": 598.0,
[0223] "failure_code": "",
[0224] "retry_attempt": 0
[0225] }
[0226] Failure Case:
[0227] {
[0228] "event_type": "Inventory deduction",
[0229] "operation_status": "Failed",
[0230] "order_id": "UNKNOWN_REF",
[0231] "inventory_type": "sleeper berth",
[0232] "quantity": 1,
[0233] "amount": 0.0,
[0234] "failure_code": "ERR_STOCK_008",
[0235] "retry_attempt": 3
[0236] }
[0237] In this embodiment, an iterative verification chain can also be designed. For results with low confidence scores (e.g., <0.8) or involving safety-critical equipment, the system automatically triggers a second round of "verification prompts." Example verification prompt: "You have just parsed the log [log content] into [preliminary parsing results]. Now, please review it as an auditor: 1. Is the [key field] in the parsing result absolutely consistent with the [related part] in the log? 2. Are there other possible interpretations of this result? Please list them. 3. Based on the review, what is your final confidence level? Please output the final result and confidence level after the review."
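The trigger condition and the verification prompt can be sketched as two small helpers. The function names and the sample values are hypothetical; the prompt wording follows the example above, with "[related part]" left as generic text because the description does not say how that part is located.

```python
import json

def needs_verification(c_score, safety_critical, threshold=0.8):
    """A second round is triggered by low confidence or safety-critical equipment."""
    return c_score < threshold or safety_critical

def build_verification_prompt(log_line, preliminary, key_field):
    """Fill the example verification prompt with the log and the first-round result."""
    result_text = json.dumps(preliminary, ensure_ascii=False)
    return (
        f"You have just parsed the log [{log_line}] into [{result_text}]. "
        "Now, please review it as an auditor: "
        f"1. Is the [{key_field}] in the parsing result absolutely consistent "
        "with the related part in the log? "
        "2. Are there other possible interpretations of this result? Please list them. "
        "3. Based on the review, what is your final confidence level? "
        "Please output the final result and confidence level after the review."
    )

# Hypothetical first-round result with a confidence score of 0.75.
log_line = "2023-11-20 14:32:17 [WARN] Server-101: Payment timeout but inventory already deducted."
preliminary = {"event_type": "Payment Processing", "severity": "WARN"}
prompt = build_verification_prompt(log_line, preliminary, "event_type") if needs_verification(0.75, False) else None
```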
[0238] Railway Knowledge Base Update and Feedback Loop: After obtaining the confidence score of the semantic parsing results, the parsing results can be graded. When the confidence score of the parsing result is lower than a preset threshold, and it is confirmed by manual verification or subsequent log verification as correct, the system marks the corresponding log sample and its parsing result as a high-confidence parsing sample. When the parsing result has a deviation but is corrected to obtain a correct parsing result, the system uses the corrected parsing result and the original log together as candidate optimization samples. For the candidate optimization samples, the system transforms them into new sample examples or domain knowledge rules according to their applicable scope and feature type, and dynamically updates them to the prompt word template library. Among them, sample examples are used to supplement the model's parsing capabilities in specific railway IT infrastructure edge scenarios, and domain knowledge rules are used to improve the constraints on log parsing results. At the same time, the system can summarize parsing patterns that occur repeatedly and have stable correction effects to form new domain knowledge rules, and incorporate these rules into the rule base as constraints for subsequent log parsing and confidence evaluation.
[0239] Based on this, the system periodically organizes the newly accumulated labeled log samples and uses them to adjust the prompt word template parameters, or to fine-tune or incrementally train the large language model, thereby achieving continuous self-optimization of the railway IT infrastructure log file parsing system.
[0240] Example: During system operation, the following log types were parsed multiple times:
[0241] [TCC][WARN]Section XY occupancy abnormal, auto fallback triggered
[0242] (In the initial template library, the combination of "abnormal section occupancy + automatic fallback" is relatively rare, and the LLM's initial parsing confidence is low.)
[0243] 1) Identify high-quality parsing results that can be used as sample examples.
[0244] In the confidence assessment, when the log parsing result meets the following criteria (C_score ≥ 0.92, a high rule compliance score, and no manual error correction record), the system marks the parsing result as a "high confidence parsing sample".
[0245] 2) Convert the parsing results into "standardized sample pairs":
[0246] The system does not simply store logs, but rather abstracts them into a Prompt example structure:
[0247] Input example: [TCC][WARN]Section XY occupancy abnormal, auto fallback triggered
[0248] Output example:
[0249] Event type: Line section monitoring anomaly
[0250] System involved: TCC
[0251] Affected area: XY
[0252] Action taken: Automatic fallback has been triggered; manual confirmation is recommended.
[0253] 3) Categorize examples and insert them into the prompt word template library.
[0254] Based on the example's characteristics, the system categorizes the example as follows:
[0255] Log source: TCC
[0256] Event Category: Line Monitoring Anomaly
[0257] Scenario tags: Auto fallback / Edge scenario
[0258] The system then inserts the example into the corresponding prompt word template entry, for example:
[0259] Template Category: TCC_Line Monitoring
[0260] Sample example set: {Example 1, Example 2, Example 3 (new)}
[0261] 4) Use the updated template library at runtime.
[0262] When similar logs are encountered again in the future:
[0263] [TCC][WARN]Section AB occupancy abnormal, fallback mode enabled
[0264] When constructing a Prompt, the system will prioritize selecting similar sample examples from the template library and dynamically insert similar examples that meet the preset number into the Prompt.
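Selecting similar examples at runtime can be sketched as a filter on the log source followed by a ranking. Using shared whitespace-separated tokens as the similarity measure is an assumption made for brevity (the description mentions source, format characteristics, or error type without fixing a metric); the library entries and their `id` values are hypothetical.

```python
def select_examples(library, log_source, log_text, k=2):
    """Pick up to k examples from the same source, ranked by shared tokens."""
    tokens = set(log_text.lower().split())
    candidates = [ex for ex in library if ex["source"] == log_source]
    # Token overlap is an assumed stand-in for the similarity measure.
    candidates.sort(
        key=lambda ex: len(tokens & set(ex["input"].lower().split())),
        reverse=True,
    )
    return candidates[:k]

# Hypothetical template library entries.
LIBRARY = [
    {"id": "example-1", "source": "TCC", "input": "[TCC][INFO]Heartbeat ok"},
    {"id": "example-3", "source": "TCC",
     "input": "[TCC][WARN]Section XY occupancy abnormal, auto fallback triggered"},
    {"id": "ctc-1", "source": "CTC", "input": "[CTC][ERROR]ERR_DB_Conn_Timeout"},
]

new_log = "[TCC][WARN]Section AB occupancy abnormal, fallback mode enabled"
chosen = select_examples(LIBRARY, "TCC", new_log, k=1)
```

The parameter `k` plays the role of the preset number of examples inserted into the Prompt.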
[0265] The log parsing method for railway systems provided in this application is based on a large-model prompt word implementation, offering comprehensive advantages over existing technologies, including strong structure, dynamic adaptation, high-precision parsing, security and reliability, and cross-domain scalability. This method constructs a hierarchical enhanced prompt word engineering framework, dividing prompt words into four layers: role definition, task instructions, domain knowledge, and few-sample examples. Each layer has clearly defined responsibilities and works collaboratively, achieving a systematic and scalable prompt structure. The method possesses dynamic adaptive capabilities, automatically adjusting domain knowledge and example content based on different log types and business scenarios, significantly enhancing the model's understanding of railway-specific terminology, rules, and semantics, achieving millisecond-level high-precision parsing. By introducing fault tolerance and self-healing mechanisms and multi-round verification chains, the system can quickly recover in abnormal or faulty situations, significantly improving security and stability. Simultaneously, this method supports the integration and correlation reasoning of heterogeneous log data across systems and domains, possessing good adaptability and scalability, and can provide efficient and reliable data support for intelligent operation and maintenance, anomaly detection, risk warning, and fault diagnosis of railway IT systems.
[0266] It should be understood that although the steps in the flowchart above are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowchart above may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0267] Based on the same inventive concept, in one embodiment, as shown in Figure 5, an electronic device is provided, which includes a log parsing system. The log parsing system includes a processor 501 and a memory 502. The memory 502 stores a computer program, and the processor 501 executes the computer program to implement the steps of the method described above. Further details are omitted here. The overall architecture of the log parsing system may be as shown in Figure 6, and the hierarchical enhanced prompt word framework may be as shown in Figure 7.
[0268] Processor 501 can be an integrated circuit chip with signal processing capabilities. The processor 501 can be a general-purpose processor, including a CPU (Central Processing Unit), NP (Network Processor), etc.; it can also be a DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor.
[0269] The memory 502 may include, but is not limited to, RAM (Random Access Memory), ROM (Read Only Memory), PROM (Programmable Read Only Memory), EPROM (Erasable Programmable Read-Only Memory), and EEPROM (Electrically Erasable Programmable Read Only Memory).
[0270] Those skilled in the art will understand that the structure shown in Figure 5 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the application of the present application. Specific electronic devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0271] Based on the same inventive concept, embodiments of this application also provide a computer-readable storage medium, such as a floppy disk, optical disk, hard disk, flash memory, USB flash drive, SD (Secure Digital) card, MMC (Multi-Media Card), etc. The computer-readable storage medium stores a computer program, which, when executed by at least one processor, implements the steps of the methods in the above embodiments, which will not be repeated here.
[0272] Based on the same inventive concept, embodiments of this application also provide a computer program product, including a computer program that, when executed by a processor, implements any of the methods described above.
[0273] The program code for executing the computer program product of this application can be written in any combination of one or more programming languages. The program code can be executed entirely on the user device, partially on the user device, as a standalone software package, partially on the user device and partially on a remote device, or entirely on a remote device.
[0274] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, etc.) containing computer-usable program code.
[0275] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer-readable storage media according to this application. It should be understood that each block of the flowchart illustrations and/or block diagrams, as well as combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0276] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0277] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be executed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0278] It should be noted that the illustrations provided in this embodiment are only schematic representations of the basic concept of this application. Therefore, the drawings only show components relevant to this application and are not drawn according to the actual number, shape, and size of components in implementation. In actual implementation, the shape, quantity, and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex. The structures, proportions, sizes, etc., shown in the accompanying drawings are only used to complement the content disclosed in the specification for those skilled in the art to understand and read, and are not intended to limit the implementation conditions of this application. Therefore, they have no substantial technical significance. Any modification to the structure, change in the proportional relationship, or adjustment of the size, without affecting the effect and purpose that this application can produce, should still fall within the scope of the technical content disclosed in this application. At the same time, the terms such as "upper," "lower," "left," "right," "middle," and "one" used in this specification are only for clarity of description and are not intended to limit the scope of implementation of this application. Changes or adjustments in their relative relationships, without substantially changing the technical content, should also be considered within the scope of implementation of this application.
[0279] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this term in various places throughout the document does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
[0280] As used herein, unless the context clearly indicates otherwise, the words “a,” “an,” and/or “the” do not specifically refer to the singular and may also include the plural. Generally speaking, the terms “comprising” and “including” only indicate the inclusion of explicitly identified steps and elements, which do not constitute an exclusive list, and the method or apparatus may also include other steps or elements.
[0281] Expressions used herein, such as “having,” “may have,” “comprising,” or “may include,” indicate the presence of the corresponding function, operation, element, etc., and do not exclude the presence of one or more other functions, operations, elements, etc. Furthermore, it should be understood that the terms “comprising” or “having” as used herein indicate the presence of the features, numbers, steps, operations, elements, components, or combinations thereof described in the specification, without excluding the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
[0282] Prefixes such as "first" and "second" used in the embodiments of this application merely distinguish different described objects and do not limit the position, order, priority parameters, quantity, or content of those objects. The use of ordinal numbers and other distinguishing prefixes in the embodiments of this application does not constitute a limitation on the described objects, nor should it introduce unnecessary limitations. For statements regarding the described objects, refer to the claims or the context of the embodiments. Furthermore, in the description of the embodiments, unless otherwise stated, "multiple" means two or more.
[0283] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0284] The embodiments described above merely illustrate several implementations of this application. Although the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.
Claims
1. A log parsing method for railway systems, characterized in that it comprises: obtaining log data to be parsed from a railway system; determining hierarchical enhancement prompts based on the log data to be parsed, the hierarchical enhancement prompts including an event classification prompt, a field extraction prompt, and a semantic parsing prompt; inputting the hierarchical enhancement prompts and the log data to be parsed into a large language model; the large language model calling, from a preset template library, the event classification template corresponding to the event classification prompt to perform event classification verification, and, after the verification passes, calling the field extraction template corresponding to the field extraction prompt to extract the corresponding key fields from the log data to be parsed, and calling the semantic parsing template corresponding to the semantic parsing prompt to perform semantic parsing on the key fields to obtain the corresponding semantic parsing result; and transmitting the semantic parsing result to a downstream system.
2. The log parsing method for railway systems according to claim 1, characterized in that each template in the template library incorporates corresponding railway professional knowledge, which includes role knowledge, task instruction knowledge, railway business knowledge, and sample example knowledge; the role knowledge defines the professional identity assumed when the template is called to perform the corresponding analysis task; the task instruction knowledge defines the output requirements after the template is called to perform the corresponding analysis task; the railway business knowledge defines a railway business knowledge graph, which is used for retrieval when the template is called to perform the corresponding analysis task; and the sample example knowledge defines a mapping relationship from input sample logs to normalized output results, which is used for outputting the corresponding semantic parsing result based on the mapping relationship when the template is called to perform the corresponding analysis task.
3. The log parsing method for railway systems according to claim 2, characterized in that the large language model calling the semantic parsing template corresponding to the semantic parsing prompt to perform semantic parsing on the key fields comprises: the large language model searching, based on the key fields, the railway business knowledge graph corresponding to the semantic parsing template to retrieve relevant knowledge fragments; transforming the relevant knowledge fragments into background constraint information that participates in the semantic parsing; and performing semantic parsing on the key fields based on the background constraint information.
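As a non-limiting illustration of the retrieval step in claim 3, the following minimal sketch searches a knowledge graph by key-field values and renders the hits as background constraints. The triple representation, the exact-match lookup, the prompt layout, and every name in the block (`retrieve_fragments`, `to_background_constraints`, `build_semantic_prompt`, `GRAPH`, `KEY_FIELDS`) are assumptions made for illustration; the claims do not specify them, and the sample data is placeholder content rather than real railway knowledge.

```python
# Illustrative sketch only: the triple format and all names here are assumptions.
Triple = tuple[str, str, str]  # (subject, relation, object)


def retrieve_fragments(graph: list[Triple], key_fields: dict[str, str]) -> list[Triple]:
    """Return the triples whose subject or object equals an extracted key-field value."""
    values = set(key_fields.values())
    return [t for t in graph if t[0] in values or t[2] in values]


def to_background_constraints(fragments: list[Triple]) -> str:
    """Render retrieved fragments as plain-text constraints for the parsing prompt."""
    return "\n".join(f"- {s} {r} {o}" for s, r, o in fragments)


def build_semantic_prompt(key_fields: dict[str, str], graph: list[Triple]) -> str:
    """Combine the key fields and the background constraints into one prompt string."""
    constraints = to_background_constraints(retrieve_fragments(graph, key_fields))
    fields = ", ".join(f"{k}={v}" for k, v in key_fields.items())
    return f"Background constraints:\n{constraints}\nKey fields: {fields}"


# Placeholder data, not real railway knowledge.
GRAPH: list[Triple] = [
    ("DEVICE_A", "is_a", "switch machine"),
    ("E42", "means", "overcurrent"),
    ("DEVICE_B", "is_a", "signal"),
]
KEY_FIELDS = {"device": "DEVICE_A", "code": "E42"}
```

In this sketch the resulting string would be passed to the large language model together with the semantic parsing template; a production system might instead use fuzzy or multi-hop graph queries.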
4. The log parsing method for railway systems according to claim 1, characterized in that obtaining the log data to be parsed from the railway system comprises: obtaining raw log data from the railway system; performing data cleaning on the raw log data; and performing context enhancement processing on the cleaned raw log data to obtain the log data to be parsed.
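As a non-limiting illustration of claim 4, the sketch below chains a cleaning step and a context enhancement step. The claim does not define either operation, so the choices here are assumptions: cleaning collapses whitespace and drops blank lines and consecutive duplicates, and context enhancement attaches up to `window` preceding cleaned lines to each entry. The names `EnhancedLog`, `clean`, `enhance`, and `prepare` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnhancedLog:
    line: str           # the cleaned log line itself
    context: list[str]  # up to `window` preceding cleaned lines


def clean(raw_lines: list[str]) -> list[str]:
    """Assumed cleaning: collapse whitespace, drop blanks and consecutive duplicates."""
    cleaned: list[str] = []
    for raw in raw_lines:
        line = " ".join(raw.split())
        if not line:
            continue  # skip blank lines
        if cleaned and cleaned[-1] == line:
            continue  # skip a repeat of the previous kept line
        cleaned.append(line)
    return cleaned


def enhance(cleaned: list[str], window: int = 2) -> list[EnhancedLog]:
    """Assumed context enhancement: pair each line with its preceding neighbours."""
    return [
        EnhancedLog(line=line, context=cleaned[max(0, i - window):i])
        for i, line in enumerate(cleaned)
    ]


def prepare(raw_lines: list[str], window: int = 2) -> list[EnhancedLog]:
    """Raw log data -> cleaned data -> log data to be parsed."""
    return enhance(clean(raw_lines), window)
```

Calling `prepare(raw_lines)` yields entries that carry their own local context, so each one can be sent to the large language model without re-reading the surrounding file.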
5. The log parsing method for railway systems according to claim 1, characterized in that determining the event classification prompt based on the log data to be parsed comprises: calculating a comprehensive matching score between the log data to be parsed and each of a plurality of preset candidate event types; and determining the preset event classification prompt corresponding to the candidate event type with the highest comprehensive matching score as the event classification prompt of the log data to be parsed.
6. The log parsing method for railway systems according to claim 5, characterized in that calculating the comprehensive matching score between the log data to be parsed and each of the preset candidate event types comprises: for each candidate event type, calculating multi-dimensional matching scores between the log data to be parsed and the candidate event type, the multi-dimensional matching scores including at least two of the following: a keyword matching score, a structured field matching score, a domain rule constraint matching score, and a semantic consistency matching score; and calculating the comprehensive matching score between the log data to be parsed and the candidate event type based on the multi-dimensional matching scores and a first weight corresponding to each multi-dimensional matching score.
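As a non-limiting illustration of claims 5 and 6, the sketch below combines per-dimension matching scores into a comprehensive score and picks the best candidate event type. The claims only state that the comprehensive score is based on the scores and their first weights; the plain weighted sum used here is an assumption, as are the function names, the dimension keys, and the event type labels in the usage note.

```python
def composite_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed combination rule: weighted sum over the dimensions that were scored."""
    if len(dim_scores) < 2:
        raise ValueError("at least two matching dimensions are required")
    return sum(weights[name] * score for name, score in dim_scores.items())


def select_event_type(
    scores_by_type: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> tuple[str, float]:
    """Return the candidate event type with the highest comprehensive matching score."""
    composite = {
        event_type: composite_score(dim_scores, weights)
        for event_type, dim_scores in scores_by_type.items()
    }
    best = max(composite, key=composite.get)
    return best, composite[best]
```

For example, with weights `{"keyword": 0.5, "semantic": 0.5}` and hypothetical candidates `"signal_fault"` scored `{"keyword": 1.0, "semantic": 0.5}` and `"power_fault"` scored `{"keyword": 0.5, "semantic": 0.25}`, the function selects `"signal_fault"` with a comprehensive score of 0.75. The preset event classification prompt for that type would then be used.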
7. The log parsing method for railway systems according to claim 1, characterized in that, after obtaining the corresponding semantic parsing result, the method further comprises: determining a confidence score of the semantic parsing result; and, when the confidence score is greater than or equal to a preset confidence score threshold, converting the log data and the corresponding semantic parsing result into sample example knowledge and storing it.
8. The log parsing method for railway systems according to claim 7, characterized in that determining the confidence score of the semantic parsing result comprises: obtaining multi-dimensional scores for the semantic parsing result, the multi-dimensional scores including at least two of the following: a model confidence score of the large language model for the semantic parsing result, a self-consistency score of the semantic parsing result, a rule conformity score, and a keyword matching score for railway business knowledge; and determining the confidence score of the semantic parsing result based on each of the multi-dimensional scores and a second weight corresponding to each of the multi-dimensional scores.
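As a non-limiting illustration of claims 7 and 8, the sketch below gates the storage of new sample example knowledge on a confidence score. The claims do not fix how the second weights are applied; the weight-normalized average used here is an assumption, and the names `confidence_score`, `store_if_confident`, and `sample_store`, together with the record layout, are hypothetical.

```python
def confidence_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed rule: average of the available scores, weighted by the second weights."""
    if len(dim_scores) < 2:
        raise ValueError("at least two score dimensions are required")
    total_weight = sum(weights[name] for name in dim_scores)
    weighted = sum(weights[name] * score for name, score in dim_scores.items())
    return weighted / total_weight


def store_if_confident(
    log: str,
    result: dict,
    dim_scores: dict[str, float],
    weights: dict[str, float],
    threshold: float,
    sample_store: list[dict],
) -> bool:
    """Append an input/output pair to the sample store when the threshold is met."""
    if confidence_score(dim_scores, weights) >= threshold:
        sample_store.append({"input": log, "output": result})
        return True
    return False
```

Normalizing by the weights of the dimensions actually present keeps the score on the same scale when only some of the four dimensions are available, so a single threshold remains meaningful. Stored pairs can later serve as the input-to-output mappings that the templates use as sample example knowledge.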
9. An electronic device, characterized in that it comprises a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by at least one processor, implements the method according to any one of claims 1 to 8.