Log anomaly detection method, device, medium and product based on large model

Structured field data is parsed from the query logs of a database auditing system and populated into a prompt word template; a large model then performs semantic analysis on the result, and an anomaly detection report is generated. This addresses the rigidity and limited adaptability of existing database operation anomaly detection technologies and aims at accurate detection of abnormal database behavior.

CN122241513A · Pending · Publication Date: 2026-06-19 · BEIJING YOUTEJIE INFORMATION TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
BEIJING YOUTEJIE INFORMATION TECH
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing database operation anomaly detection technologies cannot dynamically understand the true operational intentions of operation and maintenance personnel. The detection mechanism is rigid, lacks adaptability, and the alarm information lacks interpretability.

Method used

The method acquires the target query logs of the database audit system, parses the structured field data and populates it into a pre-built prompt word template, performs semantic analysis using a large model, generates a semantic vector, and combines it with the target user's dynamic baseline to generate an anomaly detection report.

Benefits of technology

It enables dynamic understanding of the operational intentions of maintenance personnel at the semantic level, improves the flexibility and adaptability of anomaly detection, and can accurately detect complex and hidden internal data security threats.

Abstract

This invention discloses a log anomaly detection method, device, medium, and product based on a large model. The method includes: obtaining target query logs from a database auditing system; parsing structured field data matching the target database behavior of the target query user from the target query logs; filling the structured field data into a pre-built prompt word template; inputting the obtained contextual semantic prompt words of the target query logs into a large model for semantic analysis processing; generating a matching semantic vector after obtaining structured analysis text matching the target query logs; obtaining a target user dynamic baseline matching the target query user; and generating an anomaly detection report matching the target query logs based on the structured field data, the semantic vector, and the target user dynamic baseline. This solution improves the flexibility and adaptability of the anomaly detection mechanism, achieving accurate detection of abnormal database behavior.

Description

Technical Field

[0001] This invention relates to the field of data security technology, and in particular to a method, device, medium, and product for detecting log anomalies based on a large model.

Background Technology

[0002] In database security and compliance management, internal operations and development personnel possess high operational privileges, and their database actions directly impact data security. To prevent data leaks, malicious damage, and operational errors, comprehensive audits and anomaly detection of internal personnel's database operations are necessary to ensure data security and compliance requirements.

[0003] Existing database operation anomaly detection mainly falls into two technical paths: one is rule and strategy template-based matching detection, which performs string matching, regular expression comparison, or simple parse tree matching on the syntactic structure and literal content of structured query statements by pre-configuring detection rules; the other is sequence or feature anomaly detection based on traditional deep learning models, which transforms operation statements into numerical vectors and feature sequences through feature engineering, uses deep learning models to model historical normal operations, and judges anomalies based on error or probability thresholds.

[0004] Existing detection technologies have significant drawbacks: rule-based methods suffer from semantic blind spots and high false alarm rates; rules are static and rigid, resulting in high maintenance costs and a lack of personalized behavior differentiation capabilities; traditional deep learning-based methods suffer from insufficient semantic representation, a semantic gap between features and business intent, difficulty in dynamically updating model baselines, and a lack of interpretability in alarm results, leading to low efficiency in security operations and problem localization.

Summary of the Invention

[0005] This invention provides a log anomaly detection method, device, medium, and product based on a large model to solve the problems of the database operation anomaly detection process being unable to dynamically understand the true operational intentions of operation and maintenance personnel from a semantic level, the rigidity of the detection mechanism, insufficient adaptive capability, and the lack of interpretability of alarm information.

[0006] According to one aspect of the present invention, a log anomaly detection method based on a large model is provided, comprising:
obtaining a target query log from a database audit system, and parsing from the target query log the structured field data that matches the target database behavior of the target query user;
populating the structured field data into a pre-built prompt word template to obtain contextual semantic prompt words for the target query log, wherein the prompt word template guides the large model to analyze, as a database security expert, the operational intent, data sensitivity, and business scenario rationality of the target database behavior;
inputting the contextual semantic prompt words of the target query log into the large model for semantic analysis and, after obtaining structured analysis text that matches the target query log, generating a semantic vector that matches the structured analysis text;
obtaining a target user dynamic baseline that matches the target query user, wherein a user dynamic baseline defines, for a user, the central semantic vector under each high-frequency operation intent, the operation intent distribution, the normal time pattern, frequently accessed objects, and infrequently accessed objects; and
generating an anomaly detection report that matches the target query log based on the structured field data, the semantic vector, and the target user dynamic baseline.

[0007] According to another aspect of the present invention, a log anomaly detection device based on a large model is provided, comprising:
a log acquisition module, used to obtain the target query log from the database audit system and parse from the target query log the structured field data that matches the target database behavior of the target query user;
a prompt word generation module, used to populate the structured field data into a pre-built prompt word template to obtain contextual semantic prompt words for the target query log, wherein the prompt word template guides the large model to analyze, as a database security expert, the operational intent, data sensitivity, and business scenario rationality of the target database behavior;
a log analysis module, used to input the contextual semantic prompt words of the target query log into the large model for semantic analysis and, after obtaining structured analysis text that matches the target query log, generate a semantic vector that matches the structured analysis text;
a dynamic baseline acquisition module, used to acquire the target user dynamic baseline that matches the target query user, wherein a user dynamic baseline defines, for a user, the central semantic vector under each high-frequency operation intent, the operation intent distribution, the normal time pattern, frequently accessed objects, and infrequently accessed objects; and
a detection report generation module, used to generate an anomaly detection report that matches the target query log based on the structured field data, the semantic vector, and the target user dynamic baseline.

[0008] According to another aspect of the present invention, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the log anomaly detection method based on a large model as described in any embodiment of the present invention.

[0009] According to another aspect of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions, the computer instructions being configured to cause a processor to execute and implement the log anomaly detection method based on a large model as described in any embodiment of the present invention.

[0010] According to another aspect of the present invention, a computer program product is also provided, including a computer program that, when executed by a processor, implements the steps of the method as described in any embodiment of the present invention.

[0011] With the above technical solution, target query logs are obtained from a database auditing system and parsed into structured field data matching the target database behavior of the target query user. A pre-built prompt word template guides the large model to analyze, as a database security expert, the operational intent, data sensitivity, and business scenario rationality of the target database behavior. The parsed structured field data is filled into the prompt word template to obtain contextual semantic prompt words for the target query logs, which are input into the large model for semantic analysis to obtain structured analysis text. Further processing of the structured analysis text yields the corresponding semantic vector. The target user dynamic baseline, which reflects the central semantic vector under each high-frequency operation intent, the operation intent distribution, normal time patterns, frequently accessed objects, and infrequently accessed objects, is then obtained, and anomaly detection is performed by combining it with the structured field data and the semantic vector. Finally, an anomaly detection report matching the target query logs is generated. This technical solution enables a deep, dynamic understanding of the target query user's operational intent at the semantic level, improves the flexibility and adaptability of the anomaly detection mechanism, helps address complex and hidden internal data security threats, and supports accurate detection of abnormal database operations.

[0012] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0013] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0014] Figure 1 is a flowchart of a log anomaly detection method based on a large model according to Embodiment 1 of the present invention; Figure 2 is a flowchart of a log anomaly detection method based on a large model according to Embodiment 2 of the present invention; Figure 3 is a schematic diagram of a log anomaly detection device based on a large model according to Embodiment 3 of the present invention; Figure 4 is a schematic diagram of the structure of an electronic device that implements the log anomaly detection method based on a large model according to an embodiment of the present invention.

Detailed Implementation

[0015] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0016] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

Embodiment 1

[0017] Figure 1 is a flowchart of a log anomaly detection method based on a large model, provided in Embodiment 1 of the present invention. This embodiment is applicable to situations where abnormal user operations within a system are detected based on database audit logs. The method can be executed by a log anomaly detection device based on a large model. The device can be implemented in hardware and/or software and is generally configured in the computer performing database log anomaly detection. As shown in Figure 1, the method includes:

S110. Obtain the target query log from the database audit system, and parse the structured field data in the target query log that matches the target database behavior of the target query user.

[0018] The target query user refers to the user currently being detected for abnormal behavior based on database audit log information. The target database behavior refers to the operations performed by the target query user on a specific target database, as recorded in the database audit log information. The target query log refers to log data in the database audit log information that contains information about the operations performed by the target query user on the target database.

[0019] Understandably, when performing anomaly detection on a series of operations performed by a target query user on the database system, the target query log matching the target query user can be obtained from the database audit system. The target query log can then be parsed to obtain structured field data representing the target query user's series of actions performed on the target database. The parsed structured field data can include user information of the person performing the operation, the operation execution time, and the operation execution status.

[0020] S120. Populate the structured field data into the pre-built prompt word template to obtain the contextual semantic prompt words of the target query log.

[0021] The prompt word template is used to guide the large model to analyze, as a database security expert, the operational intent, data sensitivity, and business scenario rationality of the target database behavior.

[0022] Contextual semantic prompt words refer to semantic reference information that characterizes the scenario and behavioral characteristics associated with the target database behavior of the target query user, enabling the large model to understand the true intent of the target database behavior.

[0023] Understandably, in order to enable large models to more accurately understand the target database behavior of target query users, structured field data matching the target database behavior can be populated into pre-built prompt word templates. The pre-built prompt word templates can include structured fields that provide the data information needed to analyze the operational intent, data sensitivity, and business scenario rationality of the target database behavior. This allows the large model to receive the incoming contextual semantic prompt words and, as a database security expert, accurately analyze the target database behavior of target query users.

[0024] S130. Input the contextual semantic prompts of the target query log into the large model for semantic analysis processing. After obtaining the structured analysis text that matches the target query log, generate a semantic vector that matches the structured analysis text.

[0025] Structured analysis text refers to standardized text with a fixed logical structure, formed after contextual semantic prompts describing the behavior of a target database are input into a large model, and then subjected to feature extraction, format regularization, and standardization. Semantic vectors refer to multi-dimensional numerical vectors obtained by transforming the deep semantic information of structured analysis text through semantic mapping, used to quantify the semantic connotation and intent features of the target database's behavior.

[0026] Understandably, after the contextual semantic prompt words describing the target database behavior in the target query logs are obtained, they can be input into the large model for semantic analysis. This yields structured analysis text in a standardized format that describes the operational intent, data sensitivity, and business scenario rationality of the target database behavior. To make the structured analysis text easier to process downstream, it can be further converted into a corresponding semantic vector that represents the information and operational characteristics of the target database behavior in quantitative form.

[0027] S140. Obtain the dynamic baseline of the target user that matches the target query user.

[0028] A user dynamic baseline defines, for a user, the central semantic vector under each high-frequency operation intent, the operation intent distribution, the normal time pattern, frequently accessed objects, and infrequently accessed objects.

[0029] The central semantic vector can be a typical semantic feature vector obtained by clustering or fusing multiple historical semantic vectors matching the target query user. It is used to characterize the overall core intent and common semantics of the target query user's historical database behavior. The operation intent distribution can be the probability set of various operation intents appearing in the target query user's historical database behavior under a preset intent category. The target user dynamic baseline can be a set of benchmark features constructed based on the central semantic vector, operation intent distribution, normal time pattern, frequently accessed objects, and infrequently accessed objects of the target query user under various high-frequency operation intents. It is used to characterize the target query user's normal behavior pattern and is dynamically updated over time.
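The two baseline statistics described above can be illustrated with a minimal sketch. It assumes that "fusing" historical semantic vectors means a component-wise mean per operation intent, and that an intent counts as high-frequency once it appears at least `min_count` times; both choices, and the function name `build_baseline_stats`, are illustrative assumptions rather than details stated in the patent.

```python
from collections import Counter, defaultdict


def build_baseline_stats(history, min_count=1):
    """Compute per-intent central vectors and the intent distribution.

    history: list of (intent, semantic_vector) pairs taken from a user's
    past normal behavior. The pair layout is a hypothetical convention.
    """
    grouped = defaultdict(list)
    for intent, vector in history:
        grouped[intent].append(vector)

    # Central vector per intent: component-wise mean of the historical
    # vectors (assumption: the mean is one simple way to fuse them).
    central = {
        intent: [sum(column) / len(vectors) for column in zip(*vectors)]
        for intent, vectors in grouped.items()
        if len(vectors) >= min_count
    }

    # Operation intent distribution: relative frequency of each intent.
    counts = Counter(intent for intent, _ in history)
    total = sum(counts.values())
    distribution = {intent: n / total for intent, n in counts.items()}
    return central, distribution
```

Recomputing these statistics over a sliding window of recent history would be one way to keep the baseline dynamic, although the update schedule is not specified in this section.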

[0030] Understandably, detecting anomalies in the current target database behavior of the target query user requires a reference set of features describing that user's historical normal behavior, so the target user dynamic baseline matching the target query user is obtained. The behavioral feature information in the target user dynamic baseline can include: the user's central semantic vector under each high-frequency operation intent, the operation intent distribution, normal time patterns, frequently accessed objects, and infrequently accessed objects. Together these characterize how concentrated the user's operations are, the time range in which operations are normally triggered, and the distribution of accessed objects in the user's historical normal behavior, and they serve as the reference for judging whether the current target database behavior of the target query user is abnormal.

[0031] S150. Generate an anomaly detection report that matches the target query log based on structured field data, semantic vectors, and the target user's dynamic baseline.

[0032] Understandably, the structured field data in the target query log that matches the target database behavior of the target query user contains feature information of the target query user's operation behavior to be detected. Anomaly detection can be performed based on normal time patterns, frequently accessed objects, and infrequently accessed objects in the target user's dynamic baseline. The semantic vector obtained by the large model and matched with the structured analysis text can reflect the operation intent, data sensitivity, and business scenario rationality of the target database behavior. Anomaly detection can be performed by referring to the central semantic vector and operation intent distribution of the target query user under various high-frequency operation intents in the target user's dynamic baseline. Semantic interpretation processing can be performed based on the anomaly detection results, thereby generating an anomaly detection report that matches the target query log in an easy-to-understand manner.
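As a hedged illustration of the structured-field part of this comparison, the sketch below checks one parsed log record against the time pattern and object lists of a baseline. The `UserBaseline` class, the use of hour-of-day as the time pattern, the use of the database name as the accessed object, and the finding labels are all assumptions made for the example; the patent does not fix these details here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UserBaseline:
    """Hypothetical container for the structured part of a dynamic baseline."""
    normal_hours: range        # hours of day in which the user normally works
    frequent_objects: set      # objects the user accesses often
    infrequent_objects: set    # objects the user has accessed only rarely


def structured_findings(fields, baseline):
    """Return labels for structured-field deviations from the baseline."""
    findings = []
    hour = datetime.strptime(fields["timestamp"], "%Y-%m-%d %H:%M:%S").hour
    if hour not in baseline.normal_hours:
        findings.append("outside_normal_hours")
    target = fields["database"]
    if target in baseline.infrequent_objects:
        findings.append("infrequent_object")
    elif target not in baseline.frequent_objects:
        findings.append("unseen_object")
    return findings
```

A report generator could combine such labels with the semantic comparison and the risk description from the large model to produce the readable explanation mentioned above.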

[0033] With the above technical solution, target query logs are obtained from a database auditing system and parsed into structured field data matching the target database behavior of the target query user. A pre-built prompt word template guides the large model to analyze, as a database security expert, the operational intent, data sensitivity, and business scenario rationality of the target database behavior. The parsed structured field data is filled into the prompt word template to obtain contextual semantic prompt words for the target query logs, which are input into the large model for semantic analysis to obtain structured analysis text. Further processing of the structured analysis text yields the corresponding semantic vector. The target user dynamic baseline, which reflects the central semantic vector under each high-frequency operation intent, the operation intent distribution, normal time patterns, frequently accessed objects, and infrequently accessed objects, is then obtained, and anomaly detection is performed by combining it with the structured field data and the semantic vector. Finally, an anomaly detection report matching the target query logs is generated. This technical solution enables a deep, dynamic understanding of the target query user's operational intent at the semantic level, improves the flexibility and adaptability of the anomaly detection mechanism, helps address complex and hidden internal data security threats, and supports accurate detection of abnormal database operations.

Embodiment 2

[0034] Figure 2 is a flowchart of a log anomaly detection method based on a large model, provided in Embodiment 2 of the present invention. This embodiment further specifies the above embodiments, including: the specific method for parsing and processing target query logs, and the specific method for generating anomaly detection reports using a large model. As shown in Figure 2, the method includes:

S210. Obtain the target query log from the database audit system, and parse the structured field data in the target query log that matches the target database behavior of the target query user.

[0035] Optionally, parsing the structured field data that matches the target database behavior of the target query user from the target query log includes: parsing the field value corresponding to each structured field from the target query log by regular expression matching, and combining each structured field with its matched field value to obtain the structured field data. The structured fields include: operation timestamp, target query user, client Internet Protocol (IP) address, query expression, and query execution time.

[0036] Specifically, the structured fields describing the target database behavior of the target query user can include: operation timestamp, target query user, client Internet Protocol (IP) address, query expression, and query execution time. When parsing the target query log, regular expressions can be used to match the field value corresponding to each predefined structured field. For example, the target query log obtained from the database audit system is: 2023-10-26 02:15:30 | User: zhangsan | Client IP: 10.10.1.100 | Database: prod_customer_db | SQL: SELECT customer_id, name, phone, email, credit_card_last4 FROM customer_info WHERE 1=1 ORDER BY customer_id LIMIT 0, 1000; | Execution time: 1200ms. After the field values corresponding to the structured fields are extracted, the resulting structured field data can be {"timestamp": "2023-10-26 02:15:30", "username": "zhangsan", "client_ip": "10.10.1.100", "database": "prod_customer_db", "sql_statement": "SELECT customer_id, name, phone, email, credit_card_last4 FROM customer_info WHERE 1=1 ORDER BY customer_id LIMIT 0, 1000;", "execution_time_ms": 1200}.
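The regular expression matching described above can be sketched as follows. The pattern is written against the pipe-delimited sample line quoted in this paragraph and is an assumption about the log layout; a real database audit system may use a different format, and the names `LOG_PATTERN` and `parse_audit_line` are hypothetical.

```python
import re

# Hypothetical pattern for the pipe-delimited sample line; one named group
# per structured field.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" \| User: (?P<username>\S+)"
    r" \| Client IP: (?P<client_ip>\S+)"
    r" \| Database: (?P<database>\S+)"
    r" \| SQL: (?P<sql_statement>.+?)"
    r" \| Execution time: (?P<execution_time_ms>\d+)ms\.?$"
)


def parse_audit_line(line):
    """Parse one audit log line into a dict of structured field data."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("log line does not match the expected layout")
    fields = match.groupdict()
    # Store the execution time as an integer number of milliseconds.
    fields["execution_time_ms"] = int(fields["execution_time_ms"])
    return fields


SAMPLE_LINE = (
    "2023-10-26 02:15:30 | User: zhangsan | Client IP: 10.10.1.100 | "
    "Database: prod_customer_db | SQL: SELECT customer_id, name, phone, email, "
    "credit_card_last4 FROM customer_info WHERE 1=1 ORDER BY customer_id "
    "LIMIT 0, 1000; | Execution time: 1200ms"
)
structured_fields = parse_audit_line(SAMPLE_LINE)
```

Named groups keep the mapping between each structured field and its matched value explicit, which corresponds to combining fields with field values as described in the text.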

[0037] S220. Populate the structured field data into the pre-built prompt word template to obtain the contextual semantic prompt words of the target query log.

[0038] The prompt word template is used to guide the large model to analyze, as a database security expert, the operational intent, data sensitivity, and business scenario rationality of the target database behavior.

[0039] Optionally, populating the structured field data into the pre-built prompt word template to obtain the contextual semantic prompt words of the target query log includes: obtaining a pre-built prompt word template, which includes an identity definition area defining the large model as a database security expert, an information filling area, an analysis content limitation area, and an analysis result format limitation area; wherein the analysis content limitation area includes: first descriptive information guiding the large model to select one of multiple alternative operation intentions, second descriptive information guiding the large model to select one of multiple alternative data sensitivity types, third descriptive information guiding the large model to score the rationality of the business scenario within a set scoring range, and fourth descriptive information guiding the large model to describe the risks; and filling the structured field data into the information filling area of the prompt word template to obtain the contextual semantic prompt words of the target query log.

[0040] Specifically, the first, second, third, and fourth descriptive information are the semantic descriptions defined in the analysis content limitation area of the prompt word template that guide the large model in, respectively, determining the operational intent, determining the data sensitivity type, scoring the rationality of the business scenario, and describing risks.

[0041] Understandably, so that the large model processes the structured field data matching the target database behavior as required, the pre-built prompt word template can include: an identity definition area that defines the large model as a database security expert, an information filling area, an analysis content limitation area, and an analysis result format limitation area. Setting the role of the large model fixes the focus of its inference, and the filled-in information makes explicit the data that the inference relies on. Specifically, the analysis content limitation area can define first descriptive information guiding the large model to select one of multiple alternative operation intentions, second descriptive information guiding the large model to select one of multiple alternative data sensitivity types, third descriptive information guiding the large model to score the rationality of the business scenario within a set scoring range, and fourth descriptive information guiding the large model to describe risks. For example, the constructed prompt word template is: "Please analyze the following operation as a database security expert: User: {username} (role: application developer); Time: {timestamp}; Client: {client_ip}; Database: {database}; SQL (structured query statement): {sql_statement}; Please analyze: 1) Main operation intent (single choice): [data inspection, fault diagnosis, business analysis, data extraction, system maintenance, permission testing, other]; 2) Sensitivity of the data involved (single choice): [public, internal, sensitive, highly sensitive]; 3) Reasonableness of the business scenario (0-10 points, 10 points is the most reasonable); 4) One-sentence risk description; Please output in JSON (key-value pair structured data) format."

In this template: "Please analyze the following operation as a database security expert" is the identity definition area, which defines the large model as a database security expert; "User: {username} (role: application developer); Time: {timestamp}; Client: {client_ip}; Database: {database}; SQL: {sql_statement};" is the information filling area; "1) Main operation intent (single choice): [data inspection, fault diagnosis, business analysis, data extraction, system maintenance, permission testing, other];" is the first descriptive information defined in the analysis content limitation area; "2) Sensitivity of the data involved (single choice): [public, internal, sensitive, highly sensitive];" is the second descriptive information defined in the analysis content limitation area; "3) Reasonableness of the business scenario (0-10 points, 10 points is the most reasonable);" is the third descriptive information defined in the analysis content limitation area; "4) One-sentence risk description;" is the fourth descriptive information defined in the analysis content limitation area; and "Please output in JSON (key-value pair structured data) format" is the analysis result format limitation area.
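Filling the information filling area can be sketched with ordinary string formatting. The template text follows the example in this paragraph; the line breaks, the constant name `PROMPT_TEMPLATE`, and the function name `build_prompt` are illustrative assumptions, and the sample field values are taken from the log example in paragraph [0036].

```python
# Prompt word template with four areas: identity definition, information
# filling, analysis content limitation, and analysis result format limitation.
PROMPT_TEMPLATE = (
    "Please analyze the following operation as a database security expert:\n"
    "User: {username} (role: application developer); Time: {timestamp}; "
    "Client: {client_ip}; Database: {database}; SQL: {sql_statement}\n"
    "Please analyze:\n"
    "1) Main operation intent (single choice): [data inspection, fault diagnosis, "
    "business analysis, data extraction, system maintenance, permission testing, other];\n"
    "2) Sensitivity of the data involved (single choice): "
    "[public, internal, sensitive, highly sensitive];\n"
    "3) Reasonableness of the business scenario (0-10 points, 10 points is the most reasonable);\n"
    "4) One-sentence risk description;\n"
    "Please output in JSON (key-value pair structured data) format."
)


def build_prompt(fields):
    """Fill structured field data into the information filling area.

    Keys that the template does not use (for example execution_time_ms)
    are ignored by str.format.
    """
    return PROMPT_TEMPLATE.format(**fields)


example_fields = {
    "timestamp": "2023-10-26 02:15:30",
    "username": "zhangsan",
    "client_ip": "10.10.1.100",
    "database": "prod_customer_db",
    "sql_statement": "SELECT customer_id, name, phone, email, credit_card_last4 "
                     "FROM customer_info WHERE 1=1 ORDER BY customer_id LIMIT 0, 1000;",
    "execution_time_ms": 1200,
}
contextual_prompt = build_prompt(example_fields)
```

Because `str.format` does not re-interpret substituted values, braces that happen to appear inside a SQL statement are inserted literally and do not break the template.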

[0042] S230. Input the contextual semantic prompts of the target query log into the large model for semantic analysis processing. After obtaining the structured analysis text that matches the target query log, generate a semantic vector that matches the structured analysis text.

[0043] Optionally, in the analysis result format limitation area of the prompt word template, the defined analysis result format is key-value pair format.

[0044] Understandably, the analysis result format limitation area of the prompt word template can define the format in which the large model outputs its analysis of the structured field data. If the output format of the analysis results is defined as a key-value pair format in the prompt word template, the unstructured semantic analysis results can be transformed into standardized structured data through the fixed mapping relationship of "attribute identifier-attribute value", and the core information can be quickly extracted without complex text parsing logic.

[0045] Optionally, inputting the contextual semantic prompts of the target query log into the large model for semantic analysis processing and, after obtaining the structured analysis text that matches the target query log, generating a semantic vector matching the structured analysis text includes: inputting the contextual semantic prompts of the target query log into the large model for semantic analysis processing, and obtaining the operation intent, data sensitivity, business scenario rationality, and risk description output by the large model in the form of key-value pairs as the structured analysis text; and inputting the structured analysis text into the semantic vector encoder to obtain a semantic vector that matches the structured analysis text.

[0046] Here, the semantic vector encoder can refer to an algorithm module that maps structured analysis text representing database behavior into fixed-length, high-dimensional numerical vectors.

[0047] Understandably, the large model performs semantic analysis based on the contextual semantic prompts of the input target query logs. This yields a structured analysis text in the key-value pair format defined in the contextual semantic prompts, which includes the operational intent, data sensitivity, business scenario rationality, and risk description. For example, the structured analysis text in key-value pairs might be: {"Operational Intent": "Data Extraction", "Data Sensitivity": "Highly Sensitive", "Business Scenario Rationality": 2, "Risk Description": "During the early morning hours, developers performed unconditional batch pagination queries on the core customer table containing personal identification and payment information from a non-business server"}. The structured analysis text is input into the semantic vector encoder to obtain the matching semantic vector. For example, the semantic vector obtained after processing the structured analysis text is V_current = [0.87, -0.23, 0.45, -0.67, 0.12, ...]. This vector is a 768-dimensional numerical vector, where the real number in each dimension corresponds to a feature dimension of the target database behavior of the target query user. The overall distribution of the 768-dimensional values can quantify the deep semantic connotation and core intent of the input structured analysis text, providing a numerical basis for subsequent similarity comparison with the central semantic vector in the target user's dynamic baseline.
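
A minimal sketch of how the key-value output in the example above might be parsed and validated before further processing is given below. The key names are taken from that example; the function name parse_analysis, the normalization to lower case, and the rejection of out-of-range values are assumptions for illustration, since the embodiment does not prescribe a validation step.

```python
import json

# Option lists taken from the prompt word template example in [0041].
ALLOWED_INTENTS = {
    "data inspection", "fault diagnosis", "business analysis",
    "data extraction", "system maintenance", "permission testing", "other",
}
ALLOWED_SENSITIVITY = {"public", "internal", "sensitive", "highly sensitive"}


def parse_analysis(raw: str) -> dict:
    """Parse the model's key-value output into a normalized record.

    Raises ValueError (or KeyError for a missing key) when the output
    does not follow the format requested in the prompt.
    """
    data = json.loads(raw)
    intent = str(data["Operational Intent"]).strip().lower()
    sensitivity = str(data["Data Sensitivity"]).strip().lower()
    score = float(data["Business Scenario Rationality"])
    if intent not in ALLOWED_INTENTS:
        raise ValueError(f"unexpected intent: {intent!r}")
    if sensitivity not in ALLOWED_SENSITIVITY:
        raise ValueError(f"unexpected sensitivity: {sensitivity!r}")
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    return {
        "intent": intent,
        "sensitivity": sensitivity,
        "rationality": score,
        "risk": str(data["Risk Description"]),
    }
```

Because the output is constrained to fixed keys and option lists, a plain JSON parse plus membership checks is enough to extract the core information, which is the benefit of the key-value format described in [0044].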

[0048] S240. Obtain the dynamic baseline of the target user that matches the target query user.

[0049] Among them, a user dynamic baseline defines a user's central semantic vector, operation intention distribution, normal time pattern, frequently accessed objects, and infrequently accessed objects under various high-frequency operation intentions.

[0050] Optionally, obtaining the target user dynamic baseline matching the target query user includes: checking whether the local database stores a user dynamic baseline corresponding to the target database behavior of the target query user; if it exists, using that user dynamic baseline as the target user dynamic baseline matching the target query user; if it does not exist, retrieving the target query logs within a preset historical period from the database audit system, and processing the target query logs to obtain the central semantic vector, operation intent distribution, normal time pattern, frequently used access objects, and infrequently used access objects of the target query user under each high-frequency operation intent, so as to construct the target user dynamic baseline matching the target query user.

[0051] The preset historical duration can refer to a pre-defined time interval used to extract historical target database behavior data of the target query user in order to construct a dynamic baseline of the target user.

[0052] Understandably, anomaly detection of the target database behavior of a target query user requires referencing the target user dynamic baseline that matches the target query user. When obtaining the target user dynamic baseline, one can first check if the local database stores a user dynamic baseline corresponding to the target database behavior of the target query user. If it exists, this user dynamic baseline can be directly used as the target user dynamic baseline. If it does not exist, it is necessary to obtain the target query logs within a preset historical period from the database auditing system, and process the structured field data of normal behavior operations in the target query logs. This yields the central semantic vector, operation intent distribution, normal time pattern, commonly used access objects, and infrequently used access objects of the target database behavior of the target query user under various high-frequency operation intentions. This allows the construction of the target user dynamic baseline matching the target query user. For example, if the local database does not store the target user dynamic baseline, a preset historical period of 30 days can be set, and the target query logs from the database auditing system for the past 30 days can be obtained. 
After processing, the target user dynamic baseline could be: {"User ID": "zhangsan", "Role": "Application Developer", "Last Update Time": "2023-10-25 18:30:00", "Semantic Vector Cluster Statistics": {"Cluster Center 1": {"Main Intent": "Troubleshooting", "Vector Center": [0.12, 0.34, -0.56, ...], "Number of Samples": 45}, "Cluster Center 2": {"Main Intent": "Business Analysis", "Vector Center": [-0.23, 0.67, 0.12, ...], "Number of Samples": 28}}, "Intent Distribution": {"Troubleshooting": 0.62, "Business Analysis": 0.35, "Data Inspection": 0.03}, "Normal Time Pattern": {"Weekday Active Periods": ["09:00-12:00", "14:00-18:00"], "Non-Working Hour Operation Ratio": 0.5}, "Frequently Accessed Objects": ["order_table", "product_table", "user_log_table"], "Infrequently Used Objects": ["customer_info", "payment_table", "salary_table"]}.
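
The lookup-or-build logic described above could be sketched as follows. This is a simplified illustration under stated assumptions: the local database is modeled as an in-memory dict, load_history is a hypothetical callback returning already-analyzed historical records, and each central semantic vector is taken as the arithmetic mean of the vectors sharing an intent (the embodiment does not name a clustering method). Time patterns and access objects are omitted for brevity.

```python
from collections import Counter, defaultdict


def build_baseline(user_id, history):
    """Build a simplified baseline from historical records.

    history: list of dicts with keys "intent" (str) and
    "vector" (list of floats), one per historical operation.
    """
    counts = Counter(rec["intent"] for rec in history)
    total = sum(counts.values())
    # Share of each operation intent among all historical operations.
    intent_distribution = {i: c / total for i, c in counts.items()}

    grouped = defaultdict(list)
    for rec in history:
        grouped[rec["intent"]].append(rec["vector"])
    # Assumed center: per-dimension mean of the vectors for each intent.
    centers = {
        intent: [sum(col) / len(vecs) for col in zip(*vecs)]
        for intent, vecs in grouped.items()
    }
    return {
        "user_id": user_id,
        "intent_distribution": intent_distribution,
        "centers": centers,
        "sample_counts": dict(counts),
    }


def get_or_build_baseline(user_id, store, load_history):
    """Return the stored baseline, or build and store one if absent."""
    baseline = store.get(user_id)
    if baseline is None:
        baseline = build_baseline(user_id, load_history(user_id))
        store[user_id] = baseline
    return baseline
```

In this sketch the audit logs are only fetched when no baseline exists, mirroring the order of checks in the embodiment.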

[0053] S250. Calculate the semantic space deviation based on the semantic vector and the central semantic vector in the target user's dynamic baseline, and determine the target user's semantic pattern state by setting a preset semantic deviation threshold.

[0054] The semantic space deviation refers to the distance or difference between the semantic vector matched by the current target query user and the central semantic vector in the target user's dynamic baseline within the semantic space. It can be used to quantify the deviation of the semantic vector matching the target database behavior of the current target query user from the user's normal behavior pattern. The preset semantic deviation threshold is a pre-defined critical value used to determine whether the semantic vector matching the target database behavior of the target query user is abnormal. The target user's semantic pattern state refers to the semantic space deviation anomaly detection result corresponding to the current target query user.

[0055] Specifically, when performing anomaly detection on the target database behavior of a target query user, the cosine similarity can be obtained by calculating the cosine of the angle in the semantic space between the semantic vector and each central semantic vector in the target user's dynamic baseline. The cosine similarity characterizes the semantic space deviation, and a preset semantic deviation threshold is used to determine whether the current target user's semantic pattern is abnormal compared to the normal semantic pattern in the target user's dynamic baseline. For example, the cosine similarity between semantic vectors ranges over [-1, 1], with values closer to 1 indicating higher similarity. The two central semantic vectors in the target user's dynamic baseline are: "Semantic Vector Cluster Statistics": {"Cluster Center 1": {"Main Intent": "Troubleshooting", "Vector Center": [0.12, 0.34, -0.56, ...], "Number of Samples": 45}, "Cluster Center 2": {"Main Intent": "Business Analysis", "Vector Center": [-0.23, 0.67, 0.12, ...], "Number of Samples": 28}}. Calculating the cosine similarity between the semantic vector matching the current target query user and each center gives 0.15 for cluster center 1 (troubleshooting) and 0.18 for cluster center 2 (business analysis). The preset semantic deviation threshold is set to 0.8. Since both calculated cosine similarities are lower than the preset semantic deviation threshold, it is determined that the semantic pattern of the current target user is abnormal compared with the normal semantic pattern in the target user dynamic baseline.
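
The comparison described above can be sketched in plain Python as follows. The function names and the dict layout of the cluster centers are assumptions for illustration; the decision rule (abnormal when the similarity to every center is below the threshold) follows the worked example, with the 0.8 threshold taken from it.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_a * norm_b)


def semantic_pattern_state(vector, centers, threshold=0.8):
    """Compare a semantic vector with every baseline center.

    centers: dict mapping a center name to its vector.
    The pattern is flagged abnormal when no center reaches the threshold.
    """
    similarities = {
        name: cosine_similarity(vector, center)
        for name, center in centers.items()
    }
    best = max(similarities.values(), default=-1.0)
    return {"similarities": similarities, "abnormal": best < threshold}
```

With similarities of 0.15 and 0.18 as in the example, the best match is 0.18, which is below 0.8, so the state would be reported as abnormal.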

[0056] S260. Based on the distribution of operational intentions in the target user's dynamic baseline, determine the probability of sudden changes in the target user's operational intentions in the structured field data.

[0057] The probability of a sudden change in the target user's operational intent can be defined as the probability that the current target user's operational intent deviates significantly from the distribution of historical operational intents in the target user's dynamic baseline.

[0058] Specifically, the contextual semantic prompts obtained by populating the structured field data of the target query logs are subjected to semantic analysis by the large model to determine the current operational intent of the target query user. Based on the distribution of operational intents in the target user's dynamic baseline, the probability of abrupt changes in the target query user's current operational intent can be determined, thereby detecting whether an anomaly has occurred. For example, suppose the distribution of intents in the target user's dynamic baseline is {"Troubleshooting": 0.62, "Business Analysis": 0.35, "Data Inspection": 0.03} and the target query user's current operational intent is "Data Extraction". This intent has never appeared in the target user's dynamic baseline and is a newly added intent type. The distribution difference is significant and the operational intent mutation probability is less than 1%, which indicates that the target query user's current operational intent has significantly deviated from the target user's dynamic baseline, and the operation is judged as an intent pattern anomaly.
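
One possible reading of this check, stated here as an assumption, is that the share of the current intent in the baseline intent distribution is looked up and the operation is flagged when that share falls below 1%. A minimal sketch under that interpretation (function and parameter names are hypothetical):

```python
def intent_baseline_share(intent, distribution):
    """Share of the given intent in the baseline distribution (0.0 if unseen)."""
    return distribution.get(intent, 0.0)


def is_intent_anomalous(intent, distribution, min_share=0.01):
    """Flag intents whose baseline share is below min_share.

    The 1% default mirrors the figure in the example; it is a
    configurable assumption, not a value fixed by the embodiment.
    """
    return intent_baseline_share(intent, distribution) < min_share


if __name__ == "__main__":
    baseline = {"Troubleshooting": 0.62, "Business Analysis": 0.35, "Data Inspection": 0.03}
    print(is_intent_anomalous("Data Extraction", baseline))
    print(is_intent_anomalous("Troubleshooting", baseline))
```

An intent that never occurred in the baseline has a share of 0.0 and is therefore always flagged under this rule.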

[0059] S270. Obtain the target user's behavior operation timestamp and access object information from the structured field data, and verify the context anomaly verification result based on the normal time pattern, frequently used access objects, and infrequently used access objects in the target user's dynamic baseline.

[0060] Among them, the context anomaly verification result can refer to the judgment conclusion obtained by verifying the compliance and rationality of the context information of the current target query user's behavior operation, which is used to determine whether the target query user's behavior operation has abnormal features at the context level.

[0061] Understandably, the structured field data in the target query log that matches the target database behavior of the target query user can include contextual information such as the timestamp of the target query user's action, accessed object information, and query operation mode. Contextual anomalies can be checked against the normal time pattern, frequently accessed objects, and infrequently accessed objects in the target user's dynamic baseline. For example, if the timestamp of the target query user's action is 02:15 AM, and the normal time pattern in the target user's dynamic baseline is 09:00-18:00, then the action time can be determined to be abnormal. If the target query user's current accessed object "customer_info" is marked as an "infrequently used object" in the target user's dynamic baseline, then the accessed object can be determined to be abnormal. If the target query user is currently using the "WHERE 1=1" unconditional query pattern, combined with "LIMIT 0, 1000" pagination that splits a large batch of query results into fixed-size pages, this is a typical data export pattern, and therefore the query pattern can be determined to be abnormal.
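
The three context checks in this example could be sketched as follows. The baseline keys active_periods and infrequent_objects, the function names, and the regular expressions are illustrative assumptions; in particular, the SQL patterns are a rough heuristic for the "WHERE 1=1" plus "LIMIT offset, count" case and not a general SQL parser.

```python
import re
from datetime import datetime, time

# Heuristic: "WHERE 1=1" not followed by further AND/OR conditions.
UNCONDITIONAL = re.compile(r"\bWHERE\s+1\s*=\s*1\b(?!\s+(?:AND|OR)\b)", re.IGNORECASE)
# Heuristic: MySQL-style "LIMIT offset, count" pagination.
PAGINATION = re.compile(r"\bLIMIT\s+\d+\s*,\s*\d+", re.IGNORECASE)


def in_active_period(ts: datetime, periods) -> bool:
    """True if ts falls inside any "HH:MM-HH:MM" period."""
    current = ts.time()
    for period in periods:
        start_text, end_text = period.split("-")
        start = time.fromisoformat(start_text)
        end = time.fromisoformat(end_text)
        if start <= current <= end:
            return True
    return False


def check_context(ts, accessed_object, sql, baseline):
    """Return one boolean flag per context dimension."""
    return {
        "time_abnormal": not in_active_period(ts, baseline["active_periods"]),
        "object_abnormal": accessed_object in baseline["infrequent_objects"],
        "query_pattern_abnormal": bool(
            UNCONDITIONAL.search(sql) and PAGINATION.search(sql)
        ),
    }
```

Keeping the three flags separate, rather than collapsing them into one score, preserves the per-dimension detail that the later report generation step refers to.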

[0062] S280. Input the target user's semantic pattern state, the probability of sudden change in the target user's operation intention, and the context anomaly verification results into the large model to obtain an anomaly detection report.

[0063] Specifically, based on the target user's semantic pattern state, the probability of sudden changes in the target user's operational intent, and the contextual anomaly verification results obtained above, prompt words for generating the anomaly detection report can be constructed. These prompt words are then input into the large model to generate an anomaly detection report that accurately reflects the anomaly detection results of the current target user's various behavioral operations and is highly readable. For example, the constructed prompt words could be: {Please make a comprehensive risk assessment based on the following information: User: Zhang San (application developer); Current operation: At 2 AM, an unconditional pagination query was performed on the customer_info table from 10.10.1.100, with the intent of "data extraction"; Personal baseline characteristics: Usually performs troubleshooting and business analysis during working hours, and has never accessed the customer information table; Multi-dimensional detection results: High semantic deviation, novel intent, and both time and object are abnormal; Question: Does this operation pose an internal threat risk? What is the risk level? Please briefly explain your reasoning.} The final anomaly detection report is as follows: {"Risk Assessment": "High Risk", "Confidence Level": 0.92, "Reason": "This operation deviates significantly from the user's historical behavior baseline in multiple dimensions: 1) Abnormal time (non-working hours); 2) Abrupt change in intent (from technical operation to data extraction); 3) Abnormal access target (first access to a highly sensitive table); 4) Suspicious query pattern (unconditional pagination export). Based on this, it is judged to be a suspected data theft behavior."}

Based on the above embodiments, after generating an anomaly detection report matching the target query log according to the structured field data, the semantic vector, and the target user's dynamic baseline, the method further includes: integrating, through an incremental learning mechanism, new normal behavior patterns from the structured field data in the target query log that match the target database behavior of the target query user into the target user's dynamic baseline; and adopting a time decay strategy to reduce the weight of behaviors outside the preset historical period in the target user dynamic baseline, so that the target user dynamic baseline is dynamically and iteratively updated.

[0064] Incremental learning can refer to a learning strategy that gradually integrates structured field data matching the target database behavior of newly added target query users into the model training process without retraining the overall large model, so as to dynamically update the user behavior baseline and adapt to changes in user operation patterns. Time decay strategy can refer to a strategy that assigns different weights to historical user operation data according to the time of data generation (the earlier the data, the lower the weight; the more recent the data, the higher the weight), so as to weaken the impact of outdated data on the user behavior baseline and strengthen recent behavioral characteristics.

[0065] Understandably, when performing anomaly detection on the target database behavior of a target query user, if normal behavior exists in addition to the abnormal behavior, or if the detection result shows no anomalies, an incremental learning mechanism can be used to integrate new normal behavior patterns from structured field data matching the target database behavior of the target query user into the target user dynamic baseline. Additionally, a time decay strategy can be used to reduce the weight of behaviors outside a preset historical period in the target user dynamic baseline, completing the iterative update of the target user dynamic baseline. For example, if the preset historical period is set to 30 days, the target user dynamic baseline can choose to retain only user behavior information within 30 days, or assign a weight coefficient that decays exponentially over time to each historical operation, setting smaller weight coefficients for earlier samples and larger weight coefficients for more recent samples. This allows the target user dynamic baseline to dynamically weaken the influence of outdated behaviors and continuously update with greater weight for the latest user behaviors.
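
The exponentially decaying weight coefficient mentioned above could, for instance, be realized with a half-life parameter, as in the following sketch. The half-life form, the 30-day default, and the function names are assumptions for illustration; the embodiment only states that weights decay exponentially over time.

```python
def decay_weight(age_days, half_life_days=30.0):
    """Weight of a sample that is age_days old.

    The weight halves every half_life_days, so recent samples
    count more than older ones.
    """
    return 0.5 ** (age_days / half_life_days)


def weighted_center(samples, half_life_days=30.0):
    """Time-decayed mean of sample vectors.

    samples: list of (age_days, vector) pairs for one operation intent.
    """
    total_weight = 0.0
    accumulator = None
    for age_days, vector in samples:
        weight = decay_weight(age_days, half_life_days)
        if accumulator is None:
            accumulator = [0.0] * len(vector)
        for i, value in enumerate(vector):
            accumulator[i] += weight * value
        total_weight += weight
    if accumulator is None:
        raise ValueError("at least one sample is required")
    return [value / total_weight for value in accumulator]
```

Under these assumptions, a sample from today and a sample from 30 days ago contribute with weights 1.0 and 0.5 respectively, so the recomputed center drifts toward recent normal behavior without discarding older samples outright.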

[0066] This invention, through the aforementioned technical solution, can acquire target query logs and parse them to obtain structured field data matching the target database behavior of the target query user. The parsed structured field data is then filled into a prompt word template to obtain contextual semantic prompt words for the target query logs. These prompt words are input into a large model for semantic analysis to obtain structured analysis text. The large model further processes this text to obtain corresponding semantic vectors. The semantic space deviation is calculated based on the semantic vectors and the central semantic vector in the target user's dynamic baseline. The probability of sudden changes in the target user's operational intent is obtained based on the distribution of operational intent in the target user's dynamic baseline. Furthermore, the normal time pattern, frequently accessed objects, and infrequently accessed objects in the target user's dynamic baseline are verified to obtain contextual anomaly verification results. Finally, an anomaly detection report is generated by combining all the above anomaly detection results, and the target user's dynamic baseline is dynamically updated based on the current normal operational behavior of the target query user. This technical solution employs multi-dimensional anomaly detection and introduces a large model to integrate semantic vectors and structured field data information for comprehensive risk reasoning, simulating an expert decision-making process, effectively improving the accuracy and reliability of identifying abnormal behavior operations in the internal database.

[0067] Example 3. Figure 3 is a schematic diagram of a log anomaly detection device based on a large model, provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes: a log acquisition module 310, a prompt word generation module 320, a log analysis module 330, a dynamic baseline acquisition module 340, and a detection report generation module 350.

[0068] The log acquisition module 310 is used to acquire target query logs from the database audit system and parse structured field data in the target query logs that match the target database behavior of the target query user.

[0069] The prompt word generation module 320 is used to populate structured field data into a pre-built prompt word template to obtain contextual semantic prompt words for the target query log. The prompt word template is used to guide the large model to analyze the operational intent, data sensitivity, and business scenario rationality of the target database behavior as a database security expert.

[0070] The log analysis module 330 is used to input the contextual semantic prompts of the target query log into the large model for semantic analysis processing. After obtaining the structured analysis text that matches the target query log, it generates a semantic vector that matches the structured analysis text.

[0071] The dynamic baseline acquisition module 340 is used to acquire the target user dynamic baseline that matches the target query user. In a user dynamic baseline, a central semantic vector, operation intent distribution, normal time pattern, frequently accessed objects, and infrequently accessed objects are defined for a user under various high-frequency operation intents.

[0072] The detection report generation module 350 is used to generate an anomaly detection report that matches the target query log based on structured field data, semantic vectors, and the target user's dynamic baseline.

[0073] This invention, through the aforementioned technical solution, can obtain target query logs from a database auditing system and parse them to obtain structured field data matching the target database behavior of the target query user. A pre-constructed prompt word template guides a large-scale model to analyze the operational intent, data sensitivity, and business scenario rationality of the target database behavior as a database security expert. The parsed structured field data is then filled into the prompt word template to obtain contextual semantic prompt words for the target query logs. These contextual semantic prompt words are input into the large-scale model for semantic analysis to obtain structured analysis text. Further processing of the structured analysis text by the large-scale model yields corresponding semantic vectors. After obtaining a dynamic baseline of the target user reflecting the central semantic vector, operational intent distribution, normal time patterns, frequently accessed objects, and infrequently accessed objects under various high-frequency operational intents, anomaly detection is performed by combining the obtained structured field data and semantic vectors. Finally, an anomaly detection report matching the target query logs is generated. This technical solution enables a deep and dynamic understanding of the target query user's operational intent at the semantic level, improving the flexibility and adaptability of the anomaly detection mechanism, effectively addressing complex and hidden internal data security threats, and achieving accurate detection of abnormal database operations.

[0074] Optionally, the log acquisition module 310 can be specifically used to: parse the field values corresponding to each structured field in the target query log through regular expression matching, and combine each structured field with the matched field value to obtain structured field data, wherein the structured fields include: behavior operation timestamp, target query user, client Internet Protocol address, query expression, and query execution time.
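
A minimal sketch of regular-expression-based field extraction is shown below. The log line layout (timestamp, then user=, client=, duration_ms=, and a quoted sql= field) is entirely hypothetical, since the embodiment does not specify the audit log format; only the five structured fields themselves come from the text above.

```python
import re

# Hypothetical audit log layout, e.g.:
# 2023-10-26 02:15:00 user=zhangsan client=10.10.1.100 duration_ms=1520 sql="SELECT ..."
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"user=(?P<username>\S+)\s+"
    r"client=(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"duration_ms=(?P<duration_ms>\d+)\s+"
    r'sql="(?P<sql_statement>.*)"$'
)


def parse_log_line(line: str):
    """Return the structured field data for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Query execution time is kept as an integer number of milliseconds.
    fields["duration_ms"] = int(fields["duration_ms"])
    return fields
```

Named groups pair each structured field with its matched value directly, which corresponds to the "combine each structured field with the matched field value" step.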

[0075] Optionally, the prompt word generation module 320 can be specifically used to: obtain a pre-built prompt word template, which includes: an identity definition area defining the large model as a database security expert, an information filling area, an analysis content limitation area, and an analysis result format limitation area. The analysis content limitation area includes: first descriptive information guiding the large model to select from multiple alternative operational intentions; second descriptive information guiding the large model to select from multiple alternative data sensitivity types; third descriptive information guiding the large model to score the rationality of the business scenario within a set scoring range; and fourth descriptive information guiding the large model to describe the risks. Structured field data is then filled into the information filling area of the prompt word template to obtain the contextual semantic prompt words for the target query log.

[0076] Optionally, the log analysis module 330 can be specifically used to: input the contextual semantic prompts of the target query logs into the large model for semantic analysis processing, and obtain the operation intent, data sensitivity, business scenario rationality, and risk description output by the large model in key-value pair form as structured analysis text. The structured analysis text is then input into the semantic vector encoder to obtain a semantic vector that matches the structured analysis text.

[0077] Optionally, the dynamic baseline acquisition module 340 can be specifically used to: query whether a user dynamic baseline corresponding to the target database behavior of the target query user is stored in the local database; if it exists, use the user dynamic baseline as the target user dynamic baseline matching the target query user; if it does not exist, retrieve the target query logs within a preset historical period from the database audit system, and process the target query logs to obtain the central semantic vector, operation intent distribution, normal time pattern, frequently used access objects, and infrequently used access objects of the target query user under various high-frequency operation intentions, so as to construct the target user dynamic baseline matching the target query user.

[0078] Optionally, the detection report generation module 350 can be specifically used for: calculating the semantic space deviation based on the semantic vector and the central semantic vector in the target user's dynamic baseline, and determining the target user's semantic pattern state through a preset semantic deviation threshold; determining the probability of sudden changes in the target user's operational intent in the structured field data based on the distribution of operational intent in the target user's dynamic baseline; obtaining the target query user's behavior operation timestamps and access object information in the structured field data, and verifying them based on the normal time pattern, frequently used access objects, and infrequently used access objects in the target user's dynamic baseline to obtain the context anomaly verification result; and inputting the target user's semantic pattern state, the probability of sudden changes in the target user's operational intent, and the context anomaly verification result into the large model to obtain an anomaly detection report.

[0079] Optionally, a dynamic baseline update module may also be included, which incorporates new normal behavior patterns from structured field data in the target query log that match the target database behavior of the target query user into the target user's dynamic baseline through an incremental learning mechanism. A time decay strategy is employed to reduce the weight of behaviors outside the preset historical time frame in the target user's dynamic baseline, enabling dynamic iterative updates to the target user's dynamic baseline.

[0080] The log anomaly detection device based on a large model provided in this embodiment of the invention can execute the log anomaly detection method based on a large model provided in any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

[0081] Example 4. Figure 4 shows a schematic diagram of an electronic device 10 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.

[0082] As shown in Figure 4, the electronic device 10 includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on the computer program stored in the ROM 12 or loaded from storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14. An input / output (I / O) interface 15 is also connected to the bus 14.

[0083] Multiple components in electronic device 10 are connected to I / O interface 15, including: input unit 16, such as keyboard, mouse, etc.; output unit 17, such as various types of displays, speakers, etc.; storage unit 18, such as disk, optical disk, etc.; and communication unit 19, such as network card, modem, wireless transceiver, etc. Communication unit 19 allows electronic device 10 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0084] Processor 11 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as log anomaly detection methods based on large models.

[0085] That is: obtaining the target query log from the database audit system, and parsing the structured field data in the target query log that matches the target database behavior of the target query user; populating the structured field data into a pre-built prompt word template to obtain contextual semantic prompt words for the target query log, where the prompt word template is used to guide the large model to analyze the operational intent, data sensitivity, and business scenario rationality of the target database behavior as a database security expert; inputting the contextual semantic prompt words of the target query log into the large model for semantic analysis and, after obtaining the structured analysis text that matches the target query log, generating a semantic vector matching the structured analysis text; obtaining the target user dynamic baseline that matches the target query user, where a user dynamic baseline defines a user's central semantic vector, operation intent distribution, normal time pattern, frequently accessed objects, and infrequently accessed objects under various high-frequency operation intents; and generating an anomaly detection report matching the target query log based on the structured field data, the semantic vector, and the target user's dynamic baseline.

[0086] In some embodiments, the large-model-based log anomaly detection method can be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program can be loaded and / or installed on electronic device 10 via ROM 12 and / or communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the large-model-based log anomaly detection method described above can be performed. Alternatively, in other embodiments, processor 11 can be configured to perform the large-model-based log anomaly detection method by any other suitable means (e.g., by means of firmware).

[0087] Various embodiments of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These embodiments may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a dedicated or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

[0088] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. The computer programs may be executed entirely on a machine, partly on a machine, as a standalone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

[0089] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0090] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0091] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0092] A computing system can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product in the cloud computing service system that addresses the shortcomings of traditional physical hosts and VPS services, such as high management difficulty and weak business scalability.

[0093] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0094] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A log anomaly detection method based on a large model, characterized in that the method comprises: obtaining a target query log from a database audit system, and parsing from the target query log the structured field data that matches the target database behavior of a target query user; populating the structured field data into a pre-built prompt word template to obtain contextual semantic prompt words for the target query log, wherein the prompt word template is used to guide the large model to analyze, as a database security expert, the operation intent, data sensitivity, and business scenario rationality of the target database behavior; inputting the contextual semantic prompt words of the target query log into the large model for semantic analysis and, after obtaining structured analysis text that matches the target query log, generating a semantic vector matching the structured analysis text; obtaining a target user dynamic baseline that matches the target query user, wherein a user dynamic baseline defines, for a user under each high-frequency operation intent, a central semantic vector, an operation intent distribution, a normal time pattern, frequently accessed objects, and infrequently accessed objects; and generating, based on the structured field data, the semantic vector, and the target user dynamic baseline, an anomaly detection report matching the target query log.

2. The method according to claim 1, characterized in that parsing from the target query log the structured field data that matches the target database behavior of the target query user comprises: parsing the field value corresponding to each structured field from the target query log by regular expression matching, and combining each structured field with its matched field value to obtain the structured field data; wherein the structured fields include: operation timestamp, target query user, client Internet Protocol (IP) address, query expression, and query execution time.
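As an illustration of the regular-expression parsing recited in claim 2, the following minimal Python sketch extracts the five structured fields from a single log line. The log line layout, the `LOG_PATTERN` expression, the field names, and the `parse_log_line` helper are all assumptions made for this example; the source does not specify an audit log format.

```python
import re
from typing import Dict, Optional

# Hypothetical audit log layout (an assumption, not taken from the source):
# 2026-03-17 02:13:45 user=alice ip=10.0.0.8 duration_ms=532 query="SELECT ..."
LOG_PATTERN = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'user=(?P<user>\S+) '
    r'ip=(?P<client_ip>\S+) '
    r'duration_ms=(?P<duration_ms>\d+) '
    r'query="(?P<query>.*)"$'
)


def parse_log_line(line: str) -> Optional[Dict[str, object]]:
    """Return the structured field data for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # Combine each structured field name with its matched field value.
    fields: Dict[str, object] = dict(match.groupdict())
    fields["duration_ms"] = int(match.group("duration_ms"))
    return fields


if __name__ == "__main__":
    sample = ('2026-03-17 02:13:45 user=alice ip=10.0.0.8 '
              'duration_ms=532 query="SELECT * FROM customers"')
    print(parse_log_line(sample))
```

Lines that do not match the assumed layout return `None` rather than raising, so a caller can count or quarantine unparsed records.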

3. The method according to claim 1, characterized in that populating the structured field data into a pre-built prompt word template to obtain contextual semantic prompt words for the target query log comprises: obtaining a pre-built prompt word template, which includes: an identity definition area defining the large model as a database security expert, an information filling area, an analysis content limitation area, and an analysis result format limitation area; wherein the analysis content limitation area includes: first descriptive information guiding the large model to select a single option from multiple candidate operation intents, second descriptive information guiding the large model to select a single option from multiple candidate data sensitivity types, third descriptive information guiding the large model to score the business scenario rationality within a set scoring range, and fourth descriptive information guiding the large model to describe the risk; and filling the structured field data into the information filling area of the prompt word template to obtain the contextual semantic prompt words of the target query log.
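The four-area template of claim 3 can be pictured with a short Python sketch. The wording of every area, the candidate lists `INTENT_OPTIONS` and `SENSITIVITY_OPTIONS`, the 1 to 5 scoring range, the output key names, and the `build_prompt` helper are hypothetical; the source names the areas but does not give their text.

```python
from string import Template
from typing import Dict

# Candidate values are illustrative assumptions, not taken from the source.
INTENT_OPTIONS = ["routine_query", "bulk_export", "data_modification",
                  "schema_change", "privilege_change"]
SENSITIVITY_OPTIONS = ["public", "internal", "confidential", "highly_sensitive"]

PROMPT_TEMPLATE = Template("\n".join([
    # Identity definition area
    "You are a database security expert.",
    "",
    # Information filling area (placeholders receive the structured field data)
    "Audit record:",
    "timestamp: $timestamp",
    "user: $user",
    "client_ip: $client_ip",
    "query: $query",
    "duration_ms: $duration_ms",
    "",
    # Analysis content limitation area
    "1. Choose exactly one operation intent from: " + ", ".join(INTENT_OPTIONS) + ".",
    "2. Choose exactly one data sensitivity type from: " + ", ".join(SENSITIVITY_OPTIONS) + ".",
    "3. Score the business scenario rationality as an integer from 1 to 5.",
    "4. Describe the risk in one or two sentences.",
    "",
    # Analysis result format limitation area
    "Answer only with key-value pairs, one per line:",
    "operation_intent: <value>",
    "data_sensitivity: <value>",
    "scenario_rationality: <value>",
    "risk_description: <value>",
]))


def build_prompt(fields: Dict[str, object]) -> str:
    """Fill the information filling area; raises KeyError if a field is missing."""
    return PROMPT_TEMPLATE.substitute(fields)


if __name__ == "__main__":
    print(build_prompt({
        "timestamp": "2026-03-17 02:13:45", "user": "alice",
        "client_ip": "10.0.0.8", "query": "SELECT * FROM customers",
        "duration_ms": 532,
    }))
```

`Template.substitute` is used instead of `safe_substitute` so that an incompletely parsed record fails loudly instead of producing a prompt with unfilled placeholders.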

4. The method according to claim 3, characterized in that the analysis result format defined in the analysis result format limitation area of the prompt word template is a key-value pair format; accordingly, inputting the contextual semantic prompt words of the target query log into the large model for semantic analysis and, after obtaining structured analysis text that matches the target query log, generating a semantic vector matching the structured analysis text comprises: inputting the contextual semantic prompt words of the target query log into the large model for semantic analysis; obtaining, as the structured analysis text, the operation intent, data sensitivity, business scenario rationality, and risk description output by the large model in key-value pair form; and inputting the structured analysis text into a semantic vector encoder to obtain a semantic vector that matches the structured analysis text.
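A minimal Python sketch of the two steps in claim 4, parsing the key-value output and encoding it, is given below. The key names and the `parse_key_value_text` helper are assumptions. The source does not identify the semantic vector encoder, so `toy_encode` is only a deterministic hashed bag-of-words stand-in used to keep the example self-contained; it does not capture semantics the way a trained embedding model would.

```python
import hashlib
import math
from typing import Dict, List


def parse_key_value_text(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines; only the first colon on a line separates key from value."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            result[key.strip()] = value.strip()
    return result


def toy_encode(text: str, dim: int = 64) -> List[float]:
    """Stand-in encoder (assumption): hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


if __name__ == "__main__":
    model_output = (
        "operation_intent: bulk_export\n"
        "data_sensitivity: confidential\n"
        "scenario_rationality: 2\n"
        "risk_description: Off-hours read: full customer table."
    )
    print(parse_key_value_text(model_output))
    print(len(toy_encode(model_output)))
```

In practice the stand-in would be replaced by whatever sentence-embedding model the deployment uses; the rest of the pipeline only needs a fixed-length numeric vector.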

5. The method according to any one of claims 1-4, characterized in that obtaining the target user dynamic baseline that matches the target query user comprises: checking whether a local database contains a user dynamic baseline of the target database behavior corresponding to the target query user; if it exists, using that user dynamic baseline as the target user dynamic baseline matching the target query user; if it does not exist, retrieving the target query logs within a preset historical period from the database audit system, processing the target query logs to obtain the central semantic vector, operation intent distribution, normal time pattern, frequently accessed objects, and infrequently accessed objects of the target query user under each high-frequency operation intent, and constructing the target user dynamic baseline matching the target query user.
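The lookup-or-construct logic of claim 5 might be sketched in Python as follows. The `UserBaseline` layout, the record keys (`intent`, `vector`, `hour`, `object`), and the thresholds `min_intent_share` and `min_object_count` are assumptions. The sketch also assumes that only the central semantic vector is kept per high-frequency intent while the other items are kept per user; the source wording leaves that grouping open.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class UserBaseline:
    central_vectors: Dict[str, List[float]]   # one centroid per high-frequency intent
    intent_distribution: Dict[str, float]     # share of each observed intent
    normal_hours: Set[int]                    # hours of day seen in history
    frequent_objects: Set[str]
    infrequent_objects: Set[str]


def build_baseline(history: List[dict], min_intent_share: float = 0.2,
                   min_object_count: int = 2) -> UserBaseline:
    """Construct a baseline from already-analyzed historical records."""
    if not history:
        raise ValueError("history must not be empty")
    total = len(history)
    intent_counts = Counter(r["intent"] for r in history)
    distribution = {i: c / total for i, c in intent_counts.items()}
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for r in history:
        grouped[r["intent"]].append(r["vector"])
    # Centroid (element-wise mean) for each intent that is frequent enough.
    central = {
        intent: [sum(col) / len(vectors) for col in zip(*vectors)]
        for intent, vectors in grouped.items()
        if distribution[intent] >= min_intent_share
    }
    object_counts = Counter(r["object"] for r in history)
    frequent = {o for o, c in object_counts.items() if c >= min_object_count}
    return UserBaseline(central, distribution, {r["hour"] for r in history},
                        frequent, set(object_counts) - frequent)


def get_or_build_baseline(user: str, store: Dict[str, UserBaseline],
                          load_history: Callable[[str], List[dict]]) -> UserBaseline:
    """Use the stored baseline if present; otherwise build it from history and store it."""
    if user not in store:
        store[user] = build_baseline(load_history(user))
    return store[user]
```

Here `store` stands in for the local database and `load_history` for the retrieval of logs over the preset historical period; both are passed in so the sketch stays independent of any particular storage layer.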

6. The method according to any one of claims 1-4, characterized in that generating, based on the structured field data, the semantic vector, and the target user dynamic baseline, an anomaly detection report matching the target query log comprises: calculating a semantic space deviation from the semantic vector and the central semantic vector in the target user dynamic baseline, and determining the semantic pattern state of the target user by comparing the deviation with a preset semantic deviation threshold; determining, based on the operation intent distribution in the target user dynamic baseline, the probability of a sudden change in the target user's operation intent in the structured field data; obtaining the target user's operation timestamp and access object information from the structured field data, and determining a context anomaly verification result based on the normal time pattern, frequently accessed objects, and infrequently accessed objects in the target user dynamic baseline; and inputting the target user's semantic pattern state, the probability of a sudden change in the target user's operation intent, and the context anomaly verification result into the large model to obtain the anomaly detection report.
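The three intermediate signals of claim 6 can be sketched in Python under stated assumptions. The source does not name a deviation metric or a probability formula, so cosine distance, the 0.3 threshold, the "one minus baseline share" mutation probability, and the finding labels are illustrative choices only. The final step, passing these signals to the large model to produce the report, is not shown.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def semantic_pattern_state(vector: Sequence[float], center: Sequence[float],
                           threshold: float = 0.3) -> str:
    """Compare the semantic space deviation with a preset threshold (value assumed)."""
    return "deviating" if cosine_distance(vector, center) > threshold else "normal"


def intent_mutation_probability(intent: str, distribution: Dict[str, float]) -> float:
    """Assumed formula: the rarer the intent in the baseline, the higher the value."""
    return 1.0 - distribution.get(intent, 0.0)


def context_anomalies(hour: int, obj: str, normal_hours: Set[int],
                      frequent: Set[str], infrequent: Set[str]) -> List[str]:
    """Check the operation time and accessed object against the baseline."""
    findings: List[str] = []
    if hour not in normal_hours:
        findings.append("outside_normal_hours")
    if obj in infrequent:
        findings.append("infrequent_object")
    elif obj not in frequent:
        findings.append("unseen_object")
    return findings


if __name__ == "__main__":
    print(semantic_pattern_state([1.0, 0.0], [0.0, 1.0]))
    print(intent_mutation_probability("bulk_export", {"routine_query": 0.9, "bulk_export": 0.1}))
    print(context_anomalies(2, "customers", {9, 10}, {"orders"}, {"customers"}))
```

Keeping the three checks as separate functions mirrors the claim, where each result is reported to the large model individually instead of being merged into one score.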

7. The method according to any one of claims 1-6, characterized in that, after generating the anomaly detection report matching the target query log based on the structured field data, the semantic vector, and the target user dynamic baseline, the method further comprises: integrating, through an incremental learning mechanism, new normal behavior patterns from the structured field data in the target query log that matches the target database behavior of the target query user into the target user dynamic baseline; and adopting a time decay strategy to reduce the weight of behaviors outside the preset historical period in the target user dynamic baseline, thereby dynamically and iteratively updating the target user dynamic baseline.
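One possible reading of the update step in claim 7 is sketched below in Python. The source does not specify the decay function or the incremental learning rule; exponential decay with a half-life and an exponential moving average of the central vector are assumptions chosen for illustration, as are the names `decayed_intent_distribution`, `update_center`, `half_life_days`, and `learning_rate`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def decayed_intent_distribution(events: Sequence[Tuple[str, float]],
                                half_life_days: float = 30.0) -> Dict[str, float]:
    """events are (intent, age_in_days); older events contribute less weight."""
    weights: Dict[str, float] = defaultdict(float)
    for intent, age_days in events:
        # Weight halves every `half_life_days` (assumed decay form).
        weights[intent] += 0.5 ** (age_days / half_life_days)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {intent: w / total for intent, w in weights.items()}


def update_center(center: Sequence[float], new_vector: Sequence[float],
                  learning_rate: float = 0.1) -> List[float]:
    """Move the central semantic vector a small step toward a new normal observation."""
    return [(1.0 - learning_rate) * c + learning_rate * v
            for c, v in zip(center, new_vector)]


if __name__ == "__main__":
    events = [("routine_query", 0.0), ("bulk_export", 30.0)]
    print(decayed_intent_distribution(events))
    print(update_center([0.0, 0.0], [1.0, 1.0], learning_rate=0.5))
```

A hard cutoff at the preset historical period would be an equally valid decay strategy; the smooth form is used here only because it avoids abrupt changes in the baseline.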

8. An electronic device, characterized in that the electronic device comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the log anomaly detection method based on a large model according to any one of claims 1-7.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that, when executed, cause a processor to perform the log anomaly detection method based on a large model according to any one of claims 1-7.

10. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the log anomaly detection method based on a large model according to any one of claims 1-7.