A blockchain audit information screening method and system, an electronic device, and a medium

By constructing a multi-dimensional index structure, generating audit event feature vectors, and dynamically determining the filtering threshold, the method addresses the low accuracy of information filtering in blockchain auditing and achieves efficient, accurate audit information filtering and report generation.

CN122240622A · Pending · Publication Date: 2026-06-19 · NANJING ANXIN IOT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING ANXIN IOT TECH CO LTD
Filing Date
2026-04-11
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing blockchain auditing methods are ill-suited to complex and ever-changing security incident scenarios, resulting in low accuracy in audit information screening and a tendency to miss important information.

Method used

A multi-dimensional audit data index structure is constructed. By indexing operation timestamps, subject identifiers, and behavior types, audit event feature vectors are generated, information correlation is calculated, and filtering thresholds are dynamically determined to form an audit trajectory chain and generate a structured report.

Benefits of technology

It improves the accuracy and efficiency of blockchain audit information screening, can adapt to complex and ever-changing security event scenarios, reduces redundant information interference, and ensures the integrity and readability of audit information.



Abstract

A method, system, electronic device, and medium for filtering blockchain audit information are disclosed, relating to the field of data processing technology. The method includes: firstly, acquiring raw audit logs containing operator identifiers, object identifiers, behavior types, and timestamps, and constructing index structures based on these elements in three dimensions: time, subject-object relationship, and behavior type. Upon receiving a user's security event description, the system generates an event feature vector, used to retrieve a set of relevant candidate logs from the index structure. By calculating the information correlation between the feature vector and the candidate logs, and combining the mean and standard deviation of the correlation, a dynamic filtering threshold is determined to filter out highly relevant target logs. Finally, the filtering results are organized into an audit trajectory chain according to a time series, forming a structured analysis report. Implementing the technical solution provided in this application can improve the accuracy of blockchain audit information filtering.

Description

Technical Field

[0001] This application relates to the field of data processing technology, specifically to a blockchain audit information screening method, system, electronic device, and medium.

Background Technology

[0002] With the widespread application of blockchain technology in finance, supply chain, and government affairs, its security and auditability are receiving increasing attention. Blockchain systems generate massive amounts of audit logs during operation, recording various operational behaviors and transaction information, which are crucial for tracing security incidents and determining responsibility.

[0003] Currently, blockchain auditing typically employs a log retrieval method based on fixed rules. Auditors need to retrieve relevant records from the audit log database according to fixed search criteria for filtering and analysis.

[0004] However, in practical applications, this simple retrieval method is insufficient to meet the auditing needs of large-scale blockchain systems. The retrieval process requires traversing a large amount of log data, and this rule-based retrieval method is ill-suited to complex and ever-changing security event scenarios, easily leading to the omission of important audit information and thus reducing the accuracy of blockchain audit information filtering.

Summary of the Invention

[0005] This application provides a method, system, electronic device, and medium for filtering blockchain audit information, which can improve the accuracy of filtering blockchain audit information.

[0006] In a first aspect, this application provides a method for filtering blockchain audit information, including: obtaining the original audit logs from the blockchain network, wherein the original audit logs include the operation subject identifier, the operation object identifier, the operation behavior type, and the operation timestamp; constructing a first index based on the operation timestamp, a second index based on the operation subject identifier and the operation object identifier, and a third index based on the operation behavior type, and generating an audit data index structure from the first index, the second index, and the third index; receiving security event description information input by the user, and generating an audit event feature vector based on the security event description information; retrieving a candidate audit log set from the audit data index structure based on the audit event feature vector, and calculating the information correlation degree between the audit event feature vector and each log in the candidate audit log set; obtaining the mean and standard deviation of the information correlation degrees in the candidate audit log set, and determining a dynamic filtering threshold based on the mean and the standard deviation; selecting audit logs whose information correlation degree is greater than the dynamic filtering threshold as target logs, and selecting a preset number of target logs as the filtering results in descending order of information correlation degree; and arranging the filtering results by timestamp to form an audit trajectory chain, and outputting a structured filtering report containing the audit trajectory chain.

[0007] By adopting the above technical solution, a multi-dimensional audit data index structure is formed by constructing a first index, a second index, and a third index based on the operation timestamp, the operation subject identifier, the operation object identifier, and the operation behavior type, respectively. This improves the retrieval efficiency of audit logs. Simultaneously, by converting user-input security event descriptions into audit event feature vectors and calculating the information correlation between these feature vectors and candidate audit logs, and dynamically determining the filtering threshold using the mean and standard deviation, target logs highly relevant to security events can be adaptively filtered. Furthermore, sorting the filtering results in descending order of information correlation and selecting a preset number of target logs effectively avoids interference from redundant information. Finally, by arranging the target logs by timestamp to form an audit trajectory chain and generating a structured filtering report, both the integrity of audit information and the readability of audit results are ensured. This solution overcomes the limitations of traditional fixed-rule retrieval methods, enabling more flexible responses to complex and ever-changing security event scenarios and improving the accuracy of blockchain audit information filtering.
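
The thresholding and selection steps can be sketched as follows. The application does not state how the mean and standard deviation are combined, so the rule threshold = mean + k × standard deviation, the coefficient `k`, and all function and variable names here are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def screen_logs(scored_logs, k=1.0, top_n=10):
    """Filter (log_id, timestamp, relevance) tuples with a dynamic threshold."""
    scores = [score for _, _, score in scored_logs]
    # Assumed rule: threshold = mean + k * (population) standard deviation.
    threshold = mean(scores) + k * pstdev(scores)
    targets = [log for log in scored_logs if log[2] > threshold]
    # Keep the top_n most relevant target logs ...
    top = sorted(targets, key=lambda log: log[2], reverse=True)[:top_n]
    # ... then order them by timestamp to form the trajectory chain.
    return threshold, sorted(top, key=lambda log: log[1])

logs = [("a", 100, 0.9), ("b", 50, 0.8), ("c", 75, 0.2), ("d", 10, 0.1)]
threshold, chain = screen_logs(logs, k=0.0, top_n=2)
print(threshold, [log_id for log_id, _, _ in chain])
```

Because the threshold is recomputed from each candidate set, a query whose candidates all score low still returns its relatively best matches, whereas a fixed cut-off might return nothing.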

[0008] Optionally, the operation timestamps are segmented according to a preset time granularity, and an index entry containing all audit log identifiers within each time period is created to form the first index; a hash mapping structure is used to store the mapping relationship between the operation subject identifier and the operation object identifier, and an association list is created based on all operation records associated with the mapping relationship to form the second index; the operation behavior types are classified into data access type, permission change type, and system configuration type, and an index entry containing all operation record identifiers under each classified type is created to form the third index; the first index, the second index, and the third index are associated and mapped to construct an audit data index structure reflecting the time dimension, entity dimension, and behavior dimension.

[0009] Optionally, Bloom filters are created for the first index, the second index, and the third index respectively, and a Bloom filter combination structure is constructed based on each Bloom filter; an audit data hash table is established using the hash value of the operation timestamp, the combined hash value of the operation subject identifier and the operation object identifier, and the hash value of the operation behavior type as composite keys; the Bloom filter combination structure is cross-mapped with the audit data hash table to generate a distributed audit data index structure.
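
A minimal sketch of pairing per-dimension Bloom filters with a composite-key hash table is given below. The filter size, the number of hash functions, the class names, and the use of a Python tuple as the composite key (Python hashes the tuple internally) are all assumptions; the application does not specify these details, and distribution across nodes is not modeled.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

class AuditIndex:
    DIMENSIONS = ("time", "pair", "behavior")

    def __init__(self):
        self.filters = {name: BloomFilter() for name in self.DIMENSIONS}
        self.table = {}  # composite key -> list of log ids

    def add(self, log_id, hour_bucket, subject, obj, behavior):
        key = (hour_bucket, f"{subject}_{obj}", behavior)
        for name, part in zip(self.DIMENSIONS, key):
            self.filters[name].add(part)
        self.table.setdefault(key, []).append(log_id)

    def lookup(self, hour_bucket, subject, obj, behavior):
        key = (hour_bucket, f"{subject}_{obj}", behavior)
        # A negative Bloom answer is definitive, so the table can be skipped.
        for name, part in zip(self.DIMENSIONS, key):
            if not self.filters[name].might_contain(part):
                return []
        return self.table.get(key, [])

index = AuditIndex()
index.add("log-1", 15, "alice", "vault", "permission_change")
hit = index.lookup(15, "alice", "vault", "permission_change")
miss = index.lookup(3, "bob", "ledger", "data_access")
```

A Bloom filter can report false positives but never false negatives, so the hash table remains the source of truth and the filters only save lookups for keys that are certainly absent.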

[0010] Optionally, lexical analysis is performed on the security event description information to obtain event type keywords, entity identifier keywords, time range keywords, and impact range keywords; the event type keywords are mapped to event type codes, the entity identifier keywords are converted into standardized entity identifiers, the time range keywords are parsed into time intervals, and the impact range keywords are mapped to impact object type codes; the event type codes, the standardized entity identifiers, the time intervals, and the impact object type codes are structurally encapsulated to obtain an audit event feature vector.

[0011] Optionally, the audit event feature vector is decomposed into time feature components, subject relationship feature components, and behavior feature components; a time window range is determined in the first index based on the time feature components, and a first candidate log subset that meets preset time constraints is selected according to the time window range; subject-object association matching is performed in the second index based on the subject relationship feature components, and a second candidate log subset that meets preset subject relationship constraints is selected from the first candidate log subset; behavior pattern matching is performed in the third index based on the behavior feature components, and a third candidate log subset that meets preset behavior feature constraints is selected from the second candidate log subset; the third candidate log subset is deduplicated and its integrity is verified to generate the candidate audit log set.
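
The staged narrowing described above can be sketched with plain dictionaries standing in for the three indexes. The index shapes, parameter names, and sample data are assumptions, and the integrity verification step is omitted because the application does not define it.

```python
def retrieve_candidates(time_index, pair_index, behavior_index,
                        hours, subject_object, categories):
    """Narrow candidates through the time, subject-object, and behavior indexes."""
    # Stage 1: every log whose time bucket falls inside the window.
    first = set()
    for hour in hours:
        first.update(time_index.get(hour, ()))
    # Stage 2: keep only logs that also match the subject-object pair.
    second = first & set(pair_index.get(subject_object, ()))
    # Stage 3: keep only logs that also match one of the behavior categories.
    third = set()
    for category in categories:
        third.update(behavior_index.get(category, ()))
    third &= second
    # Sets already remove duplicates; sort for a stable result.
    return sorted(third)

time_index = {15: ["L1", "L2"], 16: ["L3"]}
pair_index = {("alice", "vault"): ["L1", "L3"], ("bob", "vault"): ["L2"]}
behavior_index = {"permission_change": ["L1"], "data_access": ["L2", "L3"]}

candidates = retrieve_candidates(
    time_index, pair_index, behavior_index,
    hours=[15, 16], subject_object=("alice", "vault"),
    categories=["permission_change"],
)
```

Applying the time constraint first mirrors the order in the text; each later stage only ever shrinks the previous subset.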

[0012] Optionally, the target logs in the filtering results are sorted chronologically by timestamp to generate a log sequence; the association between the operation subject identifier and operation object identifier between adjacent target logs in the log sequence is identified to determine continuous operation characteristics; based on the continuous operation characteristics, the log sequence is divided into several operation segments, wherein each operation segment contains target logs that are temporally continuous and operationally related; a segment identifier and an association pointer between the preceding and following segments are added to each operation segment, and the operation segments are chained together in chronological order to generate the audit trajectory chain.
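
One way to picture the trajectory chain is as a doubly linked list of operation segments. In this sketch, two adjacent logs are treated as one continuous operation when they are at most `max_gap` seconds apart and share a subject or object identifier; that continuity rule, the `max_gap` value, and the field names are assumptions, since the application leaves them open.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Segment:
    segment_id: int
    logs: list = field(default_factory=list)
    prev_id: Optional[int] = None  # pointer to the preceding segment
    next_id: Optional[int] = None  # pointer to the following segment

def build_chain(target_logs, max_gap=300):
    """Split time-ordered logs into linked operation segments."""
    ordered = sorted(target_logs, key=lambda log: log["ts"])
    segments = []
    for log in ordered:
        last = segments[-1].logs[-1] if segments else None
        related = last is not None and (
            log["ts"] - last["ts"] <= max_gap
            and bool({log["subject"], log["object"]}
                     & {last["subject"], last["object"]})
        )
        if related:
            segments[-1].logs.append(log)
        else:
            segments.append(Segment(segment_id=len(segments), logs=[log]))
    # Link the segments in chronological order.
    for i, seg in enumerate(segments):
        seg.prev_id = i - 1 if i > 0 else None
        seg.next_id = i + 1 if i < len(segments) - 1 else None
    return segments

sample = [
    {"ts": 1000, "subject": "alice", "object": "vault"},
    {"ts": 0, "subject": "alice", "object": "vault"},
    {"ts": 100, "subject": "alice", "object": "config"},
    {"ts": 1010, "subject": "bob", "object": "ledger"},
]
segments = build_chain(sample)
```

Here the first two logs in time order form one segment, the log at `ts=1000` starts a new segment because the gap exceeds `max_gap`, and the last log starts a third because it shares no entity with its predecessor.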

[0013] Optionally, based on the operation behavior type and operation frequency of each operation segment in the audit trajectory chain, the abnormality degree of the corresponding operation segment is calculated; operation segments with an abnormality degree exceeding a preset abnormality threshold are marked as abnormal operation segments; relevant operation records of the abnormal operation segments within a preset time range are retrieved from the original audit log to form abnormal behavior context information; and the abnormal behavior context information is integrated to generate an abnormal warning report.

[0014] A second aspect of this application provides a blockchain audit information filtering system, the system comprising: a log acquisition module, used to acquire the original audit logs in the blockchain network, wherein the original audit logs include the operation subject identifier, the operation object identifier, the operation behavior type, and the operation timestamp; an index structure generation module, used to construct a first index based on the operation timestamp, a second index based on the operation subject identifier and the operation object identifier, and a third index based on the operation behavior type, and to generate an audit data index structure from the first index, the second index, and the third index; a filtering threshold determination module, used to receive security event description information input by the user, and generate an audit event feature vector based on the security event description information; retrieve a candidate audit log set from the audit data index structure based on the audit event feature vector, and calculate the information correlation degree between the audit event feature vector and each log in the candidate audit log set; obtain the mean and standard deviation of the information correlation degrees in the candidate audit log set, and determine a dynamic filtering threshold based on the mean and the standard deviation; and a filtering report determination module, used to select audit logs whose information correlation degree is greater than the dynamic filtering threshold as target logs, and select a preset number of target logs as filtering results in descending order of information correlation degree; arrange the filtering results by timestamp to form an audit trajectory chain, and output a structured filtering report containing the audit trajectory chain.

[0015] A third aspect of this application provides an electronic device including a memory, a processor, and a program stored in the memory and executable on the processor, the program being loaded and executed by the processor to implement a blockchain audit information filtering method.

[0016] A fourth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to implement a blockchain audit information filtering method.

[0017] In summary, one or more technical solutions provided in this application have at least the following technical effects or advantages: By adopting the above technical solution, a multi-dimensional audit data index structure is formed by constructing a first index, a second index, and a third index based on the operation timestamp, the operation subject identifier, the operation object identifier, and the operation behavior type, respectively. This improves the retrieval efficiency of audit logs. Simultaneously, by converting user-input security event descriptions into audit event feature vectors and calculating the information correlation between these feature vectors and candidate audit logs, and dynamically determining the filtering threshold using the mean and standard deviation, target logs highly relevant to security events can be adaptively filtered. Furthermore, sorting the filtering results in descending order of information correlation and selecting a preset number of target logs effectively avoids interference from redundant information. Finally, by arranging the target logs by timestamp to form an audit trajectory chain and generating a structured filtering report, both the integrity of audit information and the readability of audit results are ensured. This solution overcomes the limitations of traditional fixed-rule retrieval methods, enabling more flexible responses to complex and ever-changing security event scenarios and improving the accuracy of blockchain audit information filtering.

Attached Figure Description

[0018] Figure 1 is a flowchart illustrating a blockchain audit information screening method provided in an embodiment of this application; Figure 2 is another flowchart illustrating a blockchain audit information screening method provided in an embodiment of this application; Figure 3 is a schematic diagram of the structure of a blockchain audit information screening system provided in an embodiment of this application; Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0019] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.

[0020] In the description of the embodiments of this application, the words "for example" or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design that is described as "for example" or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design options. Rather, the use of the words "for example" or "for instance" is intended to present the relevant concepts in a specific manner.

[0021] In the description of the embodiments of this application, the term "multiple" means two or more. For example, multiple systems means two or more systems, and multiple screen terminals means two or more screen terminals. Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and variations thereof all mean "including but not limited to," unless otherwise specifically emphasized.

[0022] This application provides a method for filtering blockchain audit information. In one embodiment, refer to Figure 1, which is a flowchart illustrating the blockchain audit information filtering method provided in this embodiment of the application. This method can be implemented using a computer program, which can be integrated into an application or run as a standalone utility application. The method can also be implemented using a microcontroller or run on a blockchain audit information filtering system based on the von Neumann architecture. Specifically, the method may include the following steps:

Step 101: Obtain the original audit logs in the blockchain network. The original audit logs include the operation subject identifier, the operation object identifier, the operation behavior type, and the operation timestamp.

[0023] A blockchain network is a distributed ledger system composed of multiple nodes. Each node maintains the same copy of the data and ensures data consistency through a consensus mechanism. The raw audit log is an automatic record of operations made during the operation of the blockchain system. These records are stored in the form of structured data across the nodes and are used to track and verify all operations that occur in the system. The operation subject identifier is the unique identifier of the entity initiating the operation, typically represented in a blockchain system as a user account address, smart contract address, or system service identifier. The operation object identifier is the unique identifier of the target entity being operated on, including the accessed data object, the invoked smart contract, or the modified system configuration item. The operation behavior type refers to the specific category of the operation, such as predefined behavior categories like data reading, data writing, contract invocation, and permission changes. The operation timestamp is the precise point in time when the operation occurred, usually recorded in Unix timestamp format, accurate to the millisecond level.

[0024] Specifically, raw audit logs are acquired through audit log collectors deployed on each node of the blockchain network. First, a log collection module is installed on each full node and light node of the blockchain network. This module captures real-time operational data by monitoring the RPC interface and event listeners of the blockchain nodes. When any operation occurs in the blockchain network, such as transaction execution, smart contract calls, or state changes, the node automatically generates corresponding log records. The log collector extracts standardized four-tuple information by parsing block data, transaction data, and event logs: the operation subject identifier is obtained by parsing the transaction's `from` field or the event's initiator field; the operation object identifier is obtained by parsing the transaction's `to` field, contract address, or data object identifier; the operation behavior type is determined by analyzing the transaction type, function call signature, and opcode; and the operation timestamp is directly obtained from the timestamp field in the block header or the transaction confirmation time. To ensure data integrity, the system employs a multi-node synchronous verification mechanism, collecting log data from multiple nodes within the same time period and verifying the accuracy of the logs through hash value comparison and data consistency checks. The collected raw audit logs are stored in JSON format. Each record contains four required fields as well as optional extended fields such as gas consumption and transaction fees. Data compression algorithms are used to reduce storage space usage, and a backup mechanism is established to ensure that data is not lost.
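
The four-tuple extraction can be sketched for a transaction represented as a dictionary. The field names follow common Ethereum JSON-RPC transaction objects, but the selector table, the behavior labels, and the helper names are hypothetical, and the digest function only hints at how records from several nodes might be compared by hash.

```python
import hashlib
import json

# Hypothetical selector table; real selectors depend on the deployed contracts.
SELECTOR_TO_BEHAVIOR = {"0x11111111": "permission_grant", "0x22222222": "data_read"}

def to_audit_record(tx, block_timestamp_ms):
    """Map a transaction dict to the four required fields plus one extra."""
    data = tx.get("input", "0x")
    if tx.get("to") is None:
        behavior = "contract_deploy"      # no recipient: contract creation
    elif data == "0x":
        behavior = "transfer"             # empty call data: plain transfer
    else:
        behavior = SELECTOR_TO_BEHAVIOR.get(data[:10], "contract_call")
    return {
        "subject_id": tx["from"],
        "object_id": tx.get("to") or tx.get("contractAddress"),
        "behavior_type": behavior,
        "timestamp": block_timestamp_ms,
        "gas": tx.get("gas"),             # optional extended field
    }

def record_digest(record):
    """Hash of the canonical JSON form, usable for cross-node comparison."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

tx = {"from": "0xA1", "to": "0xB2", "input": "0x11111111" + "00" * 32, "gas": 21000}
record = to_audit_record(tx, 1698765432123)
line = json.dumps(record, sort_keys=True)
```

Two collectors that observe the same transaction produce the same canonical JSON and therefore the same digest, which is what makes a hash comparison between nodes meaningful.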

[0025] Step 102: Construct a first index based on the operation timestamp, a second index based on the operation subject identifier and the operation object identifier, and a third index based on the operation behavior type, and generate an audit data index structure of the first, second and third indexes.

[0026] The first index is a time-based data retrieval structure that segments audit logs by timestamp. The second index is a mapping structure based on entity relationships, establishing the association between the operating entity and the operating object. The third index is a behavior-based index structure that categorizes and stores logs according to operation type. The audit data index structure is a composite retrieval system formed by associating and mapping these three independent indexes.

[0027] Specifically, the audit data index structure is generated through three stages of index construction followed by association mapping. The first stage constructs the time index: the system sets a preset time granularity of 1 hour, dividing the 86,400 seconds of a 24-hour day into 24 time periods, each corresponding to an index bucket. For the millisecond timestamp 1698765432123, the corresponding hour identifier is first calculated with integer division as hour_id = (timestamp / 3600000) % 24, which gives 15, and then the unique identifier log_id of that log is added to the index bucket of the corresponding time period, forming the time index table time_index[hour_id] = [log_id1, log_id2, ...]. The second stage constructs the entity relationship index: a hash mapping structure HashMap<String, List<String>> stores the subject-object relationship, with the operation subject identifier as the key and the list of operation object identifiers as the value. When processing log records, a composite key `subject_object_key = hash(subject_id + "_" + object_id)` is calculated, and a mapping relationship is established between this composite key and the corresponding operation record identifier. Simultaneously, an inverted index `object_subject_map` is maintained for bidirectional queries. The third stage constructs the behavior type index: predefined behavior type classification rules divide operation behaviors into three categories: data access (read, query), permission change (grant, revoke), and system configuration (config, deploy). An independent index bucket `behavior_index[category] = [log_id_list]` is created for each category. Finally, the three indexes are linked using a cross-mapping algorithm: a composite index table `composite_index` is created, keyed by the triple (`time_segment`, `entity_pair`, `behavior_type`) formed from the time segment identifier, the entity-object composite key, and the behavior type. Each key maps to an array of pointers to specific log records, enabling fast multi-dimensional retrieval.
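
The three index stages and the composite table can be sketched in a few lines. This assumes timestamps are in milliseconds, as the sample value 1698765432123 suggests, so the hour bucket divides by 3,600,000; the SHA-256 based pair key, the category table, and the log field names are illustrative choices rather than details taken from the application.

```python
import hashlib
from collections import defaultdict

MS_PER_HOUR = 3_600_000
BEHAVIOR_CATEGORY = {
    "read": "data_access", "query": "data_access",
    "grant": "permission_change", "revoke": "permission_change",
    "config": "system_config", "deploy": "system_config",
}

def hour_id(timestamp_ms):
    """Hour-of-day bucket (0-23, UTC) for a millisecond Unix timestamp."""
    return (timestamp_ms // MS_PER_HOUR) % 24

def pair_key(subject_id, object_id):
    """Composite subject-object key; the hash function is an assumption."""
    return hashlib.sha256(f"{subject_id}_{object_id}".encode()).hexdigest()[:16]

def build_indexes(logs):
    time_index = defaultdict(list)         # hour_id -> log ids
    pair_index = defaultdict(list)         # pair key -> log ids
    object_subject_map = defaultdict(set)  # inverted map for reverse queries
    behavior_index = defaultdict(list)     # category -> log ids
    composite_index = defaultdict(list)    # (hour, pair, category) -> log ids
    for log in logs:
        hour = hour_id(log["timestamp"])
        pair = pair_key(log["subject_id"], log["object_id"])
        category = BEHAVIOR_CATEGORY[log["behavior_type"]]
        time_index[hour].append(log["log_id"])
        pair_index[pair].append(log["log_id"])
        object_subject_map[log["object_id"]].add(log["subject_id"])
        behavior_index[category].append(log["log_id"])
        composite_index[(hour, pair, category)].append(log["log_id"])
    return time_index, pair_index, object_subject_map, behavior_index, composite_index

logs = [
    {"log_id": "L1", "timestamp": 1698765432123, "subject_id": "alice",
     "object_id": "vault", "behavior_type": "grant"},
    {"log_id": "L2", "timestamp": 1698765432123 + MS_PER_HOUR, "subject_id": "bob",
     "object_id": "vault", "behavior_type": "read"},
]
time_index, pair_index, object_subject_map, behavior_index, composite_index = build_indexes(logs)
```

A fully specified query can go straight to `composite_index`, while a query that fixes only one or two dimensions falls back to the individual indexes.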

[0028] Step 103: Receive security event description information input by the user, and generate an audit event feature vector based on the security event description information.

[0029] Security event description information is a textual description of the security event to be queried, entered by the user in natural language. An audit event feature vector is the numerical vector representation into which that natural language description is converted so that it can be processed by a computer.

[0030] Specifically, the system converts user descriptions into standardized feature vectors using natural language processing and feature encoding techniques. First, the system receives the security event description text input by the user through a web interface or API. Regular expressions and a lexical analyzer are used to preprocess the text, removing stop words and punctuation. Then, keyword recognition is performed: a pre-trained named entity recognition model extracts time range keywords, and the time parsing library dateutil converts "October 22, 2023, morning" into the time interval [1698753600, 1698796800]; the regular expression "\b0x[a-fA-F0-9]{4,40}\b" extracts entity identifier keywords such as "0x742d", which are expanded into complete addresses using an address completion algorithm; and a predefined dictionary matches event type keywords, mapping "abnormal privilege escalation" to event type code 2. Next, feature vector construction is performed: the time interval is converted into a center timestamp, center_time = (start_time + end_time) / 2; the hash value of the entity identifier is calculated, entity_hash = SHA256(entity_id) % 10000; and the event type is directly mapped to a numerical code. Finally, a standardized feature vector is generated: feature_vector = [center_time, entity_hash, event_type_code, impact_scope], where impact_scope is calculated by analyzing keywords related to the scope of influence. To ensure the numerical stability of the vector, the features of each dimension are normalized: time features are scaled by dividing by a time constant, hash values are limited to a reasonable range using modulo, and type codes retain their original values.

[0031] In one possible implementation, an audit event feature vector is generated based on the security event description information, specifically including steps 1031-1033, as follows: Step 1031: Perform lexical analysis on the security incident description information to obtain event type keywords, entity identifier keywords, time range keywords, and impact range keywords.

[0032] Lexical analysis and recognition is the process of breaking down natural language text into lexical units and identifying their grammatical categories. Event type keywords are words describing the nature of a security event, such as "privilege escalation" or "data breach." Entity identifier keywords are identifiers pointing to specific objects of operation, such as the account address "0x742d35Cc." Time range keywords are words indicating the time when the event occurred, such as "yesterday morning" or "2023-10-22." Impact scope keywords are words describing the degree and scope of the event's impact, such as "entire system" or "single user."

[0033] Specifically, keyword extraction is achieved through a multi-level lexical analyzer and regular expression matching. First, the system uses the jieba word segmentation library to segment the input text and generate a vocabulary list. Then, the four types of keywords are recognized in parallel: event type keyword recognition uses a predefined dictionary, which includes mappings such as "privilege_escalation" and "data_breach", and applies a longest-match algorithm to find matching items in the word segmentation results; entity identifier keyword recognition uses the regular expression "\b0x[a-fA-F0-9]{4,40}\b" to match hexadecimal blockchain address formats, and "\b[A-Za-z0-9]{26,35}\b" to match Bitcoin address formats; time range keyword recognition uses a time entity recognizer, with the regular expression "\d{4}-\d{2}-\d{2}" matching date formats and a relative time dictionary matching relative time expressions such as "yesterday" and "last week"; impact scope keyword recognition uses a scope dictionary to match words indicating broad impact such as "system-wide|global|entire" and words indicating limited impact such as "single|individual|local". During recognition, the system assigns a confidence score to each keyword, calculated from the word's matching degree in the predefined dictionary and its contextual relevance. The final output consists of four categories of keyword lists, each keyword accompanied by its position in the original text and its confidence score.
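
The four parallel recognizers can be approximated with the standard `re` module alone. This sketch works on English text, skips the jieba segmentation and the confidence scoring described above, and uses small illustrative dictionaries; none of the names below come from the application.

```python
import re

# Illustrative dictionaries; a real deployment would load much larger ones.
EVENT_TERMS = {"privilege escalation": "privilege_escalation",
               "data breach": "data_breach"}
RELATIVE_TIMES = ("yesterday", "last week")

ADDRESS_RE = re.compile(r"\b0x[a-fA-F0-9]{4,40}\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
IMPACT_RE = re.compile(
    r"\b(system-wide|global|entire|single|individual|local)\b", re.IGNORECASE)

def extract_keywords(text):
    """Return the four keyword categories found in a description."""
    lowered = text.lower()
    # Try longer dictionary terms first, approximating longest-match lookup.
    terms = sorted(EVENT_TERMS, key=len, reverse=True)
    return {
        "event_type": [EVENT_TERMS[t] for t in terms if t in lowered],
        "entity": ADDRESS_RE.findall(text),
        "time": DATE_RE.findall(text) + [t for t in RELATIVE_TIMES if t in lowered],
        "impact": [m.group(0).lower() for m in IMPACT_RE.finditer(text)],
    }

keywords = extract_keywords(
    "Privilege escalation by 0x742d35Cc on 2023-10-22 affected a single user")
```

Note that the quantifier is written `{4,40}` without a space; in Python's `re`, a quantifier such as `{4, 40}` is treated as literal text rather than a repetition count.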

[0034] Step 1032: Map event type keywords to event type codes, convert entity identifier keywords to standardized entity identifiers, parse time range keywords into time intervals, and map impact range keywords to impact object type codes.

[0035] Event type encoding converts event type keywords into numerical codes. A standardized entity identifier is an entity identifier that has been unified from its original format into a standard format. A time interval is the numerical representation of a time range keyword, expressed as a specific timestamp range. Affected object type encoding is a classification system that converts descriptions of the scope of impact into numerical codes.

[0036] Specifically, keyword standardization is achieved through a predefined mapping table and algorithmic conversion. Event type encoding adopts a hierarchical encoding system: events are first divided into three main categories: access (1xx), permission (2xx), and system (3xx). Then, each main category is further subdivided into subtypes, such as permission escalation (201) and permission demotion (202). During conversion, the system looks up the event type mapping table `event_type_map["permission escalation"]=201` to directly obtain the corresponding code. Standardized entity identifier processing consists of two steps: address completion and format unification. For incomplete Ethereum addresses such as "0x742d", the system queries the address database to find a complete address matching the prefix and uses an address verification algorithm to verify its validity. Address formats from different blockchains are uniformly converted to hexadecimal format and a network identifier prefix is added. Time interval parsing employs a time calculation engine: relative time terms are calculated from the current time, such as converting "yesterday afternoon" to the afternoon time interval [13:00, 18:00] of the day containing current_time - 86400 seconds; absolute time is directly parsed from ISO format and converted to a Unix timestamp; fuzzy time is expanded using a time window, with a single day expanded to a 24-hour interval. The encoding of the affected object type uses an impact range assessment algorithm: the impact level is calculated based on quantifiers and range terms in the keywords, with "whole|all" corresponding to high impact level code 3, "part|local" corresponding to medium impact level code 2, and "single|individual" corresponding to low impact level code 1.
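
The conversions described above can be sketched with lookup tables and the standard `datetime` module. The `eth:` network prefix, the in-memory address list standing in for the address database, the UTC time zone, and the function names are assumptions; address checksum validation is omitted, and the sample address is used purely as test data.

```python
from datetime import datetime, timedelta, timezone

EVENT_TYPE_MAP = {"permission escalation": 201, "permission demotion": 202}
IMPACT_LEVEL_MAP = {"whole": 3, "all": 3, "part": 2, "local": 2,
                    "single": 1, "individual": 1}

def complete_address(prefix, known_addresses):
    """Return the unique known address starting with prefix, else None."""
    matches = [a for a in known_addresses if a.lower().startswith(prefix.lower())]
    # Assumed convention: lowercase hex plus a network identifier prefix.
    return "eth:" + matches[0].lower() if len(matches) == 1 else None

def parse_time_range(keyword, now):
    """Return a (start, end) pair of Unix timestamps in seconds (UTC)."""
    if keyword == "yesterday afternoon":
        midnight = (now - timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0)
        start = midnight + timedelta(hours=13)
        end = midnight + timedelta(hours=18)
    else:
        # Absolute ISO date: expand the single day to a 24-hour window.
        start = datetime.fromisoformat(keyword).replace(tzinfo=timezone.utc)
        end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())

KNOWN = ["0x742d35Cc6634C0532925a3b844Bc454e4438f44e"]  # sample data only
now = datetime(2023, 10, 23, 9, 0, tzinfo=timezone.utc)

event_code = EVENT_TYPE_MAP["permission escalation"]
entity_id = complete_address("0x742d", KNOWN)
afternoon = parse_time_range("yesterday afternoon", now)
whole_day = parse_time_range("2023-10-22", now)
impact = IMPACT_LEVEL_MAP["single"]
```

Returning `None` when a prefix matches zero or several addresses keeps an ambiguous prefix from being silently resolved to the wrong account.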

[0037] Step 1033: Perform structured encapsulation of event type encoding, standardized entity identifier, time interval, and affected object type encoding to obtain the audit event feature vector.

[0038] Structured encapsulation is the process of organizing different types of coded data into a unified data structure. An audit event feature vector is a multi-dimensional numerical vector containing the event's features across various dimensions.

[0039] Specifically, the final audit event feature vector is generated through vector construction algorithms and data standardization. First, the system defines the standard structure of the feature vector as a four-dimensional vector [event_type, entity_hash, time_center, impact_level]. The event type dimension directly uses the event type encoding value; the entity identifier dimension is processed using a hash algorithm, calculating entity_hash = CRC32(standard_entity_id) % 65536, converting the string into a 16-bit integer; the time dimension uses the center value of the time interval, time_center = (start_timestamp + end_timestamp) / 2, ensuring the representativeness of the time feature; the impact scope dimension directly uses the type encoding of the affected object. To ensure the comparability of the values in each dimension of the vector, the system performs feature normalization: the event type encoding is divided by the maximum type encoding value for scaling; the entity hash value maintains its original range [0, 65535]; the baseline timestamp is subtracted from the time center value and the result is divided by the time constant 86400 to express it in days; and the impact level encoding is divided by the maximum impact level of 3 for normalization. The final generated audit event feature vector is a four-dimensional array of type float, with each dimension's value controlled within the range of [0, 1], facilitating subsequent similarity calculations and machine learning processing. The system also records metadata for the feature vector, including original keywords, confidence scores, and conversion timestamps, for result interpretation and debugging analysis.
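As a non-limiting illustration, the vector construction described above might be sketched as follows. The maximum event code of 399 and the function name are assumptions for this example; the entity hash is left in its original range, as the text states, so this sketch does not rescale it.

```python
import zlib

MAX_EVENT_CODE = 399      # assumption: largest code in the 1xx-3xx scheme
MAX_IMPACT_LEVEL = 3      # from the text
SECONDS_PER_DAY = 86400   # from the text


def build_feature_vector(event_code, entity_id, start_ts, end_ts, impact, baseline_ts):
    """Return [event_type, entity_hash, time_center, impact_level] as floats."""
    # CRC32 of the standardized identifier, folded into a 16-bit integer.
    entity_hash = zlib.crc32(entity_id.encode("utf-8")) % 65536
    # Center of the time interval represents the time feature.
    time_center = (start_ts + end_ts) / 2
    return [
        event_code / MAX_EVENT_CODE,
        float(entity_hash),
        (time_center - baseline_ts) / SECONDS_PER_DAY,
        impact / MAX_IMPACT_LEVEL,
    ]
```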

[0040] Step 104: Retrieve the candidate audit log set from the audit data index structure based on the audit event feature vector, and calculate the information correlation between the audit event feature vector and each log in the candidate audit log set.

[0041] The candidate audit log set is a collection of log records that are initially filtered from the audit data index structure and are relevant to the query conditions. Information relevance is a numerical metric that measures the similarity between the audit event feature vector and the candidate log records.

[0042] Specifically, the candidate set is obtained and the relevance is evaluated through multi-level retrieval algorithms and similarity calculations. First, the system decomposes the audit event feature vector [event_type, entity_hash, time_center, impact_level] into three retrieval conditions: time condition time_range=[time_center-time_window, time_center+time_window], where time_window=3600 seconds; entity condition entity_condition=entity_hash and its hash collision range [entity_hash-100, entity_hash+100]; and behavior condition behavior_condition=event_type and its related type encoding. Then, a three-stage retrieval is performed. The first stage uses the first index to perform a time range retrieval, calculating the time segment identifier `time_segment_id = time_center / 3600` and obtaining the candidate log ID list `candidate_ids_time` from the time index `time_index[time_segment_id]` and adjacent time segments. The second stage uses the second index to perform an entity association retrieval, using the entity hash value to find related operation records in `entity_index` and obtaining the candidate log ID list `candidate_ids_entity`. The third stage uses the third index to perform a behavior type retrieval, searching for operation records of the same type in `behavior_index` based on the event type code and obtaining the candidate log ID list `candidate_ids_behavior`. Finally, the set intersection operation `candidate_ids = candidate_ids_time ∩ candidate_ids_entity ∩ candidate_ids_behavior` is performed to obtain the final candidate audit log set. To calculate information relevance, the system constructs a corresponding feature vector log_vector[i]=[log_event_type, log_entity_hash, log_timestamp, log_impact] for each candidate log entry. Then, it calculates the relevance using the weighted cosine similarity formula: similarity=(w1×v1×l1+w2×v2×l2+w3×v3×l3+w4×v4×l4) / (sqrt(w1²×v1²+w2²×v2²+w3²×v3²+w4²×v4²)×sqrt(w1²×l1²+w2²×l2²+w3²×l3²+w4²×l4²)), where v1 to v4 are the components of the audit event feature vector, l1 to l4 are the components of log_vector[i], and the weights w1=0.3, w2=0.25, w3=0.25, and w4=0.2 correspond to the importance of event type, entity identifier, time, and scope of impact, respectively. The system then outputs the set of candidate audit log entries and a list of their corresponding information relevance scores.
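As a non-limiting illustration, the similarity formula above might be computed as follows. The sketch follows the formula exactly as written, in which the weights enter the numerator linearly and the denominator squared, so the result is not bounded by 1; returning 0.0 for an all-zero vector is an assumption of this example.

```python
import math

# Weights for event type, entity identifier, time, and scope of impact.
WEIGHTS = (0.3, 0.25, 0.25, 0.2)


def weighted_similarity(v, l, w=WEIGHTS):
    """Evaluate the weighted similarity of query vector v and log vector l."""
    numerator = sum(wi * vi * li for wi, vi, li in zip(w, v, l))
    norm_v = math.sqrt(sum((wi * vi) ** 2 for wi, vi in zip(w, v)))
    norm_l = math.sqrt(sum((wi * li) ** 2 for wi, li in zip(w, l)))
    if norm_v == 0 or norm_l == 0:
        return 0.0  # assumption: undefined similarity is reported as 0
    return numerator / (norm_v * norm_l)
```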

[0043] In one possible implementation, a candidate audit log set is retrieved from the audit data index structure based on audit event feature vectors, specifically including steps 1041-1043, as follows: Step 1041: Decompose the audit event feature vector into time feature components, subject relationship feature components, and behavioral feature components; determine the time window range in the first index based on the time feature components, and filter out the first candidate log subset that meets the preset time constraints according to the time window range.

[0044] The time feature component is the numerical part of the audit event feature vector representing the time dimension information. The subject relationship feature component is the feature part representing the relationship between the operating subject and the operating object. The behavior feature component is the feature part representing the type and pattern of the operation behavior. The time window range is the time interval extending forward and backward from the time feature component. The first candidate log subset is the preliminary screening result that meets the time constraints.

[0045] Specifically, the first round of candidate log filtering is achieved through feature vector decomposition and time index query. First, the system extracts components from the four-dimensional audit event feature vector [event_type, entity_hash, time_center, impact_level]: time feature component time_feature=time_center, subject relationship feature component entity_feature=entity_hash, behavior feature component behavior_feature=event_type, and, as an auxiliary feature, impact range feature impact_feature=impact_level. Then, the time window range is determined based on the time feature component: the base time window size is set to base_window=3600 seconds, and the window size is adjusted according to the impact range feature as adjusted_window=base_window×(1+impact_feature×0.5), generating the time window range [time_center-adjusted_window, time_center+adjusted_window]. Next, a time range query is performed in the first index: the start and end timestamps of the time window are converted into the corresponding time segment identifiers start_segment=start_time / 3600 and end_segment=end_time / 3600, and all time segment indices from start_segment to end_segment are traversed, collecting the list of log identifiers within each time segment. The system uses a parallel query algorithm to access multiple time segment indices simultaneously: `for segment_id in range(start_segment, end_segment+1): candidate_logs.extend(time_index[segment_id])`. To improve query efficiency, bitmap indexing technology is used, representing the existence of logs in each time segment as a bit vector and quickly determining the candidate log range through bitwise operations. Finally, the collected log identifiers are precisely filtered by timestamp: the complete record of each candidate log is read, its operation timestamp is verified to be within the time window range, the first candidate log subset first_candidate_subset is generated, and the subset size and query time are recorded for performance monitoring.
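As a non-limiting illustration, the time-window retrieval of Step 1041 might be sketched as follows. The dictionary shapes of `time_index` and `logs` are assumptions for this example, the segments are visited sequentially rather than in parallel, and the bitmap index is omitted.

```python
SEGMENT_SECONDS = 3600  # one time segment per hour, from the text
BASE_WINDOW = 3600      # base window size in seconds, from the text


def time_window(time_center, impact_feature):
    """Widen the base window according to the impact range feature."""
    adjusted = BASE_WINDOW * (1 + impact_feature * 0.5)
    return time_center - adjusted, time_center + adjusted


def first_candidate_subset(time_index, logs, time_center, impact_feature):
    """Collect log ids from the covered segments, then filter by timestamp.

    time_index maps segment id -> list of log ids (assumed shape).
    logs maps log id -> record with a "timestamp" field (assumed shape).
    """
    start, end = time_window(time_center, impact_feature)
    start_segment = int(start // SEGMENT_SECONDS)
    end_segment = int(end // SEGMENT_SECONDS)
    candidate_ids = []
    for segment_id in range(start_segment, end_segment + 1):
        candidate_ids.extend(time_index.get(segment_id, []))
    # Exact check: segments are coarser than the window itself.
    return [i for i in candidate_ids if start <= logs[i]["timestamp"] <= end]
```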

[0046] Step 1042: Based on the subject relationship feature components, perform subject-object association matching in the second index, and select the second candidate log subset that meets the preset subject relationship constraints from the first candidate log subset.

[0047] Subject-object association matching is the process of searching for operation records related to subject-relationship feature components in the second index. Predefined subject-relationship constraints are predefined rules governing allowed association relationships between subjects and objects. The second candidate log subset is a set of logs that further satisfy the subject-relationship constraints based on the first candidate log subset.

[0048] Specifically, log filtering based on the subject relationship dimension is achieved through hash mapping queries and relationship constraint verification. First, the system performs an association query in the second index based on the subject relationship feature component `entity_feature`: using the entity hash value as the query key, it looks up the list of related operation record identifiers `related_operations=subject_object_map[entity_feature]` in the subject-object mapping table `subject_object_map`, and also looks up, in the reverse mapping table `object_subject_map`, the records in which the entity is the operation object, `reverse_operations=object_subject_map[entity_feature]`. Then, a relationship extension query is performed: for each related entity found, its second-degree relationships are further queried to construct a set of related entities `extended_entities`, using a graph traversal algorithm to find all related nodes of depth 2 in the entity relationship graph. Next, subject relationship constraint verification is performed: subject relationship constraint rules are defined, including direct relationships (the operating subject acts directly on the operating object), indirect relationships (relationships arising through intermediate entities), and time-series relationships (related operations within a specific time series). For each log record in the first candidate log subset, the system extracts its subject identifier (subject_id) and object identifier (object_id), calculates the subject hash value subject_hash = hash(subject_id) and the object hash value object_hash = hash(object_id), and checks whether the relationship constraint is satisfied: (subject_hash == entity_feature) OR (object_hash == entity_feature) OR (subject_hash IN extended_entities) OR (object_hash IN extended_entities). The system uses a Bloom filter to accelerate relationship matching: all relevant entity hash values are added in advance to the Bloom filter (bloom_filter) for fast pre-screening of candidate logs, reducing the computational overhead of exact matching. Finally, the second candidate log subset (second_candidate_subset) is generated, and screening statistics are recorded, including the distribution of matched relationship types and the screening ratio.
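As a non-limiting illustration, the subject-relationship filtering of Step 1042 might be sketched as follows. The adjacency-dictionary form of the relationship graph, the inclusion of both first- and second-degree neighbours in `extended_entities`, and the pluggable `hash_fn` are assumptions for this example; an ordinary Python set stands in for the Bloom filter.

```python
import zlib


def entity_hash(entity_id):
    """Assumed hash: CRC32 folded to 16 bits, matching the feature vector."""
    return zlib.crc32(entity_id.encode("utf-8")) % 65536


def extended_entities(entity_feature, relation_graph, depth=2):
    """Breadth-first expansion of related entities up to the given depth."""
    seen = {entity_feature}
    frontier = {entity_feature}
    for _ in range(depth):
        frontier = {n for e in frontier for n in relation_graph.get(e, ())} - seen
        seen |= frontier
    return seen - {entity_feature}


def second_candidate_subset(first_subset, entity_feature, relation_graph,
                            hash_fn=entity_hash):
    """Keep logs whose subject or object matches the entity or its relations."""
    allowed = extended_entities(entity_feature, relation_graph) | {entity_feature}
    return [
        log for log in first_subset
        if hash_fn(log["subject_id"]) in allowed
        or hash_fn(log["object_id"]) in allowed
    ]
```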

[0049] Step 1043: Perform behavioral pattern matching in the third index based on behavioral feature components, and select a third candidate log subset that meets the preset behavioral feature constraints from the second candidate log subset; perform deduplication and integrity verification on the third candidate log subset to generate a candidate audit log set.

[0050] Behavioral pattern matching is the process of searching for operation behavior records in the third index that match the behavioral feature components. Predefined behavioral feature constraints are predefined matching rules for operation behavior types and patterns. The third candidate log subset is the final set of candidate logs that satisfy the behavioral feature constraints. Deduplication is the process of eliminating duplicate log records. Integrity verification is the process of verifying the integrity and consistency of log record data.

[0051] Specifically, the final candidate audit log set is generated through behavior category matching, data deduplication, and integrity verification. First, the system performs a behavior pattern query in the third index based on the behavior feature component `behavior_feature`: it maps the event type code to a behavior category, `category = behavior_category_map[behavior_feature]`, and looks up the list of operation record identifiers of the same category, `same_category_logs`, in the behavior index `behavior_index[category]`. Then, behavior pattern extension matching is performed: behavior similarity rules are defined, the numerical distance between behavior codes is calculated as `similar_behaviors = [b for b in behavior_codes if abs(b - behavior_feature) <= behavior_threshold]`, where `behavior_threshold = 5` is the allowed range of behavior code differences, and log records with similar behavior patterns are searched for. Next, behavioral constraint filtering is performed on the second candidate log subset: each candidate log record is traversed, its operation behavior type `log_behavior_type` is extracted, and it is checked against the behavioral constraint condition `log_behavior_type IN (same_category_logs UNION similar_behaviors)`. Logs that meet the condition are added to the third candidate log subset. Then, deduplication is performed: a unique identifier `log_signature=hash(operation_subject+operation_object+operation_type+operation_timestamp)` is defined, and a hash set `HashSet` is used to record the processed log signatures. The third candidate log subset is traversed, skipping log records with duplicate signatures and retaining the record with the earliest timestamp as the representative. Integrity verification is then performed: the presence of the required fields of each log record is verified, the validity of the field formats (such as timestamp range and address format) is checked, the digital signature or hash value of the log record is verified, and the checksum algorithm `checksum=CRC32(log_content)` is used to verify that the data has not been tampered with. Log records that fail integrity verification are marked as invalid and removed from the candidate set. Finally, the candidate audit log set candidate_audit_logs is generated, and statistical information is recorded, including the total number, the number of duplicates, the number of invalid records, and the distribution of each behavior type, for use in subsequent correlation calculation and result analysis.
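As a non-limiting illustration, the similar-behavior expansion, deduplication, and checksum verification of Step 1043 might be sketched as follows. The record field names, the use of SHA-256 for the log signature, and keeping the first occurrence of each signature are assumptions for this example.

```python
import hashlib
import zlib

BEHAVIOR_THRESHOLD = 5  # allowed behavior code difference, from the text
REQUIRED_FIELDS = ("subject", "object", "type", "timestamp", "content", "checksum")


def similar_behaviors(behavior_codes, behavior_feature):
    """Behavior codes within the allowed numerical distance of the query code."""
    return [b for b in behavior_codes if abs(b - behavior_feature) <= BEHAVIOR_THRESHOLD]


def log_signature(log):
    """Signature over subject, object, type, and timestamp (SHA-256 assumed)."""
    raw = f'{log["subject"]}|{log["object"]}|{log["type"]}|{log["timestamp"]}'
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def is_intact(log):
    """Check required fields and compare the stored CRC32 checksum."""
    if any(name not in log for name in REQUIRED_FIELDS):
        return False
    return zlib.crc32(log["content"].encode("utf-8")) == log["checksum"]


def finalize_candidates(third_subset):
    """Drop invalid and duplicate logs; return the kept logs and statistics."""
    seen, kept = set(), []
    stats = {"duplicates": 0, "invalid": 0}
    for log in third_subset:
        if not is_intact(log):
            stats["invalid"] += 1
            continue
        signature = log_signature(log)
        if signature in seen:
            stats["duplicates"] += 1
            continue
        seen.add(signature)
        kept.append(log)
    stats["total"] = len(kept)
    return kept, stats
```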

[0052] Step 105: Obtain the mean and standard deviation of the information correlation in the candidate audit log set, and determine the dynamic screening threshold based on the mean and standard deviation.

[0053] The information relevance mean is the arithmetic mean of the relevance scores between all log records in the candidate audit log set and the query feature vector. The standard deviation is a statistic that measures the dispersion of the relevance scores, reflecting how widely they are spread around the mean. The dynamic filtering threshold is a critical value for relevance scores adaptively determined based on statistical characteristics, used to filter highly relevant log records.

[0054] Specifically, dynamic screening criteria are generated through statistical calculations and an adaptive threshold determination algorithm. First, the system collects the information relevance score of each log record in the candidate audit log set, forming a relevance score array whose length equals the total number of candidate logs. Then, the mean is calculated using the arithmetic mean formula: the mean equals the sum of all scores divided by the total number of logs. The total score is initialized to 0, the relevance array is iterated through, and each score is added to the total score; the mean is then the total score divided by the number of candidate logs. Next, the standard deviation is calculated. The variance is the sum of the squared differences between each score and the mean, divided by the total number of logs: the cumulative variance is initialized to 0, the system iterates through each score, calculates the square of its difference from the mean, and accumulates these squares, and the variance is the cumulative value divided by the number of candidate logs. The standard deviation is the square root of the variance. To ensure the stability of numerical calculations, the system employs the Welford online algorithm for incremental statistical calculations: the count, the mean, and the cumulative squared differences are all initialized to 0, and for each new relevance score the system increments the count by 1, calculates the difference (score minus the current mean), adds the difference divided by the count to the mean, and adds the difference multiplied by (score minus the new mean) to the cumulative squared differences. The final variance is the cumulative squared differences divided by the count. Then, a dynamic screening threshold is determined from these statistics: following outlier detection methods from statistics, the threshold equals the mean plus an adjustment coefficient multiplied by the standard deviation, where the adjustment coefficient is an adjustable parameter. The system adjusts the coefficient according to the distribution characteristics of the candidate log set: when the standard deviation is small (less than 0.1), the adjustment coefficient is set to 0.5; when the standard deviation is moderate (0.1 ≤ standard deviation < 0.2), the adjustment coefficient is set to 0.7; and when the standard deviation is large (greater than or equal to 0.2), the adjustment coefficient is set to 1.0. To prevent an excessively high threshold from resulting in too few filtered results, the system sets an upper limit equal to the mean plus two standard deviations and a lower limit equal to the mean minus 0.5 standard deviations, and the final threshold is constrained to the range between these limits. The system also records statistical information for debugging and analysis, including the number of candidate logs, the histogram of the relevance distribution, the threshold calculation parameters, and the expected filtering ratio.
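As a non-limiting illustration, the incremental statistics and threshold selection of Step 105 might be sketched as follows. The function names are chosen for this example, and treating a standard deviation of exactly 0.2 as "large" is an assumption.

```python
import math


def welford(scores):
    """Return (mean, population standard deviation) in a single pass."""
    count, mean, m2 = 0, 0.0, 0.0
    for score in scores:
        count += 1
        delta = score - mean            # difference from the current mean
        mean += delta / count           # update the mean
        m2 += delta * (score - mean)    # uses the new mean
    std = math.sqrt(m2 / count) if count else 0.0
    return mean, std


def dynamic_threshold(scores):
    """Threshold = mean + k * std, with k chosen from the spread of scores."""
    mean, std = welford(scores)
    if std < 0.1:
        k = 0.5
    elif std < 0.2:   # assumption: exactly 0.2 falls in the "large" band
        k = 0.7
    else:
        k = 1.0
    threshold = mean + k * std
    lower, upper = mean - 0.5 * std, mean + 2 * std
    return min(max(threshold, lower), upper)
```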

[0055] Step 106: Select audit logs with information relevance greater than the dynamic filtering threshold as target logs, and select a preset number of target logs as the filtering results in descending order of information relevance.

[0056] Target logs refer to highly relevant audit log records whose information relevance scores exceed the dynamic filtering threshold. Descending order refers to the arrangement of logs from largest to smallest numerical value. The preset quantity is the upper limit for target log selection pre-configured by the system. The filtering result is the final log set after threshold filtering and quantity limits.

[0057] Specifically, the final set of filtered results is generated through threshold filtering and sorting selection algorithms. First, the system performs threshold filtering: it iterates through each log record in the candidate audit log set, obtains its information relevance score, and compares it with the dynamic filtering threshold; if the score is greater than the threshold, the log record is added to the target log list and its relevance score is recorded for subsequent sorting. The system uses a linear scan to traverse the candidate set: it initializes an empty target log list and a counter set to 0, and for each candidate log whose relevance score is greater than the threshold it appends the log identifier and relevance score to the target log list as a tuple and increments the counter by 1. To improve filtering efficiency, the system uses a fast filtering algorithm: it pre-sorts the candidate logs in ascending order of relevance score, uses a binary search algorithm to quickly locate the first log with a relevance score greater than the threshold, and extracts that log and all subsequent logs as the target log set. Then, a descending sort operation is performed: the quicksort algorithm is used to sort the target log list in descending order of relevance score, with a comparison function that ranks the first log ahead of the second when its relevance score is greater. Where sorting stability is required, a merge sort algorithm is used: the target log list is divided into several sublists, each sublist is sorted, and the sorted sublists are then merged; during merging, the relevance scores of the head elements of the two sublists are compared, and the one with the larger score is placed in the result list first.

[0058] Next, the selection is limited by quantity: a preset number of records is selected from the sorted target log list as the final filtering results. The system first checks whether the total number of target logs exceeds the preset limit. If it is less than or equal to the preset limit, all target logs are returned directly; if it exceeds the preset limit, the list is truncated to the first preset number of target logs. To ensure result quality, the system also sets a minimum result count: if the number of filtered target logs is less than this minimum, the dynamic filtering threshold is lowered and the filtering process is re-executed. The threshold adjustment formula is: new threshold = original threshold - 0.1 × standard deviation. Finally, the set of filtered results is generated, containing the complete record information, relevance score, sorting position, and selection reason for each target log, together with filtering statistics including the original number of candidates, the number remaining after threshold filtering, the final selected number, and the average relevance score.
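As a non-limiting illustration, the filtering, sorting, and quantity limiting of Step 106 might be sketched as follows. The parameter names and the `max_relaxations` cap, which guarantees termination when too few candidates exist, are assumptions for this example; Python's built-in stable sort stands in for the sorting algorithms named in the text.

```python
def select_targets(scored_logs, threshold, std, max_results, min_results,
                   max_relaxations=10):
    """Filter (log_id, score) pairs by threshold and return the top results.

    If fewer than min_results pass, the threshold is lowered by 0.1 * std
    and filtering is repeated, up to max_relaxations times (assumed cap).
    Returns the selected pairs and the threshold actually applied.
    """
    relaxations = 0
    while True:
        targets = [(log_id, score) for log_id, score in scored_logs if score > threshold]
        if len(targets) >= min_results or relaxations >= max_relaxations or std <= 0:
            break
        threshold -= 0.1 * std
        relaxations += 1
    # Stable sort: logs with equal scores keep their original relative order.
    targets.sort(key=lambda pair: pair[1], reverse=True)
    return targets[:max_results], threshold
```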

[0059] Step 107: Arrange the screening results by timestamp to form an audit trail chain, and output a structured screening report containing the audit trail chain.

[0060] Timestamp sorting refers to the process of sequentially ordering operations according to their recorded timestamps. An audit trail chain is a chronologically linked sequence of related audit events used to trace complete operational processes. A structured screening report is a formatted output document containing audit trail chains, statistical information, and analysis results.

[0061] Specifically, a complete audit analysis output is constructed through time series sorting and report generation algorithms. First, the system performs timestamp extraction and sorting operations on the filtered results: it iterates through each target log in the filtered result set, extracts the operation timestamp field from the log record, and converts the timestamp into a standard numerical time representation. The system uses timestamp parsing functions to process time data in different formats: standard timestamp formats are converted directly to integer values with second-level precision, and date-time string formats are converted into timestamp values using a time parsing library. Then, a time sorting algorithm is executed: a stable sorting algorithm is used to arrange the target logs in ascending order by timestamp, ensuring that logs with the same timestamp maintain their original relative order. Next, an audit trajectory chain is constructed: the sorted log sequence is converted into a chain structure, with each node containing a log record, timestamp, predecessor node reference, and successor node reference. The system calculates the time interval between adjacent logs: for each pair of adjacent log records, the time difference is calculated as the timestamp of the next log record minus the timestamp of the previous log record, and locations with abnormal time intervals are identified as potential trajectory breakpoints. Finally, a structured filtering report is generated: a multi-level report structure is created, including an execution summary, audit trajectory chain, detailed analysis, and appendix information. The report's execution summary includes filtering statistics: total number of original audit events, number of candidate logs, final number of filtered events, average correlation score, time span, and main risk types. The audit trail chain section displays detailed information for each event in chronological order: event identifier, operation time, operation entity, operation object, operation type, correlation score, and risk level. The system uses a template engine to generate formatted reports. The defined report template includes titles, tables, charts, and text paragraphs; audit data is then populated into the template to generate the final structured document.
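As a non-limiting illustration, the chain construction and report summary of Step 107 might be sketched as follows. The record fields, the `max_gap` parameter used to flag abnormal intervals, and the dictionary layout of the report are assumptions for this example; template rendering is omitted.

```python
def build_trail_report(results, max_gap):
    """Order results by timestamp and assemble a minimal structured report.

    results: list of dicts with "timestamp" and "score" fields (assumed shape).
    max_gap: interval in seconds above which a breakpoint is flagged
             (hypothetical parameter; the text does not fix a value here).
    """
    ordered = sorted(results, key=lambda log: log["timestamp"])  # stable sort
    chain = []
    for pos, log in enumerate(ordered):
        gap = log["timestamp"] - ordered[pos - 1]["timestamp"] if pos else None
        chain.append({
            "log": log,
            "prev": pos - 1 if pos else None,                       # predecessor index
            "next": pos + 1 if pos + 1 < len(ordered) else None,    # successor index
            "gap": gap,
            "breakpoint": gap is not None and gap > max_gap,
        })
    scores = [log["score"] for log in ordered]
    summary = {
        "final_count": len(ordered),
        "average_score": sum(scores) / len(scores) if scores else 0.0,
        "time_span": ordered[-1]["timestamp"] - ordered[0]["timestamp"] if ordered else 0,
    }
    return {"summary": summary, "audit_trail_chain": chain}
```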

[0062] In one possible implementation, the screening results are arranged by timestamp to form an audit trail chain, specifically including steps 1071-1073, as follows: Step 1071: Sort the target logs in the filtering results in chronological order by timestamp to generate a log sequence; identify the association between the operation subject identifiers and operation object identifiers of adjacent target logs in the log sequence to determine the continuous operation characteristics.

[0063] Time-order sorting is the process of arranging log records in chronological order from earliest to latest timestamps. A log sequence is an ordered collection of log records formed after time sorting. The operation subject identifier is a unique identifier for the user, process, or system entity performing the operation. The operation object identifier is a unique identifier for the target entity such as the file, database, or network resource being operated on. Continuous operation characteristics refer to the inherited association between adjacent log records in terms of operation subject or operation object.

[0064] Specifically, a log sequence with continuous characteristics is generated through time-based sorting and correlation identification algorithms. First, the system performs timestamp sorting on the target logs in the filtered results: it extracts the timestamp field of each target log and uses a quicksort algorithm to sort the logs in ascending order by timestamp value, generating a log sequence arranged chronologically. A stable sorting strategy is employed to ensure that logs with the same timestamp maintain their original relative positions. Then, the system identifies the correlation between adjacent logs: traversing the sorted log sequence, it analyzes the correlation between the operation subject and the operation object for each pair of adjacent target log records. The system extracts the operation subject identifier (previous subject) and operation object identifier (previous object) of the previous log, and the operation subject identifier (next subject) and operation object identifier (next object) of the next log, and calculates the correlation type: subject continuity correlation determines whether the previous subject equals the next subject; object continuity correlation determines whether the previous object equals the next object; and cross-correlation determines whether the previous subject equals the next object or vice versa. Finally, the continuous operation characteristics are determined: a continuity strength value is calculated based on the correlation type and time interval. The time interval is calculated as the timestamp of the next log entry minus the timestamp of the previous log entry. A time threshold of 600 seconds is set; time is considered continuous when the time difference is less than or equal to the threshold. Continuity strength is calculated using a weighted scoring method: subject continuity weight is 0.4, object continuity weight is 0.3, cross-association weight is 0.2, and time continuity weight is 0.1. Continuity strength equals the sum of the products of each weight and its corresponding Boolean value. A continuity strength greater than 0.5 is marked as a strong continuity feature; a continuity strength between 0.2 and 0.5 is marked as a weak continuity feature; and a continuity strength less than 0.2 is marked as a non-continuous feature.
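As a non-limiting illustration, the continuity strength of Step 1071 might be computed as follows. The record field names and label strings are assumptions for this example; the weights, the 600-second threshold, and the classification bands come from the text.

```python
CONTINUITY_WEIGHTS = {"subject": 0.4, "object": 0.3, "cross": 0.2, "time": 0.1}
TIME_THRESHOLD = 600  # seconds


def continuity_strength(prev_log, next_log):
    """Weighted sum of the Boolean continuity indicators for two adjacent logs."""
    flags = {
        "subject": prev_log["subject"] == next_log["subject"],
        "object": prev_log["object"] == next_log["object"],
        "cross": (prev_log["subject"] == next_log["object"]
                  or prev_log["object"] == next_log["subject"]),
        "time": next_log["timestamp"] - prev_log["timestamp"] <= TIME_THRESHOLD,
    }
    return sum(CONTINUITY_WEIGHTS[name] for name, on in flags.items() if on)


def classify_continuity(strength):
    """Map a strength value to the strong / weak / non-continuous labels."""
    if strength > 0.5:
        return "strong"
    if strength >= 0.2:
        return "weak"
    return "non-continuous"
```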

[0065] Step 1072: Divide the log sequence into several operation segments based on the continuous operation characteristics, wherein each operation segment contains target logs that are temporally continuous and operation-related.

[0066] Operation segments are groups of log records in a log sequence that possess both temporal continuity and operational correlation. Partitioning is the process of dividing a complete log sequence into several independent segments based on continuous operational characteristics. Temporal continuity refers to the close adjacency of log records along the time dimension. Operational correlation refers to the logical relationship between log records in terms of the operation subject or object.

[0067] Specifically, the system achieves segmentation of log sequences through continuity feature analysis and sequence segmentation algorithms. First, the system performs sequence segmentation based on the continuous operation characteristics: it iterates through each pair of adjacent logs in the sequence, checks their continuity strength value, and inserts a segmentation point at that position when the continuity strength is less than 0.2, dividing the log sequence into multiple subsequences. The segmentation algorithm uses a sliding window method: with the window size set to 2, the window slides from the beginning of the sequence, the continuity strength of the two logs within the window is calculated, and the segmentation position is recorded when the strength value is lower than the segmentation threshold. Then, segment boundary determination is performed: based on the segmentation point positions, the log sequence is divided into several operation segments. The starting boundary of each segment is the position after the previous segmentation point, and the ending boundary is the position of the current segmentation point. To ensure the integrity of the segments, the system performs boundary adjustment: it checks the first and last logs of each segment to verify operation integrity; when there are isolated operations at the beginning or end of a segment, it fine-tunes the boundary, either merging the isolated operations into adjacent segments or forming independent single-log segments. Next, the segment quality is evaluated: the cohesion of each operation segment is calculated, where cohesion equals the average continuity strength of all adjacent logs within the segment. A minimum cohesion threshold of 0.3 is set; when the cohesion of a segment is lower than the threshold, a secondary segmentation is performed, further subdividing the low-cohesion segment into multiple sub-segments. The system also calculates the time span and operation density of each segment: the time span equals the timestamp of the last log in the segment minus the timestamp of the first log, and the operation density equals the number of logs in the segment divided by the time span, which is used to evaluate how concentrated the operations in the segment are. Finally, several operation segments are generated, each containing a set of target logs that are temporally continuous and operationally related.
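As a non-limiting illustration, the segmentation and segment metrics of Step 1072 might be sketched as follows. The function names, the cohesion value returned for a single-log segment, and the handling of a zero time span are assumptions for this example; boundary adjustment and secondary segmentation are omitted.

```python
def split_segments(logs, strengths, split_threshold=0.2):
    """Split a time-ordered log list wherever continuity drops below the threshold.

    strengths[i] is the continuity strength between logs[i] and logs[i + 1].
    """
    if not logs:
        return []
    segments, current = [], [logs[0]]
    for log, strength in zip(logs[1:], strengths):
        if strength < split_threshold:
            segments.append(current)   # close the segment at the split point
            current = [log]
        else:
            current.append(log)
    segments.append(current)
    return segments


def cohesion(segment_strengths):
    """Average continuity strength of the adjacent pairs inside one segment."""
    if not segment_strengths:
        return 1.0  # assumption: a single-log segment is treated as fully cohesive
    return sum(segment_strengths) / len(segment_strengths)


def operation_density(segment):
    """Number of logs divided by the segment time span, or None if the span is 0."""
    span = segment[-1]["timestamp"] - segment[0]["timestamp"]
    return len(segment) / span if span > 0 else None
```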

[0068] Step 1073: Add a segment identifier and associated pointers to the preceding and following segments for each operation segment, and connect the operation segments in chronological order to generate an audit trajectory chain.

[0069] A chain segment identifier is a unique identifier assigned to each operation chain segment. An associated pointer is a reference to the preceding or following chain segment. Chaining is the process of connecting independent operation chain segments in chronological order into a complete chain structure. An audit trajectory chain is a complete audit event tracing chain formed by connecting multiple operation chain segments in chronological order.

[0070] Specifically, a complete audit trail chain structure is constructed through identifier allocation and pointer linking algorithms. First, the system assigns a unique segment identifier to each operation segment: an identifier is generated using an incrementing sequence number. The segment identifier format is a prefix plus a timestamp plus a sequence number; for example, the segment identifier equals the audit chain plus the current timestamp plus the sequence number, ensuring the uniqueness and readability of the identifier. The identifier allocation process follows the order in which the segment appears in the log sequence: the first segment is assigned sequence number 1, the second segment is assigned sequence number 2, and so on. Then, an associated pointer is added to each operation segment: a segment structure is created containing the segment identifier, a log list, a predecessor pointer, a successor pointer, and metadata information. The predecessor pointer points to the immediately preceding segment in time, and the successor pointer points to the immediately following segment in time. For the first segment, its predecessor pointer is null, and for the last segment, its successor pointer is null. The pointer assignment process uses a doubly linked list construction algorithm: traversing all operation segments, the successor pointer of the current segment is assigned the reference address of the next segment, and the predecessor pointer of the next segment is assigned the reference address of the current segment. Next, the chain concatenation operation is performed: Each operation chain segment is connected into a complete chain structure in chronological order, verifying the temporal continuity and logical correlation between the segments. 
During the concatenation process, the chain segment interval time is calculated: for each pair of adjacent segments, the time difference between the start time of the subsequent segment and the end time of the preceding segment is calculated. When the time difference exceeds a preset threshold of 3600 seconds, it is marked as a time breakpoint, and a time interval marker is inserted into the audit trajectory chain. Finally, a complete audit trajectory chain is generated: all chain segments and related information are integrated to construct a composite data structure containing chain segment sequences, a timeline, a relational graph, and metadata, supporting forward and reverse traversal operations and providing fast location and relational query functions.
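
The identifier allocation, pointer linking, and time-breakpoint marking described above can be sketched as a doubly linked list. This is an illustrative sketch under assumptions: each log is a dictionary with a `ts` field in seconds, the prefix `audit_chain` and the underscore separator are placeholders, and `Segment` and `link_segments` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BREAK_SECONDS = 3600  # time-breakpoint threshold stated in the text


@dataclass(eq=False)
class Segment:
    segment_id: str
    logs: List[dict]  # assumption: each log carries a "ts" field in seconds
    prev: Optional["Segment"] = field(default=None, repr=False)
    next: Optional["Segment"] = field(default=None, repr=False)
    gap_before: bool = False  # True when a time breakpoint precedes this segment


def link_segments(log_groups, prefix="audit_chain", created_at=0):
    """Assign sequential identifiers and connect segments in chronological order."""
    segments = [
        Segment(f"{prefix}_{created_at}_{seq}", logs)
        for seq, logs in enumerate(log_groups, start=1)
    ]
    for cur, nxt in zip(segments, segments[1:]):
        cur.next, nxt.prev = nxt, cur  # successor and predecessor pointers
        gap = nxt.logs[0]["ts"] - cur.logs[-1]["ts"]
        nxt.gap_before = gap > BREAK_SECONDS
    return segments
```

The first segment keeps `prev` as `None` and the last keeps `next` as `None`, so the chain can be walked in either direction from any segment.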

[0071] In the above embodiments, a basic correlation analysis framework for audit data was implemented through audit event feature vector construction, candidate log retrieval, and dynamic threshold filtering. To further improve the retrieval efficiency of large-scale distributed audit data and reduce the impact of index construction complexity on query performance, this application also provides another efficient audit data retrieval method based on multi-dimensional index fusion. This method intelligently optimizes the retrieval strategy by constructing a multi-layered index structure with time, entity, and behavior dimensions and analyzing the collaborative relationship between Bloom filters and hash tables, enabling the system to more accurately handle the retrieval needs of massive audit data and heterogeneous log formats. Another blockchain audit information filtering method in the embodiments of this application is described below with reference to Figure 2, which is another flowchart of a blockchain audit information screening method in an embodiment of this application.

[0072] Step 201: Divide the operation timestamps into segments according to the preset time granularity, and create an index entry for each time segment containing all audit log identifiers within that time segment, forming the first index.

[0073] The preset time granularity refers to dividing a continuous time axis into fixed-length time intervals. A time period is an independent time interval after being divided according to the time granularity. An index entry is a data structure containing all audit log identifiers within a specific time period. The first index is a fast retrieval structure built based on the time dimension.

[0074] Specifically, a fast time-based retrieval structure is achieved through time segmentation algorithms and index entry construction. First, the system performs time axis segmentation according to a preset time granularity: setting the time granularity to granularity seconds, for any operation timestamp, the start time of its time segment (segment_start) is calculated as the integer part of timestamp divided by granularity multiplied by granularity, and the end time (segment_end) is calculated as segment_start plus granularity. The system uses time segment identifiers as index keys: the time segment identifier (segment_id) is equal to the integer part of timestamp divided by granularity, ensuring that all timestamps within the same time segment are mapped to the same segment identifier. Then, all audit log records are traversed, extracting the operation timestamp and log identifier for each log entry: parsing the timestamp field from the log record, verifying the validity of the timestamp format, and converting the timestamp to a standard second-level precision value. Next, the segmentation mapping operation is performed: a corresponding index entry data structure is created for each time segment identifier, containing the time segment identifier, start time, end time, and a list of log identifiers. The specific implementation uses a hash table structure to store the time period index: a hash table `time_index` is created, with the time period identifier as the key and the log identifier list as the value. For each audit log, its time period identifier is calculated, and the existence of a corresponding index entry in the hash table is checked. If it does not exist, a new index entry is created and an empty log identifier list is initialized. The identifier of the current log is added to the log list of the corresponding time period. 
To improve index efficiency, the system adopts a compression storage strategy: for time periods containing a large number of logs, bitmap compression technology is used to store log identifiers, representing a continuous range of identifiers as a combination of a start value and a length. Finally, the first index is generated, supporting fast query operations based on time ranges. The query complexity is O(1) hash table access plus O(k) log list traversal, where k is the number of logs within the time period.
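
The time segmentation and hash-table construction described above can be sketched as follows. The sketch assumes each log is a dictionary with `id` and `ts` (seconds) fields and a default granularity of 3600 seconds; both are illustrative choices, and the bitmap compression step is omitted.

```python
from collections import defaultdict


def segment_id(timestamp, granularity):
    """Integer part of timestamp / granularity; segment_start = segment_id * granularity."""
    return int(timestamp) // granularity


def build_time_index(logs, granularity=3600):
    """Map each time segment identifier to the identifiers of the logs inside it."""
    time_index = defaultdict(list)
    for log in logs:
        time_index[segment_id(log["ts"], granularity)].append(log["id"])
    return time_index


def query_time_range(time_index, start, end, granularity=3600):
    """Return log identifiers from every segment that overlaps [start, end]."""
    ids = []
    first, last = segment_id(start, granularity), segment_id(end, granularity)
    for seg in range(first, last + 1):
        ids.extend(time_index.get(seg, []))  # O(1) lookup, then O(k) traversal
    return ids
```

Because the lookup works at segment granularity, a range query returns candidates from the whole boundary segments; an exact timestamp check on the returned logs would be needed if the range does not align with segment boundaries.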

[0075] Step 202: Use a hash mapping structure to store the mapping relationship between the operation subject identifier and the operation object identifier, and build an association list based on all operation records associated with the mapping relationship to form a second index.

[0076] A hash mapping structure is a data structure that uses a hash function to establish a key-value mapping. The operation subject identifier is a unique identifier for the entity performing the operation. The operation object identifier is a unique identifier for the target entity being operated on. The mapping relationship is the association between the subject identifier and the object identifier. The association list is a collection of related operation records built based on the mapping relationship. The second index is a fast retrieval structure built based on the entity association dimension.

[0077] Specifically, a fast retrieval structure based on entity association is achieved through hash mapping and association list generation. First, the system creates a bidirectional hash mapping structure to store subject-object relationships: a subject-to-object mapping table `subject_to_object_map` is established, using the operating subject identifier as the key and a list of all object identifiers operated on by that subject as the value; an object-to-subject mapping table `object_to_subject_map` is established, using the operating object identifier as the key and a list of all subject identifiers that have operated on that object as the value. Then, all audit log records are traversed, extracting the operating subject identifier, operating object identifier, and operation record identifier for each log entry: the subject and object fields are parsed from the log records, and the subject and object identifiers are standardized to remove format differences and redundant information. Next, the mapping relationship is established: for each audit log entry, the system checks whether an entry for the current subject identifier exists in the subject-to-object mapping table; if not, a new entry is created and an empty object list is initialized. The current object identifier is then added to the object list of that subject. Simultaneously, the system checks whether an entry for the current object identifier exists in the object-to-subject mapping table; if not, a new entry is created and an empty subject list is initialized. The current subject identifier is then added to the subject list of that object. Then, an association list is constructed based on the mapping relationship: a list of associated operation records is created for each subject-object relationship pair. The key of the association list is a combination of the subject identifier and the object identifier, and the value is a list of identifiers of all operation records involving that relationship. 
Specifically, a composite key hash table is used: the composite key format is subject identifier plus separator plus object identifier, for example, user123_file456. The composite key is used to look up or create the corresponding list of operation records in the association list hash table. To optimize storage efficiency, the system uses a compressed index for high-frequency association relationships: the operation frequency of each subject-object relationship is counted, and for relationships with more than 100 operations, a tree-like index structure is used instead of a linear list to improve the access performance of large-scale associated data. Finally, a second index is generated to support fast association query operations based on subject identifier, object identifier, or a combination of subject and object identifiers.
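
The bidirectional mapping and the composite-key association list described above can be sketched as follows. The sketch is illustrative only: the field names `subject`, `object`, and `id` are assumptions, standardization is reduced to trimming and lower-casing, and sets replace the lists mentioned in the text so that repeated pairs are not stored twice. The tree-like index for high-frequency relationships is omitted.

```python
from collections import defaultdict


def build_entity_index(logs, sep="_"):
    """Build subject->objects, object->subjects, and pair->record-id tables."""
    subject_to_object_map = defaultdict(set)
    object_to_subject_map = defaultdict(set)
    association = defaultdict(list)
    for log in logs:
        # Assumption: standardization means stripping whitespace and lower-casing.
        subject = log["subject"].strip().lower()
        obj = log["object"].strip().lower()
        subject_to_object_map[subject].add(obj)
        object_to_subject_map[obj].add(subject)
        # Composite key: subject identifier + separator + object identifier.
        association[f"{subject}{sep}{obj}"].append(log["id"])
    return subject_to_object_map, object_to_subject_map, association
```

Querying by subject, by object, or by the pair then reduces to one dictionary lookup in the corresponding table.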

[0078] Step 203: Classify the operation behavior types into data access type, permission change type and system configuration type, and create an index entry for each classified type containing all operation record identifiers under the type to form a third index.

[0079] Operation behavior type refers to the classification identifier of the operation actions recorded in the audit log. Data access category refers to operations involving data reading, querying, downloading, etc. Permission change category refers to operations involving permission management, such as modifying user permissions, assigning roles, and changing access controls. System configuration category refers to operations involving system management, such as setting system parameters, configuring services, and changing the environment. The third index is a classification retrieval structure built based on the operation behavior dimension.

[0080] Specifically, a fast retrieval structure based on the operational behavior dimension is achieved through behavior category mapping and category index construction. First, the system establishes category mapping rules for operational behavior types: the data access category includes operation codes ranging from 100 to 199, covering behaviors such as file reading, database querying, web page access, and downloading; the permission change category includes operation codes ranging from 200 to 299, covering behaviors such as user login, privilege escalation, role assignment, and access control modification; and the system configuration category includes operation codes ranging from 300 to 399, covering behaviors such as system settings, service start / stop, configuration file modification, and environment variable updates. The system uses a category mapping table `behavior_category_map` to store the correspondence between operation codes and categories, with the key being the operational behavior type code and the value being the corresponding category identifier. Then, all audit log records are traversed, and the operational behavior type field of each log is extracted: the behavior type code is parsed from the log record, the validity of the code format is verified, and the string-formatted type code is converted into a numeric identifier. Next, the behavior classification operation is performed: For each audit log, the corresponding category identifier is looked up in the category mapping table based on its operation behavior type code. If the operation code is in the range of 100 to 199, it is classified as a data access category; if the operation code is in the range of 200 to 299, it is classified as a permission change category; if the operation code is in the range of 300 to 399, it is classified as a system configuration category; operation codes outside the predefined range are classified as other categories. 
Then, an index entry is created for each category: a category index hash table `category_index` is created, with the category identifier as the key and the list of all operation record identifiers under that category as the value. The specific implementation traverses each classified audit log, obtains its category identifier and operation record identifier, and checks whether the category index table already has an index entry for that category; if not, a new index entry is created and an empty record list is initialized. The current operation record identifier is then added to the record list of the corresponding category. To support more granular behavior retrieval, the system also creates sub-category indexes: based on the main category, the operation type is further subdivided, for example, the data access category is subdivided into file access, database access, network access, etc., and an independent index entry is created for each sub-category. Finally, a third index is generated, which supports fast filtering and statistical analysis operations based on operation behavior classification.
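
The code-range classification and category index described above can be sketched as follows. The category names and the `code` and `id` field names are illustrative assumptions; the code ranges are the ones stated in the text. Sub-category indexes are omitted.

```python
from collections import defaultdict

# Operation code ranges stated in the text (upper bounds are inclusive there,
# so range() stops one past them).
CATEGORY_RANGES = {
    "data_access": range(100, 200),
    "permission_change": range(200, 300),
    "system_configuration": range(300, 400),
}


def categorize(code):
    """Convert a string or numeric operation code into a category identifier."""
    code = int(code)
    for name, codes in CATEGORY_RANGES.items():
        if code in codes:
            return name
    return "other"  # codes outside the predefined ranges


def build_category_index(logs):
    """Map each category identifier to the operation record identifiers under it."""
    category_index = defaultdict(list)
    for log in logs:
        category_index[categorize(log["code"])].append(log["id"])
    return category_index
```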

[0081] Step 204: Associate and map the first index, the second index, and the third index to construct an audit data index structure that reflects the time dimension, entity dimension, and behavior dimension.

[0082] Association mapping refers to the process of establishing connections between multiple independent index structures through cross-referencing. The time dimension refers to an index dimension with time as the primary search condition. The entity dimension refers to an index dimension with the main operating entity and object relationship as the primary search condition. The behavior dimension refers to an index dimension with the type of operation behavior as the primary search condition. The audit data index structure is a composite search architecture that integrates multiple dimension indexes.

[0083] Specifically, a unified audit data retrieval architecture is constructed through multi-dimensional index fusion and cross-mapping algorithms. First, the system creates cross-mapping tables between dimensions: a time entity mapping table (time_entity_map) records the correspondence between each time period and the related entity objects; a time behavior mapping table (time_behavior_map) records the correspondence between each time period and the type of operation behavior that occurred; and an entity behavior mapping table (entity_behavior_map) records the correspondence between each entity object relationship and the relevant operation behavior type. Then, index association operations are performed: all audit log records are traversed, and the time period identifier, entity object relationship identifier, and operation behavior category identifier of each log entry are extracted. A bidirectional association between time periods and entity object relationships is established in the time entity mapping table, a bidirectional association between time periods and operation behaviors is established in the time behavior mapping table, and a bidirectional association between entity object relationships and operation behaviors is established in the entity behavior mapping table. Next, a composite index key is constructed: the three-dimensional composite key format is defined as a time period identifier plus a subject object relationship identifier plus an operation behavior category identifier. Separators are used to connect the dimension identifiers to form a unique composite key, for example, the time period 1698753600 plus the subject object user123_file456 plus the behavior category permission change class. The system creates a composite index hash table `composite_index`, using the three-dimensional composite key as the key and a list of operation record identifiers that satisfy the combination conditions as the value. 
Then, a dimension weight mapping is established: weight values are assigned to different dimensions based on query frequency statistics, with a weight of 0.4 for the time dimension, 0.35 for the entity dimension, and 0.25 for the behavior dimension, to optimize the execution order of multi-dimensional queries. Finally, a query routing table is constructed: based on the combination of dimensions in the query conditions, the optimal index access strategy is formulated. Single-dimensional queries directly access the corresponding basic index, two-dimensional queries use a cross-mapping table to perform result intersection operations, and three-dimensional queries directly access the composite index to obtain accurate results. The system also establishes index statistics: recording the data distribution characteristics, query frequency statistics, and performance indicators for each dimension, used to dynamically adjust the index structure and optimize query strategies. The final result is an audit data index structure that reflects the time, entity, and behavior dimensions, supporting various query modes such as single-dimensional queries, multi-dimensional combined queries, and full-text search.
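
The composite index and query routing described above can be sketched as follows. This is a simplified illustration with stated assumptions: each log already carries its `time`, `entity`, and `behavior` dimension values, the composite key is a tuple rather than a separator-joined string, and two-dimensional queries intersect the basic indexes directly instead of going through dedicated cross-mapping tables. `build_indexes` and `route_query` are hypothetical names.

```python
from collections import defaultdict

DIMENSIONS = ("time", "entity", "behavior")


def build_indexes(logs):
    """Build one basic index per dimension plus the three-dimensional composite index."""
    by_dim = {dim: defaultdict(set) for dim in DIMENSIONS}
    composite_index = defaultdict(set)
    for log in logs:
        for dim in DIMENSIONS:
            by_dim[dim][log[dim]].add(log["id"])
        key = tuple(log[dim] for dim in DIMENSIONS)
        composite_index[key].add(log["id"])
    return by_dim, composite_index


def route_query(by_dim, composite_index, **conditions):
    """Pick the access strategy from the number of dimensions in the query."""
    if len(conditions) == 3:
        key = tuple(conditions[dim] for dim in DIMENSIONS)
        return set(composite_index.get(key, ()))  # direct composite lookup
    # One or two dimensions: look up each basic index and intersect the results.
    results = [by_dim[dim].get(value, set()) for dim, value in conditions.items()]
    return set.intersection(*results) if results else set()
```

The dimension weights in the text (0.4, 0.35, 0.25) could be used to order the lookups in the one- and two-dimensional branch; that ordering does not change the result of the intersection and is left out here.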

[0084] In one possible implementation, the first index, the second index, and the third index are associated and mapped to construct an audit data index structure that reflects the time dimension, entity dimension, and behavior dimension. Specifically, this includes steps 2041-2043, as follows: Step 2041: Create Bloom filters corresponding to the first index, the second index, and the third index respectively, and construct a Bloom filter combination structure based on each Bloom filter.

[0085] A Bloom filter is a highly space-efficient probabilistic data structure used to quickly determine whether an element exists in a set. A composite Bloom filter structure is a combined filtering structure formed by logically combining multiple independent Bloom filters.

[0086] Specifically, an efficient existence detection mechanism is achieved through Bloom filter construction algorithms and combined structure design. First, the system creates a corresponding Bloom filter for each index: the required bit array size is calculated as $m = -\dfrac{n \ln p}{(\ln 2)^2}$, and the optimal number of hash functions is calculated as $k = \dfrac{m}{n} \ln 2$, where $n$ is the expected number of elements and $p$ is the target false positive rate. A time-dimensional Bloom filter is created for the first index, iterating through all time period identifiers and adding them to the bit array using $k$ hash functions. An entity-relationship-dimensional Bloom filter is created for the second index, processing all entity object relationship identifiers. A behavior-classification-dimensional Bloom filter is created for the third index, processing all operation behavior type identifiers. Then, a combined Bloom filter structure is constructed: a combined data structure containing three independent Bloom filter instances is created, supporting single-dimensional queries, two-dimensional combined queries, and three-dimensional joint queries. A fast failure mechanism is implemented: in multi-dimensional queries, if the Bloom filter of any dimension reports that the queried value is not present, the query terminates immediately with an empty result.
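
The sizing formulas and the fail-fast combination described above can be sketched as follows. This is a minimal sketch, not the claimed implementation: the `k` bit positions are derived from one SHA-256 digest by double hashing, which is an assumption (the text does not say how the hash functions are chosen), and `BloomFilter` and `may_match` are hypothetical names.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items, false_positive_rate=0.01):
        n, p = expected_items, false_positive_rate
        # m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
        self.m = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Assumption: double hashing over one SHA-256 digest yields k positions.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def may_match(filters, **conditions):
    """Fail fast: stop at the first dimension whose filter definitely lacks the value."""
    return all(value in filters[dim] for dim, value in conditions.items())
```

For 1000 expected elements and a 1% false positive rate, the formulas give a bit array of 9586 bits and 7 hash functions. A `True` from `may_match` only means the combination is possibly present; a `False` is definitive.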

[0087] Step 2042: Use the hash value of the operation timestamp, the combined hash value of the operation subject identifier and the operation object identifier, and the hash value of the operation behavior type as composite keys to build an audit data hash table.

[0088] A hash value is the result of mapping data of arbitrary length to a fixed-length numerical value using a hash function. A composite hash value is a combined hash value calculated by concatenating multiple identifiers. A composite key is a unique identifier formed by combining multiple hash values. An audit data hash table is a fast lookup data structure indexed by composite keys.

[0089] Specifically, an efficient audit data lookup structure is established through multi-dimensional hash calculation and composite key generation algorithms. First, the system performs hash calculations on three dimensions: the hash value of the operation timestamp is calculated using the SHA-256 or MurmurHash algorithm; the combined hash value is calculated by concatenating the subject identifier and object identifier; and the hash value of the operation behavior type is calculated. To handle the bidirectional nature of the subject-object relationship, both forward and reverse combination hash values are calculated simultaneously, and the smaller value is selected. Then, a composite key is generated: the timestamp hash value, combined hash value, and behavior type hash value are concatenated in a fixed order using a separator to form a unique composite key identifier. Finally, an audit data hash table is established: a hash table is created with the composite key as the key and the audit log record as the value, using open addressing to resolve collisions, setting the load factor to 0.75, and supporting dynamic expansion.
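
The composite key generation described above can be sketched as follows. The sketch uses SHA-256 from the standard library; the `|` and `:` separators and the names `h`, `pair_hash`, `composite_key`, and `put` are illustrative assumptions. A built-in dictionary stands in for the audit data hash table, so the 0.75 load factor from the text is not reproduced, and storing a list of records per key is an assumption made so that logs sharing all three dimensions are not overwritten.

```python
import hashlib


def h(value):
    """SHA-256 hex digest of the value's string form."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def pair_hash(subject, obj):
    """Order-independent hash: compute forward and reverse, keep the smaller."""
    return min(h(f"{subject}|{obj}"), h(f"{obj}|{subject}"))


def composite_key(timestamp, subject, obj, behavior, sep=":"):
    """Join the three dimension hashes in a fixed order with a separator."""
    return sep.join([h(timestamp), pair_hash(subject, obj), h(behavior)])


def put(table, log):
    """Insert a log record under its composite key."""
    key = composite_key(log["ts"], log["subject"], log["object"], log["behavior"])
    table.setdefault(key, []).append(log)
    return key
```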

[0090] Step 2043: Cross-map the Bloom filter combination structure with the audit data hash table to generate a distributed audit data index structure.

[0091] Cross-mapping refers to the process of establishing a correspondence between two different data structures through association. A distributed index structure is a distributed data organization method that stores index data across multiple nodes.

[0092] Specifically, a scalable audit data index architecture is constructed using distributed mapping algorithms and load balancing strategies. First, the system establishes a mapping relationship between Bloom filters and hash tables: a two-stage detection mechanism is employed. The first stage uses a Bloom filter for fast existence pre-detection, and the second stage uses a hash table for precise data retrieval. During query processing, the system first checks the Bloom filter; if it reports "not present", an empty result is returned directly; if it reports "possibly present", a precise query is performed in the hash table. Then, a distributed index sharding strategy is designed: a consistent hashing algorithm is used to map data to different nodes based on the composite key hash value, with the shard identifier equal to the composite key hash value modulo the total number of shards. A sharding routing table is created to record the node address and data range of each shard. A query coordinator is implemented to handle cross-shard queries, sending requests to relevant shards in parallel and merging results. A master-slave replication mechanism is established to ensure data consistency, and an LRU cache is used to improve the performance of hotspot queries.
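
The two-stage lookup and the shard routing rule described above can be sketched as follows. Several simplifications are assumed: shards live in one process, a Python set stands in for each shard's Bloom filter (so it never yields false positives), and the shard identifier follows the modulo rule stated in the text. Replication, the query coordinator, and the LRU cache are omitted. `ShardedIndex` and `shard_of` are hypothetical names.

```python
import hashlib


def shard_of(composite_key, total_shards):
    """Shard identifier = hash of the composite key modulo the number of shards."""
    digest = hashlib.sha256(composite_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards


class ShardedIndex:
    def __init__(self, total_shards=4):
        self.total_shards = total_shards
        # Assumption: a set stands in for the per-shard Bloom filter.
        self.prefilters = [set() for _ in range(total_shards)]
        self.tables = [dict() for _ in range(total_shards)]
        self.table_lookups = 0  # counts second-stage lookups, for illustration

    def put(self, key, record):
        shard = shard_of(key, self.total_shards)
        self.prefilters[shard].add(key)
        self.tables[shard][key] = record

    def get(self, key):
        shard = shard_of(key, self.total_shards)
        if key not in self.prefilters[shard]:
            return None  # stage 1: definitely absent, skip the hash table
        self.table_lookups += 1
        return self.tables[shard].get(key)  # stage 2: precise lookup
```

With a real Bloom filter in place of the set, stage 2 can still return nothing for a key that stage 1 reported as possibly present, which is why `get` uses a tolerant dictionary lookup.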

[0093] In the above embodiments, a basic retrieval and control framework for audit data is implemented through time-dimensional indexing, entity relationship mapping, and behavior classification indexing. To further enhance the anomaly detection capability in complex audit scenarios and reduce the impact of operational behavior changes on security monitoring, this application also provides another behavior-aware adaptive anomaly detection and control method. This method intelligently adjusts the detection strategy by identifying behavioral pattern changes in operational chains and analyzing the frequency-anomaly coupling relationship, enabling the system to more accurately handle the anomaly detection needs of complex multi-dimensional audit trajectories and heterogeneous operation combinations.

[0094] Step 301: Calculate the anomaly degree of the corresponding operation chain segment based on the operation behavior type and operation frequency of each operation chain segment in the audit trajectory chain.

[0095] An operation chain segment is a continuous and interconnected sequence of operations within an audit trajectory chain. Operation behavior type is a classification label describing the nature of the operation action. Operation frequency refers to the statistical count of the number of times an operation behavior occurs within a specific time period. Anomaly degree is a numerical indicator that quantifies the degree to which an operation chain segment deviates from the normal behavior pattern.

[0096] Specifically, the abnormality level of operation chains is quantified through behavioral pattern analysis and statistical anomaly detection algorithms. First, the system extracts behavioral characteristics of the operation chains: it traverses all operation chains in the audit trajectory chain, extracting the distribution of operation behavior types and operation frequency statistics for each chain. The operation behavior type distribution is calculated by statistically analyzing the proportion of each type of operation within the chain, forming a behavior type vector; for example, data access accounts for 60%, permission changes for 30%, and system configuration for 10%. Operation frequency statistics are calculated by determining the time density of operations within the chain; the frequency value equals the total number of operations within the chain divided by the chain's time span. Then, a normal behavior baseline model is established: historical normal operation data is collected to construct a baseline statistical model, calculating the mean μ and standard deviation σ of the normal frequency for each operation behavior type. The baseline model includes frequency distribution parameters for different behavior types; for example, the mean normal frequency for data access is 2.5 times per minute, and the standard deviation is 0.8 times per minute. Next, anomaly calculation is performed: the Z-score standardization method is used to calculate the deviation of operation frequency; the Z-value equals the current frequency minus the baseline mean divided by the baseline standard deviation. The deviation of behavior type is calculated by comparing the cosine similarity between the behavior type distribution of the current chain segment and the baseline distribution; a smaller similarity value indicates a greater degree of deviation. 
The comprehensive anomaly score is calculated as $A = w_f \cdot |Z| + w_b \cdot (1 - s)$, where $w_f$ is the frequency deviation weight, $Z$ is the frequency Z-score, $w_b$ is the behavior deviation weight, and $s$ is the behavior similarity. The weight parameters are determined through machine learning training. To improve the accuracy of anomaly detection, the system employs a sliding window technique: a time window size of 30 minutes and a window step size of 5 minutes are set. Anomaly scores are calculated separately for each operation chain segment within each time window, and the final anomaly score is obtained by a weighted average of the anomaly scores from multiple windows.
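
The Z-score, cosine similarity, and combined anomaly score described above can be sketched as follows. The equal weights of 0.5 are placeholders only, since the text states that the weights are learned; the function names are hypothetical, and the sliding-window averaging is omitted.

```python
import math


def z_score(freq, mu, sigma):
    """Deviation of the current frequency from the baseline, in standard deviations."""
    return (freq - mu) / sigma


def cosine_similarity(a, b):
    """Cosine similarity between two behavior type distribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def anomaly_score(freq, mu, sigma, dist, baseline_dist, w_freq=0.5, w_behavior=0.5):
    """w_freq * |Z| + w_behavior * (1 - similarity); weights are placeholders."""
    frequency_term = w_freq * abs(z_score(freq, mu, sigma))
    behavior_term = w_behavior * (1 - cosine_similarity(dist, baseline_dist))
    return frequency_term + behavior_term
```

Using the baseline figures from the text (mean 2.5 and standard deviation 0.8 operations per minute), a chain segment running at 4.1 operations per minute with an unchanged behavior distribution has a Z-score of 2 and, under the placeholder weights, a score of about 1.0.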

[0097] Step 302: Mark operation chains with an abnormality level exceeding the preset abnormality threshold as abnormal operation chains; retrieve relevant operation records of abnormal operation chains within the preset time range from the original audit log to form abnormal behavior context information; integrate the abnormal behavior context information to generate an abnormal warning report.

[0098] The preset anomaly threshold is a numerical critical point for determining whether an operation chain segment is abnormal. An abnormal operation chain segment is a suspicious sequence of operations whose anomaly level exceeds the threshold. The preset time range is the time boundary for retrieving relevant operation records. The abnormal behavior context information is the complete operational environment and background data surrounding the abnormal operation. The anomaly warning report is a structured alert document formed by integrating anomaly information.

[0099] Specifically, a complete anomaly warning mechanism is generated through threshold judgment and context information collection algorithms. First, the system performs anomaly chain identification: it traverses all operation chains with calculated anomaly scores, comparing the anomaly score with a preset anomaly threshold. The anomaly threshold is determined based on historical data statistics, typically set as the 95th percentile of the normal anomaly score distribution. For operation chains with anomalies exceeding the threshold, the system marks them as anomalous operation chains and records the anomaly time, anomaly type, and anomaly severity. Then, it performs related operation record retrieval: it determines the time boundaries of the anomalous operation chains, calculating the start time of the retrieval time range as the start time of the anomalous chain minus half of the preset time range, and the end time as the end time of the anomalous chain plus half of the preset time range. Related operation records are retrieved using time range filtering conditions in the original audit logs. Simultaneously, it performs association filtering based on the operation subject and object involved in the anomalous chain to ensure that the retrieved records have a direct or indirect correlation with the anomalous operation. Next, it constructs the anomaly behavior context information: it arranges the retrieved related operation records in chronological order and analyzes the causal and dependency relationships of the operation sequences. The contextual information includes the preceding, subsequent, concurrent, and related operations of the abnormal operation, forming a complete operation chain diagram. The system also analyzes the impact scope of the abnormal operation, statistically analyzing the number of affected data objects, system resources, and user accounts. 
Finally, an anomaly warning report is generated: a structured report template is created, including four parts: anomaly summary, detailed analysis, impact assessment, and handling recommendations. The anomaly summary records the time of occurrence, anomaly type, anomaly severity, and the entities involved in the operation. The detailed analysis includes the abnormal operation sequence, behavioral pattern deviation analysis, and statistical data comparison. The impact assessment analyzes the potential risks and actual impact scope of the abnormal operation. The handling recommendations provide corresponding emergency response measures and subsequent monitoring suggestions based on the anomaly type and severity.
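
The threshold selection and context retrieval described above can be sketched as follows. The sketch assumes a nearest-rank definition of the 95th percentile (the text does not say which percentile method is used) and logs with `ts`, `subject`, and `object` fields; all function names are hypothetical. Only direct subject or object matches are treated as related, which is narrower than the direct or indirect correlation mentioned in the text.

```python
def percentile_threshold(normal_scores, pct=95):
    """Nearest-rank percentile of historical anomaly scores for normal behavior."""
    ordered = sorted(normal_scores)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


def context_window(chain_start, chain_end, preset_range):
    """Extend the abnormal chain by half of the preset time range on each side."""
    half = preset_range / 2
    return chain_start - half, chain_end + half


def collect_context(logs, chain_start, chain_end, preset_range, entities):
    """Return time-ordered logs inside the window that touch the chain's entities."""
    start, end = context_window(chain_start, chain_end, preset_range)
    hits = [
        log
        for log in logs
        if start <= log["ts"] <= end
        and (log["subject"] in entities or log["object"] in entities)
    ]
    return sorted(hits, key=lambda log: log["ts"])
```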

[0100] Referring to Figure 3, this application provides a blockchain audit information filtering system, which includes a log acquisition module, an index structure generation module, a filtering threshold determination module, and a filtering report determination module, wherein: the log acquisition module is used to acquire the original audit logs in the blockchain network, the original audit logs including the operation subject identifier, the operation object identifier, the operation behavior type, and the operation timestamp; the index structure generation module is used to build a first index based on the operation timestamp, a second index based on the operation subject identifier and the operation object identifier, and a third index based on the operation behavior type, and to generate an audit data index structure comprising the first index, the second index, and the third index; the filtering threshold determination module is used to receive security event description information input by the user and generate an audit event feature vector based on the security event description information, to retrieve a candidate audit log set from the audit data index structure based on the audit event feature vector and calculate the information correlation degree between the audit event feature vector and each log in the candidate audit log set, and to obtain the mean and standard deviation of the information correlation degree in the candidate audit log set and determine the dynamic filtering threshold based on the mean and standard deviation; and the filtering report determination module is used to select audit logs whose information correlation degree is greater than the dynamic filtering threshold as target logs, select a preset number of target logs as filtering results in descending order of information correlation degree, arrange the filtering results by timestamp to form an audit trajectory chain, and output a structured filtering report containing the audit trajectory chain.
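
The threshold and selection steps performed by the last two modules can be illustrated with a short sketch. This passage states only that the dynamic filtering threshold is determined from the mean and standard deviation; the form `mean + k * std` used below, the coefficient `k`, and the names `dynamic_threshold` and `filter_logs` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def dynamic_threshold(scores, k=1.0):
    """Assumed form: mean + k * (population) standard deviation."""
    return mean(scores) + k * pstdev(scores)


def filter_logs(scored, top_n=3, k=1.0):
    """scored: list of (log, correlation) pairs; each log has a 'ts' field.

    Returns the audit trajectory chain: up to top_n target logs, chosen by
    descending correlation, then arranged by timestamp.
    """
    threshold = dynamic_threshold([s for _, s in scored], k)
    # Target logs: correlation strictly greater than the dynamic threshold.
    targets = [(log, s) for log, s in scored if s > threshold]
    # Preset number of target logs in descending order of correlation.
    targets.sort(key=lambda pair: pair[1], reverse=True)
    selected = [log for log, _ in targets[:top_n]]
    # Arrange the filtering results chronologically.
    return sorted(selected, key=lambda log: log["ts"])
```

Because the threshold is recomputed from each candidate set, a query with uniformly low correlation scores still yields its relatively strongest matches, while a query with many strong matches is filtered more strictly.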

[0101] Based on the above embodiments, the index structure generation module is further used to segment the operation timestamps according to a preset time granularity, establish an index entry containing all audit log identifiers within each time period to form a first index; use a hash mapping structure to store the mapping relationship between the operation subject identifier and the operation object identifier, and establish an association list based on all operation records associated with the mapping relationship to form a second index; classify the operation behavior type according to data access type, permission change type and system configuration type, and establish an index entry containing all operation record identifiers under the type for each classified type to form a third index; and associate and map the first index, the second index and the third index to construct an audit data index structure that reflects the time dimension, entity dimension and behavior dimension.
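
A minimal sketch of the three indexes follows, assuming each audit log is a dictionary with hypothetical fields `id`, `ts`, `subject`, `object`, and `behavior`. The `CATEGORIES` table that maps raw behavior names to the three classes, and the fallback class `other`, are illustrative assumptions rather than part of this application.

```python
from collections import defaultdict

# Hypothetical mapping from raw behavior names to the three classes.
CATEGORIES = {
    "read": "data_access",
    "write": "data_access",
    "grant": "permission_change",
    "revoke": "permission_change",
    "set_config": "system_config",
}


def build_indexes(logs, granularity=3600):
    """Build time, entity, and behavior indexes over a list of audit logs."""
    time_idx = defaultdict(list)      # first index: time bucket -> log ids
    entity_idx = defaultdict(list)    # second index: (subject, object) -> log ids
    behavior_idx = defaultdict(list)  # third index: behavior class -> log ids
    for log in logs:
        # Segment timestamps by the preset time granularity (seconds here).
        time_idx[log["ts"] // granularity].append(log["id"])
        # Hash map keyed by the subject-object pair, with an association list.
        entity_idx[(log["subject"], log["object"])].append(log["id"])
        # Classify the behavior type, then index by class.
        category = CATEGORIES.get(log["behavior"], "other")
        behavior_idx[category].append(log["id"])
    # The combined structure exposes all three dimensions side by side.
    return {
        "time": dict(time_idx),
        "entity": dict(entity_idx),
        "behavior": dict(behavior_idx),
    }
```

Since every index stores log identifiers, a query can intersect the identifier sets from the three dimensions without touching the log bodies.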

[0102] Based on the above embodiments, the index structure generation module is also used to create Bloom filters corresponding to the first index, the second index, and the third index respectively, and to construct a Bloom filter combination structure according to each Bloom filter; to establish an audit data hash table using the hash value of the operation timestamp, the combined hash value of the operation subject identifier and the operation object identifier, and the hash value of the operation behavior type as composite keys; and to cross-map the Bloom filter combination structure with the audit data hash table to generate a distributed audit data index structure.
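
One way to picture the combination of Bloom filters and a composite-key hash table is sketched below. It is a single-process illustration only and does not show distribution across nodes. The classes `BloomFilter` and `AuditIndex`, the helper `h`, the filter size, the number of hash functions, and the choice to hash the time bucket rather than the raw timestamp are all assumptions.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; one byte per bit position for readability."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def h(value):
    """Short hex hash used as one component of the composite key."""
    return hashlib.sha256(str(value).encode()).hexdigest()[:16]


def composite_key(log, granularity=3600):
    # Assumption: the time component hashes the time bucket, not the raw value.
    return (
        h(log["ts"] // granularity),
        h(f'{log["subject"]}|{log["object"]}'),
        h(log["behavior"]),
    )


class AuditIndex:
    DIMS = ("time", "entity", "behavior")

    def __init__(self):
        self.filters = {d: BloomFilter() for d in self.DIMS}
        self.table = {}  # composite key -> list of log ids

    def add(self, log):
        key = composite_key(log)
        for dim, part in zip(self.DIMS, key):
            self.filters[dim].add(part)
        self.table.setdefault(key, []).append(log["id"])

    def lookup(self, probe):
        key = composite_key(probe)
        # Bloom filters reject definite misses before the table is consulted.
        if not all(part in self.filters[d] for d, part in zip(self.DIMS, key)):
            return []
        return self.table.get(key, [])
```

A Bloom filter can report false positives but never false negatives, so the hash table lookup remains the source of truth and the filters only serve to skip lookups that cannot succeed.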

[0103] Based on the above embodiments, the filtering threshold determination module is also used to perform lexical analysis and recognition on the security event description information to obtain event type keywords, entity identifier keywords, time range keywords, and impact range keywords; map the event type keywords to event type codes, convert the entity identifier keywords into standardized entity identifiers, parse the time range keywords into time intervals, and map the impact range keywords to impact object type codes; and perform structured encapsulation of the event type codes, standardized entity identifiers, time intervals, and impact object type codes to obtain the audit event feature vector.
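
The mapping from a free-text description to a structured feature vector might look like the following sketch. The keyword tables `EVENT_CODES` and `IMPACT_CODES`, the code values, the entity pattern (`user_17`, `node-3`), and the ISO date format are invented for illustration; a practical lexical analyzer would be considerably richer.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical keyword-to-code tables.
EVENT_CODES = {"unauthorized access": "E01", "privilege escalation": "E02"}
IMPACT_CODES = {"account": "T01", "contract": "T02", "ledger": "T03"}


@dataclass(frozen=True)
class EventFeatureVector:
    event_type_code: Optional[str]
    entity_ids: Tuple[str, ...]
    time_interval: Optional[Tuple[datetime, datetime]]
    impact_type_code: Optional[str]


def parse_description(text):
    """Extract the four keyword groups and encapsulate them in one vector."""
    lower = text.lower()
    # Event type keyword -> event type code.
    event = next((c for k, c in EVENT_CODES.items() if k in lower), None)
    # Impact range keyword -> impact object type code.
    impact = next((c for k, c in IMPACT_CODES.items() if k in lower), None)
    # Entity identifier keywords -> standardized (upper-case) identifiers.
    entities = tuple(
        e.upper() for e in re.findall(r"\b(?:user|node)[-_]?\d+\b", lower)
    )
    # Time range keywords -> time interval (first two ISO dates found).
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    interval = None
    if len(dates) >= 2:
        interval = (
            datetime.fromisoformat(dates[0]),
            datetime.fromisoformat(dates[1]),
        )
    return EventFeatureVector(event, entities, interval, impact)
```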

[0104] Based on the above embodiments, the filtering threshold determination module is further used to decompose the audit event feature vector into time feature components, subject relationship feature components, and behavioral feature components; determine the time window range in the first index based on the time feature components, and filter out the first candidate log subset that meets the preset time constraints according to the time window range; perform subject-object association matching in the second index based on the subject relationship feature components, and filter out the second candidate log subset that meets the preset subject relationship constraints from the first candidate log subset; perform behavioral pattern matching in the third index based on the behavioral feature components, and filter out the third candidate log subset that meets the preset behavioral feature constraints from the second candidate log subset; and perform deduplication and integrity verification on the third candidate log subset to generate a candidate audit log set.
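
The stepwise narrowing through the three indexes can be sketched as successive set intersections. The sketch assumes `logs` is a dictionary from log identifier to log record, `indexes` has hypothetical `time`, `entity`, and `behavior` entries keyed by time bucket, subject-object pair, and behavior class respectively, and the feature vector is a dictionary with `interval`, `entities`, and `behavior` components. Treating integrity verification as a required-field check is an assumption.

```python
REQUIRED_FIELDS = {"id", "ts", "subject", "object", "behavior"}


def retrieve_candidates(logs, indexes, vector, granularity=3600):
    """Narrow candidates through the time, entity, and behavior indexes."""
    start, end = vector["interval"]

    # Step 1: time window in the first index, then an exact boundary check.
    first = set()
    for bucket in range(start // granularity, end // granularity + 1):
        first.update(indexes["time"].get(bucket, []))
    first = {i for i in first if start <= logs[i]["ts"] <= end}

    # Step 2: subject-object association matching in the second index.
    matched = set()
    for (subject, obj), ids in indexes["entity"].items():
        if subject in vector["entities"] or obj in vector["entities"]:
            matched.update(ids)
    second = first & matched

    # Step 3: behavioral pattern matching in the third index.
    third = second & set(indexes["behavior"].get(vector["behavior"], []))

    # Sets already remove duplicates; the integrity check here only confirms
    # that every required field is present (an illustrative assumption).
    return sorted(i for i in third if REQUIRED_FIELDS.issubset(logs[i]))
```

Applying the time constraint first keeps the later intersections small, since each step only has to consider identifiers that survived the previous one.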

[0105] Based on the above embodiments, the filtering report determination module is further used to sort the target logs in the filtering results in chronological order according to their timestamps to generate a log sequence; identify the association relationship between the operation subject identifiers and the operation object identifiers of adjacent target logs in the log sequence to determine the continuous operation characteristics; divide the log sequence into several operation segments based on the continuous operation characteristics, wherein each operation segment contains target logs that are time-continuous and operation-related; and add a segment identifier and association pointers to the preceding and following segments to each operation segment, and connect the operation segments in chronological order to generate an audit trajectory chain.
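
The segmentation into a linked trajectory chain might be sketched as follows. The criteria used here for "time-continuous" (a gap of at most `max_gap` seconds) and "operation-related" (adjacent logs share a subject or object identifier) are assumptions, as are the names `Segment`, `related`, and `build_chain`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    segment_id: int
    logs: list = field(default_factory=list)
    prev_id: Optional[int] = None  # pointer to the preceding segment
    next_id: Optional[int] = None  # pointer to the following segment


def related(a, b):
    """Assumed association rule: the two logs share a subject or an object."""
    return bool({a["subject"], a["object"]} & {b["subject"], b["object"]})


def build_chain(target_logs, max_gap=300):
    """Sort logs by time, split them into segments, and link the segments."""
    ordered = sorted(target_logs, key=lambda log: log["ts"])
    segments = []
    for log in ordered:
        last = segments[-1].logs[-1] if segments else None
        continuous = (
            last is not None
            and log["ts"] - last["ts"] <= max_gap
            and related(last, log)
        )
        if continuous:
            segments[-1].logs.append(log)
        else:
            segments.append(Segment(segment_id=len(segments), logs=[log]))
    # Add association pointers between neighboring segments.
    for i, seg in enumerate(segments):
        seg.prev_id = segments[i - 1].segment_id if i > 0 else None
        seg.next_id = segments[i + 1].segment_id if i + 1 < len(segments) else None
    return segments
```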

[0106] Based on the above embodiments, the filtering report determination module is also used to calculate the anomaly degree of each operation chain segment in the audit trajectory chain based on the operation behavior type and operation frequency of that segment; mark each operation chain segment whose anomaly degree exceeds the preset anomaly threshold as an anomalous operation chain segment; retrieve the related operation records of the anomalous operation chain segment within the preset time range from the original audit logs to form anomalous behavior context information; and integrate the anomalous behavior context information to generate an anomaly warning report.

[0107] It should be noted that the above embodiments of the apparatus are only illustrated by the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.

[0108] This application also discloses an electronic device. Referring to Figure 4, which is a schematic diagram of the structure of an electronic device disclosed in an embodiment of this application, the electronic device 400 may include: at least one processor 401, at least one network interface 404, a user interface 403, a memory 405, and at least one communication bus 402.

[0109] The communication bus 402 is used to enable communication between these components.

[0110] The user interface 403 may include a display interface and a camera interface. Optionally, the user interface 403 may also include a standard wired interface and a wireless interface.

[0111] The network interface 404 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface).

[0112] The processor 401 may include one or more processing cores. The processor 401 connects to various parts of the server using various interfaces and lines, and performs various server functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in memory 405, and by calling data stored in memory 405. Optionally, the processor 401 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 401 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface graphics, and applications; the GPU is responsible for rendering and drawing the content required for display; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 401 and may be implemented as a separate chip.

[0113] The memory 405 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 405 may include a non-transitory computer-readable storage medium. The memory 405 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 405 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the above-described method embodiments, etc.; the data storage area may store data involved in the above-described method embodiments, etc. Optionally, the memory 405 may also be at least one storage device located remotely from the aforementioned processor 401. As shown in Figure 4, the memory 405, which serves as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application implementing the blockchain audit information filtering method.

[0114] In the electronic device 400 shown in Figure 4, the user interface 403 is mainly used to provide an input interface for the user and to obtain user input data, while the processor 401 can be used to call the application for the blockchain audit information filtering method stored in the memory 405. When the application is executed by one or more processors 401, it causes the electronic device 400 to perform one or more of the methods described in the above embodiments. It should be noted that, for simplicity, the foregoing method embodiments are each described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, because according to this application some steps can be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0115] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0116] In the various embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some service interface; the indirect coupling or communication connection between apparatuses or units may be electrical or other forms.

[0117] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0118] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0119] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as USB flash drives, portable hard drives, magnetic disks, or optical disks.

[0120] The above are merely exemplary embodiments of this disclosure and should not be construed as limiting the scope of this disclosure. Any equivalent changes and modifications made in accordance with the teachings of this disclosure shall still fall within the scope of this disclosure. Other embodiments of this disclosure will be readily apparent to those skilled in the art upon consideration of the specification and practical disclosure.

[0121] This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not described in this disclosure. The specification and embodiments are to be considered exemplary only.

Claims

1. A method for filtering blockchain audit information, characterized in that the method comprises: obtaining original audit logs from a blockchain network, wherein the original audit logs include an operation subject identifier, an operation object identifier, an operation behavior type, and an operation timestamp; constructing a first index based on the operation timestamp, a second index based on the operation subject identifier and the operation object identifier, and a third index based on the operation behavior type, and generating an audit data index structure comprising the first index, the second index, and the third index; receiving security event description information input by a user, and generating an audit event feature vector based on the security event description information; retrieving a candidate audit log set from the audit data index structure based on the audit event feature vector, and calculating the information correlation degree between the audit event feature vector and each log in the candidate audit log set; obtaining the mean and standard deviation of the information correlation degree in the candidate audit log set, and determining a dynamic filtering threshold based on the mean and the standard deviation; selecting audit logs whose information correlation degree is greater than the dynamic filtering threshold as target logs, and selecting a preset number of target logs as filtering results in descending order of information correlation degree; and arranging the filtering results by timestamp to form an audit trajectory chain, and outputting a structured filtering report containing the audit trajectory chain.

2. The method according to claim 1, characterized in that, The process of constructing a first index based on the operation timestamp, a second index based on the operation subject identifier and the operation object identifier, and a third index based on the operation behavior type, and generating an audit data index structure of the first index, the second index, and the third index, includes: The operation timestamps are segmented according to a preset time granularity, and an index entry containing all audit log identifiers within each time period is created to form the first index; A hash mapping structure is used to store the mapping relationship between the operation subject identifier and the operation object identifier, and an association list is established based on all operation records associated with the mapping relationship to form the second index; The operation behavior types are classified into data access type, permission change type and system configuration type, and an index entry containing all operation record identifiers under the classified type is created for each type to form the third index; The first index, the second index, and the third index are associated and mapped to construct an audit data index structure that reflects the time dimension, entity dimension, and behavior dimension.

3. The method according to claim 2, characterized in that, The step of associating and mapping the first index, the second index, and the third index to construct an audit data index structure reflecting the time dimension, entity dimension, and behavior dimension includes: Create Bloom filters corresponding to the first index, the second index, and the third index respectively, and construct a Bloom filter combination structure based on each Bloom filter; An audit data hash table is established using the hash value of the operation timestamp, the combined hash value of the operation subject identifier and the operation object identifier, and the hash value of the operation behavior type as composite keys; The Bloom filter combination structure is cross-mapped with the audit data hash table to generate a distributed audit data index structure.

4. The method according to claim 1, characterized in that the step of generating an audit event feature vector based on the security event description information includes: performing lexical analysis on the security event description information to obtain event type keywords, entity identifier keywords, time range keywords, and impact range keywords; mapping the event type keywords to event type codes, converting the entity identifier keywords into standardized entity identifiers, parsing the time range keywords into time intervals, and mapping the impact range keywords to impact object type codes; and structurally encapsulating the event type codes, the standardized entity identifiers, the time intervals, and the impact object type codes to obtain the audit event feature vector.

5. The method according to claim 1, characterized in that, The step of retrieving a set of candidate audit logs from the audit data index structure based on the audit event feature vector includes: The audit event feature vector is decomposed into time feature components, subject relationship feature components, and behavioral feature components; Based on the time feature components, a time window range is determined in the first index, and a first candidate log subset that meets the preset time constraints is selected according to the time window range. Based on the subject relationship feature components, subject-object association matching is performed in the second index to filter out a second candidate log subset that meets the preset subject relationship constraints from the first candidate log subset; Based on the behavioral feature components, behavioral pattern matching is performed in the third index to select a third candidate log subset that meets the preset behavioral feature constraints from the second candidate log subset; The third candidate log subset is deduplicated and its integrity is verified to generate the candidate audit log set.

6. The method according to claim 1, characterized in that, The step of arranging the filtered results by timestamp to form an audit trajectory chain includes: The target logs in the filtered results are sorted chronologically according to their timestamps to generate a log sequence. Identify the association between the operation subject identifier and the operation object identifier between adjacent target logs in the log sequence to determine the continuous operation characteristics; Based on the continuous operation characteristics, the log sequence is divided into several operation segments, wherein each operation segment contains target logs that are temporally continuous and operation-related. Add a segment identifier and associated pointers to the preceding and following segments for each operation segment, and concatenate the operation segments in chronological order to generate the audit trajectory chain.

7. The method according to claim 1, characterized in that, after outputting the structured filtering report containing the audit trajectory chain, the method further includes: calculating, based on the operation behavior type and operation frequency of each operation chain segment in the audit trajectory chain, the anomaly degree of the corresponding operation chain segment; marking each operation chain segment whose anomaly degree exceeds a preset anomaly threshold as an anomalous operation chain segment; retrieving, from the original audit logs, the related operation records of the anomalous operation chain segment within a preset time range to form anomalous behavior context information; and integrating the anomalous behavior context information to generate an anomaly warning report.

8. A blockchain audit information screening system, characterized in that, The system includes: The log acquisition module is used to acquire the original audit logs in the blockchain network. The original audit logs include the operation subject identifier, the operation object identifier, the operation behavior type, and the operation timestamp. An index structure generation module is used to construct a first index based on the operation timestamp, a second index based on the operation subject identifier and the operation object identifier, a third index based on the operation behavior type, and generate an audit data index structure of the first index, the second index and the third index; The filtering threshold determination module is used to receive security event description information input by the user, and generate an audit event feature vector based on the security event description information; retrieve a candidate audit log set from the audit data index structure based on the audit event feature vector, and calculate the information correlation degree between the audit event feature vector and each log in the candidate audit log set; obtain the mean and standard deviation of the information correlation degree in the candidate audit log set, and determine a dynamic filtering threshold based on the mean and the standard deviation; The filtering report determination module is used to filter audit logs whose information relevance is greater than the dynamic filtering threshold as target logs, and select a preset number of target logs as filtering results in descending order of information relevance; arrange the filtering results by timestamp to form an audit trajectory chain, and output a structured filtering report containing the audit trajectory chain.

9. An electronic device, characterized in that, The device includes a processor, a memory, a user interface, and a network interface. The memory is used to store instructions, the user interface and the network interface are used to communicate with other devices, and the processor is used to execute the instructions stored in the memory to enable the electronic device to perform the blockchain audit information filtering method as described in any one of claims 1-7.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores instructions that, when executed, perform the blockchain audit information filtering method as described in any one of claims 1-7.