A data processing method, device and system for a relational database, and a storage medium
By introducing a combination of finite state machines and policy rule bases into the proxy server, the complexity of state maintenance and data flow in relational database communication is solved, the integrity and consistency of the protocol process are achieved, and the stability and predictability of the system are improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CETC BIGDATA RES INST CO LTD
- Filing Date
- 2026-05-25
- Publication Date
- 2026-06-19
AI Technical Summary
In relational database communication, the proxy layer needs to handle multi-source sessions, multi-type commands, and differentiated strategy requirements. Existing finite state machines have multi-stage and layered characteristics in the data interaction process, which makes the state maintenance logic and data flow mechanism complex and makes it difficult to guarantee the integrity and consistency of the protocol process.
By monitoring data payloads through a finite state machine (FSM) deployed within the proxy server, decoding SQL strings and binding them with session context information, a query context entity is generated. The AST structure and policy rule base are used for pattern mapping to generate a policy judgment entity. Based on the action type identifier domain, state transitions are driven to achieve data pass-through, rewriting, or error response.
It achieves protocol consistency and response predictability in complex concurrent sessions, improves system-level stability and policy-level determinism, eliminates the fragmentation of traditional agents when executing policy branches, and ensures global consistency of data processing and strict consistency of behavioral logic.
Smart Images

Figure CN122241773A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of big data technology, and in particular to a data processing method, apparatus, system and storage medium for relational databases. Background Technology
[0002] Relational databases, as core components of information systems, are widely used in multi-tiered distributed architectures. Clients typically interact with database instances through standard communication protocols to transmit data requests and responses. To ensure the security and consistency of communication paths, proxy layer components are introduced in some application systems to implement functions such as protocol relay, connection management, and access control.
[0003] In the process of database interaction, the proxy layer undertakes basic operations such as receiving, parsing, forwarding and responding to data packets. Internally, it often uses a finite state machine (FSM) model to maintain the orderly switching of communication states, thereby ensuring the stability of the data link and the determinism of the protocol process.
[0004] In real-world deployment scenarios, database communication typically involves multiple source sessions, various command types, and differentiated strategy requirements. The proxy component needs to maintain the integrity of the protocol flow while standardizing the data payload to adapt to different communication contexts and system constraints. Because different database protocols differ in message format, encoding methods, and state transition mechanisms, the proxy layer implementation generally needs to possess a certain degree of flexibility and scalability to handle multi-dimensional command interaction and state transition requirements.
[0005] As database access patterns become more complex, proxy servers, when handling multiple connections and multi-state sessions, increasingly exhibit multi-stage and layered characteristics in their internal state maintenance logic and data flow mechanisms. In this process, finite state machines not only control the flow of instructions but also, to a certain extent, determine the proxy layer's scheduling and response behavior during data interaction. Summary of the Invention
[0006] To address the aforementioned technical problems, this application provides a data processing method, apparatus, system, and storage medium for relational databases.
[0007] The technical solution provided in this application is described below:
[0008] The first aspect of this application provides a data processing method for relational databases, the method comprising: The finite state machine (FSM) deployed within the proxy server listens for data payloads initiated by the client and decodes the first target data unit contained therein to obtain the SQL string according to the pre-negotiation method. The SQL string is semantically bound to the session context information to generate a query context entity, and the SQL text in it is parsed into an AST structure by an SQL syntax parser. Based on the AST structure and the preset policy rule base, a pattern mapping operation is performed to obtain the matching result; A strategy determination entity is generated based on the matching result. The strategy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field. When the action type identifier field is the first type, the FSM is transferred from the command query state to the waiting result state; Based on the original instruction text field, the corresponding first target data unit is determined from the cached protocol context, and the byte sequence of the first target data unit is transparently transmitted to the target database; When the action type identifier field is the second type, the FSM will be transitioned from the command query state back to the command state; Read the rule recognition field, and based on the reading result, generate an error response packet content by combining it with the original instruction text field, and return it to the client; When the action type identifier field is the third type, the FSM will be transferred from the command query state to the waiting result state; Based on the rewrite instruction text field, a second target data unit is generated and sent to the target database.
[0009] Optionally, a strategy determination entity is generated based on the matching result. The strategy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field, including: Receive the policy matching output from the pattern mapping operation; Iterate through all matching rules in the output of the strategy; For each hit rule, the rule identifier is read and recorded to form a rule identification domain; Extract the action indication information related to the hit rule and map it to action type, wherein the action type includes a first type, a second type and a third type; Extract the raw SQL string from the query context entity and assign it to the raw instruction text field; Determine whether the instruction needs to be rewritten based on the strategy matching rules and context information; If necessary, generate the rewritten SQL string and assign it to the rewrite instruction text field; The rule identification field, action type identifier field, original instruction text field, and rewritten instruction text field are constructed as a policy determination entity.
[0010] Optionally, determining whether the instruction needs to be rewritten based on the policy matching rules and context information includes: Receive the rule identification field and the original instruction text field from the policy determination entity; Perform a semantic scan on the original instruction text field and identify target instruction nodes involving sensitive tables, controlled columns, or restricted operations; The target instruction node is compared with the policy matching rules, and it is determined whether the rewriting condition is triggered to obtain the first judgment result; Obtain the metadata information from the query context entity; The metadata information is verified against the context constraints in the policy matching rules to determine whether the rewrite triggering condition is met, and a second judgment result is obtained. Based on the first judgment result and the second judgment result, determine whether it is necessary to generate a rewrite instruction; If a rewrite instruction needs to be generated, the rewrite rule set is invoked to transform the original instruction text field, forming a rewritten SQL string, which is then assigned to the rewrite instruction text field of the policy determination entity.
[0011] Optionally, the step of establishing a semantic binding between the SQL string and session context information to generate a query context entity, and parsing the SQL text therein into an AST structure using an SQL syntax parser, includes: Obtain the current session identifier from the FSM and read the corresponding session context information; The session context information is semantically mapped to the SQL string to form a semantic binding pair that contains the correspondence between query statements and environment parameters; Generate a query context entity based on the semantic binding; The SQL string is input into the SQL syntax parser, and the SQL syntax parser identifies the structural information in the SQL string. According to predefined syntax rules, the structural information is constructed into an AST node set, where each node corresponds to an independent syntactic unit in an SQL statement; The set of AST nodes is organized into an AST structure.
[0012] Optionally, the step of performing pattern mapping operations based on the AST structure and a preset policy rule base to obtain matching results includes: Read the AST structure and its associated context meta-information from the query context entity; Traverse each node of the AST structure and extract structural information related to SQL semantics; A syntactic feature vector is generated based on the hierarchical relationship between each node, and the syntactic feature vector is used to characterize the semantic pattern features of the current AST structure. Load a set of rules from a preset policy rule base, the set of rules including at least pattern rules based on string matching and AST pattern rules based on structure matching; The syntax feature vector is compared sequentially with the matching conditions of each rule in the rule set. If the syntax feature vector matches the matching conditions of a rule, then the rule is determined to be matched. Matching results are generated based on the hit rules and their associated information.
[0013] Optionally, the step of reading the rule identification field, generating an error response packet based on the reading result and the original instruction text field, and returning it to the client includes: The unique identifier of the rule identification domain is parsed from the policy determination entity, and the corresponding rule metadata is retrieved from the policy rule base; Extract anomaly classification information, error code templates, and response text templates from the rule metadata; Key semantic fragments are extracted based on the original instruction text field; The key semantic fragments are mapped to the response text template to generate error description text with semantic relevance; Determine the protocol structure of the error response packet based on the anomaly classification information; According to the protocol structure, the generated error description text and the corresponding error code are written into the error information payload area; The constructed error response package is encapsulated and sent to the client.
[0014] Optionally, generating the second target data unit based on the rewrite instruction text field includes: The rewrite instruction text field in the parsing policy determination entity is used to extract the rewrite instruction text and its associated session context information. Perform syntactic reconstruction on the rewritten instruction text to obtain a byte sequence fragment that conforms to the target paradigm; Copy the protocol header information of the first target data unit, and update the length field, sequence number field and instruction type identifier field therein; The byte sequence fragment is embedded into the updated protocol header information to obtain the second target data unit; The second target data unit is injected into the outbound transmission queue of the FSM; The target connection channel is determined by the FSM, and the second target data unit is sent to the target database through the target connection channel.
[0015] A second aspect of this application provides a data processing apparatus for relational databases, the apparatus comprising: The listening unit is used to listen to the data payload initiated by the client through the finite state machine (FSM) deployed in the proxy server, and decode the first target data unit contained therein to obtain the SQL string according to the pre-negotiation method; An AST structure building unit is used to establish a semantic binding between the SQL string and the session context information to generate a query context entity, and to parse the SQL text in the SQL text into an AST structure through an SQL syntax parser. The mapping operation unit is used to perform pattern mapping operations based on the AST structure and the preset policy rule base to obtain the matching result; A matching result generation unit is used to generate a strategy determination entity based on the matching result. The strategy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field. The first migration unit is used to transfer the FSM from the command query state to the waiting result state when the action type identifier field is the first type. The first processing unit is configured to determine the corresponding first target data unit from the cached protocol context based on the original instruction text field, and to pass through the byte sequence of the first target data unit to the target database. The second migration unit is used to migrate the FSM from the command query state back to the command state when the action type identifier field is the second type. The second processing unit is used to read the rule recognition field, generate an error response packet based on the reading result and the original instruction text field, and return it to the client. The third migration unit is used to transfer the FSM from the command query state to the waiting result state when the action type identifier field is the third type. The third processing unit is used to generate a second target data unit based on the rewrite instruction text field, and send the second target data unit to the target database.
[0016] A third aspect of this application provides a data processing system for relational databases, the system comprising: Processor, memory, input / output units, and bus; The processor is connected to the memory, the input / output unit, and the bus; The memory stores a program, which the processor invokes to execute the first aspect and any one of the optional methods in the first aspect.
[0017] A fourth aspect of this application provides a computer-readable storage medium on which a program is stored, which, when executed on a computer, performs the methods of the first aspect and any one of the first aspects.
[0018] As can be seen from the above technical solutions, this application has the following beneficial effects: 1. In this application, the finite state machine (FSM) within the proxy server no longer serves merely as a communication control unit, but rather as the core driving hub of the system state. Its state transition process spans all stages of data monitoring, semantic parsing, policy determination, and response encapsulation. A one-to-one mapping relationship is formed between the state changes of the FSM and the action type identifier field of the policy determination entity, enabling the system to maintain strict consistency between behavioral logic and state machine trajectory under different decision paths.
[0019] 2. The SQL string and session context information are semantically bound during the parsing phase, so that the generation of the AST structure not only reflects the syntactic level, but also carries semantic dependencies. This semantic context is continuously referenced in the subsequent pattern mapping and rule matching process, forming a cross-stage data semantic transmission mechanism, which enables the generation of policy decisions to have context extensibility and global consistency.
[0020] 3. The multi-domain structure of the policy decision entity (action type identifier domain, rule identification domain, original instruction text domain, and rewritten instruction text domain) forms a bridge for information reuse and behavioral coordination within the system. These domains are not assigned values in isolation, but rather form mutually driving dependency chains through policy matching and state transition processes. This creates a feedforward and feedback relationship between decision-making and execution, thereby eliminating the fragmentation inherent in traditional database agents during policy branch execution.
[0021] 4. During the execution phase, FSM automatically drives state transitions based on different values of the action type identifier field in the policy determination entity, dynamically switching between allow, deny, and rewrite paths. Protocol encapsulation, instruction generation, and error feedback under each path rely on a unified context entity for parameter parsing and content generation, giving the entire execution process a closed-loop autonomous characteristic. Therefore, the system can maintain protocol consistency and response predictability in complex concurrent sessions, significantly improving system-level stability and policy-level determinism. Attached Figure Description
[0022] To more clearly illustrate the technical solutions in this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0023] Figure 1This is a schematic flowchart of an embodiment of the data processing method for relational databases provided in this application; Figure 2 This is a schematic flowchart of an embodiment of step 104 in the data processing method for relational databases provided in this application; Figure 3 This is a schematic flowchart of an embodiment of step 110 in the data processing method for relational databases provided in this application; Figure 4 A schematic diagram of an embodiment of the data processing apparatus for relational databases provided in this application; Figure 5 This is a schematic diagram of an embodiment of the data processing system for relational databases provided in this application. Detailed Implementation
[0024] It should be understood that the methods described in the embodiments of the present invention can be executed by software and hardware entities with corresponding functions, and the executing entity is not limited to a specific device form or system deployment structure.
[0025] In practical applications, all or part of the steps of the method can be implemented by computer program instructions, and the program can be stored in a non-transitory computer-readable medium and loaded and executed by a processing unit with computing power. The processing unit can be a general-purpose server, a dedicated database proxy module, a network middleware node, a distributed transaction control component, or any computing entity capable of data protocol parsing and state transition control.
[0026] Furthermore, the method steps of the present invention can be executed by a single processing node, or can be completed collaboratively by multiple processing nodes through communication interaction. The logical division between nodes is only for implementation purposes and does not constitute a limitation on the specific implementation method.
[0027] Therefore, the execution entity, system deployment architecture, and operating environment described in the embodiments of the present invention can be flexibly adjusted according to actual business needs, network topology, or computing resource distribution, and their equivalent substitution forms should all be considered to fall within the protection scope of the present invention.
[0028] To facilitate a precise understanding of the technical solution of this invention by those skilled in the art, some terms used in the specification are explained below: Finite State Machine (FSM): A finite state machine is an execution control model with state transition logic. It achieves ordered responses and state transitions to input events by defining a set of states, transition conditions, and action logic. In this invention, the FSM is used for parsing database protocol data payloads, state-driven operations, and command flow control, but is not limited to any specific state machine implementation framework or language description method.
[0029] Data payload: refers to the data unit that conforms to the database communication protocol sent by the client to the database agent or middleware. It may contain query commands, transaction requests, or control instructions. Its encoding may differ under different protocol versions, but it can all be decoded into recognizable text information through a pre-negotiated character set.
[0030] Target data unit: Used to represent a database command message encapsulated by a protocol. In this invention, the first target data unit typically corresponds to the client's original query request, and the second target data unit corresponds to the query request after policy rewriting, but this definition is not limited by the physical encapsulation format.
[0031] Session context information refers to metadata related to a specific client connection during database interaction, including but not limited to username, database identifier, source address, permission level, and connection time. This information can be used to establish a dynamic association between query semantics and security policies.
[0032] Query context entity: This refers to a structured data object formed by semantically binding the SQL string with session context information, serving as input for subsequent syntax parsing and strategy matching. The specific implementation of this entity can be an in-memory data structure, a key-value map, or a message object instance.
[0033] SQL syntax parser: refers to a parsing component capable of parsing linear SQL text into an abstract syntax tree (AST) structure. The parser in this invention can be implemented based on syntax rules or third-party parsing tools, and the output AST structure is used to support semantic-layer-based policy matching.
[0034] Abstract Syntax Tree (AST): A syntax model that transforms the syntactic structure of SQL text statements into a hierarchical node representation. This structure reflects the tables, fields, conditions, clauses, and logical relationships in the query statement, facilitating fine-grained strategy comparison.
[0035] Policy rule base: refers to a knowledge collection used to store multi-dimensional security and access control rules. It may contain matching patterns, scope of application, action definitions, and priority information, which are used to guide the agent in making decisions on allowing, modifying, or denying input SQL requests.
[0036] Policy Decision Entity: This refers to the decision object generated based on the policy matching result, used to characterize the agent system's processing behavior in a specific session context. This entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field, used to drive subsequent state transitions and operation execution of the FSM.
[0037] Action type identifier field: This refers to the field used in the policy decision entity to represent the decision category, in order to distinguish different processing paths such as allow (type 1), reject (type 2), and rewrite (type 3).
[0038] Rule identification field: This refers to the field that records the identification information of matching rules, which is used to generate response, audit logs, or security backtracking based on rules in subsequent processing.
[0039] Error response packet: This refers to the standardized response data unit generated by the proxy system according to the database protocol specification when the policy judgment result is rejection. Its content includes the error code, error description, and validation fields, used to form a complete request loop on the client side.
[0040] Pattern mapping operation: This refers to the process of structurally comparing the AST structure with matching patterns in the policy rule base to determine the degree of conformity between the SQL request and the policy rules. This process can be implemented using techniques such as node matching, semantic inference, or pattern indexing.
[0041] Protocol context: refers to the environmental information related to database protocol parsing cached by the agent system during the session lifecycle, which is used to support instruction reconstruction, pass-through or encapsulation during state transition.
[0042] Rewrite instruction text field: This refers to the field in the policy decision entity used to record the SQL text after policy rewriting. This text is used to generate the second target data unit and forward it to the target database for execution.
[0043] Please see Figure 1 This application first provides an embodiment of a data processing method for relational databases, which includes: S101. Listen to the data payload initiated by the client through the finite state machine (FSM) deployed in the proxy server, and decode the first target data unit contained therein to obtain the SQL string according to the pre-negotiation method; A finite state machine (FSM) deployed within the proxy server continuously monitors data payloads initiated by clients. These payloads conform to relational database communication protocols (e.g., MySQL, PostgreSQL). Upon detecting a characteristic frame header or command identifier (e.g., COM_QUERY), the FSM identifies it as the first target data unit. After identification, the FSM decodes the payload area of the data unit according to a character set pre-negotiated during session establishment (e.g., UTF-8, GBK, or other custom encoding) and parses the decoded byte stream into an SQL string.
[0044] In some implementations, FSM can use the protocol unpacking module to segment and verify data units to ensure that the SQL text content can still be completely restored in the case of multi-packet transmission or fragmentation.
[0045] The following pseudocode example further illustrates the FSM listening and data payload decoding process: / / FSM Master Listener Process procedure FSM_MainLoop(): while True: packet = ListenClientPayload() if packet is not None: state = FSM_GetCurrentState() switch state: case STATE_COMMAND_WAIT: FSM_HandleCommand(packet) case STATE_WAIT_RESULT: FSM_HandleResult(packet) default: FSM_ResetState() / / Command processing phase procedure FSM_HandleCommand(packet): if DetectProtocolHeader(packet) == True: if DetectCommandType(packet) == COM_QUERY: target_unit = packet FSM_SetState(STATE_DECODING) DecodeSQLPayload(target_unit) else: FSM_IgnorePacket(packet) else: FSM_Discard(packet) / / SQL decoding process procedure DecodeSQLPayload(target_unit): charset = GetNegotiatedCharset(session_id) payload = ExtractPayload(target_unit) decoded_text = DecodeBytes(payload, charset) if VerifyFragmentIntegrity(target_unit) == True: sql_string = ReassembleSQL(decoded_text) FSM_UpdateContext("SQL_STRING", sql_string) FSM_SetState(STATE_PARSING_READY) else: RaiseProtocolError("Fragment Missing or CRC Mismatch") In this implementation, FSM_MainLoop() represents the main loop task of the finite state machine, which continuously listens to the client channel; whenever a new payload is received, it dispatches processing logic according to the current state (command waiting / result waiting).
[0046] The FSM_HandleCommand(packet) is used to identify the protocol frame header and command identifier; for example, 0x03 in MySQL represents COM_QUERY. A successful match triggers the decoding phase.
[0047] The `DecodeSQLPayload(target_unit)` function implements the decoding logic, performing byte stream decoding based on the character set negotiated in the session; and ensures fragment integrity through `VerifyFragmentIntegrity()`.
[0048] If successful, the SQL string is written to the FSM context, awaiting subsequent parsing.
[0049] S102. Establish a semantic binding between the SQL string and the session context information to generate a query context entity, and parse the SQL text in it into an AST structure through an SQL syntax parser; The proxy server establishes a semantic binding between the obtained SQL string and the session context information. The session context includes, but is not limited to, user ID, source IP address, current database name, connection timestamp, and transaction identifier. This semantic binding process generates a query context entity, which serves as the input data structure for syntax parsing, preserving the semantic environment of the request source. The system calls an SQL syntax parser (e.g., a parsing engine based on JSQLParser or custom syntax rules) to parse the SQL text into an Abstract Syntax Tree (AST) structure. The AST structure nodes information such as keywords, table names, field names, filter conditions, and clauses in the SQL, for example: SELECT FROM sensitive_table WHERE id = 1 It can be parsed into a hierarchical tree structure containing the root node SELECT, the child node FromItem(sensitive_table), and the condition node WHERE(id=1).
[0050] The following pseudocode implementation further illustrates the query context generation and semantic binding process: procedure BuildQueryContext(sql_string, session_info): context = CreateEmptyContext() context.session_id = session_info.id context.charset = session_info.charset context.timestamp = GetSystemTimestamp() / / Establish semantic binding: associate SQL with context metadata context.semantic_link = { "user": session_info.user, "db_schema" : session_info.current_schema, "txn_state" : session_info.transaction_flag, "sql_text": sql_string }
[0051] In one implementation, this step can be achieved as follows: Obtain the current session identifier from the FSM and read the corresponding session context information; perform semantic mapping between the session context information and the SQL string to form a semantic binding pair containing the correspondence between the query statement and environment parameters; generate a query context entity based on the semantic binding pair; input the SQL string into the SQL syntax parser and identify the structural information in the SQL string through the SQL syntax parser; construct the structural information into an AST node set according to predefined syntax rules, where each node corresponds to an independent syntactic unit in the SQL statement; organize the AST node set into an AST structure.
[0052] In this implementation, this step can be performed by the semantic binding and syntax parsing module within the proxy server. Specifically, the system first obtains the unique identifier of the currently active session from the Finite State Machine (FSM) and reads the corresponding session context information accordingly. The session context information may include client connection parameters, user authentication information, target database identifier, current transaction state, and environment variables, reflecting the runtime context of the client request. Subsequently, the system performs semantic layer mapping between the session context information and the parsed SQL string. By annotating the correspondence between database objects (such as table names, field names, and function names) and context parameters (such as user, role, database, and time zone) involved in the SQL, semantic binding pairs are formed, thereby establishing a logical layer association between instruction semantics and the execution environment.
[0053] Based on the aforementioned semantic binding pairs, the system generates a query context entity to encapsulate the semantic expression and environmental dependencies of the SQL request within the current session. Next, the system inputs the SQL string into the SQL syntax parser. The parser performs lexical and syntactic analysis on the instruction content according to a predefined set of syntax rules, identifying key structural information in the SQL statement, such as SELECT clauses, WHERE conditions, JOIN relationships, function calls, and subquery units. After completing syntax recognition, the system abstracts the parsed structural information into a set of AST nodes according to the SQL syntax tree construction rules. Each node corresponds to an independent syntactic unit in the SQL statement (such as operators, expressions, field references, or logical conditions).
[0054] The system constructs a complete AST structure by traversing and associating the hierarchical dependencies between each AST node, reflecting the semantic hierarchy and logical execution path of the SQL statement. This AST structure not only provides semantic input for subsequent policy pattern matching, but also serves as the basic data model for instruction analysis, rewriting, and optimization stages, thereby achieving an organic connection between the structured expression of SQL statements and policy rule determination.
[0055] S103. Perform pattern mapping operation based on the AST structure and the preset strategy rule base to obtain the matching result; The agent system loads a multi-dimensional matching rule set from the policy rule base, including pattern definitions, scope constraints, action types, and priority information. It then performs layer-by-layer matching between the node information in the AST structure and the pattern items in the rule base. This matching can be done using string pattern matching (such as regular expressions) or structured comparison based on node type and hierarchical position.
[0056] For example, a rule can be defined as: "If a FromItem node exists in the AST, the table name is sensitive_table, and User is not equal to admin in the session context, then a rejection action is triggered." The output of the operation is a matching result object, which identifies the policy decision path corresponding to the current query command in a specific context.
[0057] In one implementation, this step can be achieved as follows: read the AST structure and its associated context meta-information from the query context entity; traverse each node of the AST structure and extract structural information related to SQL semantics; generate a syntax feature vector based on the hierarchical relationship between each node, the syntax feature vector being used to characterize the semantic pattern features of the current AST structure; load a rule set from a preset policy rule base, the rule set including at least string matching-based pattern rules and structure matching-based AST pattern rules; compare the syntax feature vector sequentially with the matching conditions of each rule in the rule set, if the syntax feature vector is found to match the matching conditions of the rule, then the rule is determined to be matched; generate a matching result based on the matched rule and its associated information.
[0058] In this implementation, this step can be achieved through structured pattern analysis and rule comparison of the AST structure using a policy matching module. Specifically, the system first reads the constructed Abstract Syntax Tree (AST) structure and its associated contextual information from the query context entity. This contextual information includes the database type of the SQL statement, current user permissions, transaction isolation level, session state, and execution environment identifier, which assist in the contextual constraint determination of policy rules. Subsequently, the system traverses each node in the AST structure and extracts structured information related to SQL semantics, such as operation type (query, insert, update, delete), statement level, table join relationships, conditional expression tree, and function call nodes, to form a structured expression of SQL semantics.
[0059] The system constructs corresponding syntactic feature vectors based on the hierarchical relationships and dependencies between each AST node. These syntactic feature vectors can be used to quantitatively characterize the pattern features of the current AST structure in the semantic space, enabling subsequent rule matching processes to achieve pattern alignment based on structural semantic features, rather than relying on specific SQL text forms. Next, the system loads a rule set from a pre-defined policy rule base. This rule set includes at least two categories: one is pattern rules based on string matching, used to identify specific keywords, function names, or SQL template patterns; the other is semantic rules based on AST structure matching, used to detect complex syntactic structures or specific combinations of operations (such as nested subqueries, cross-database joins, and queries involving sensitive fields).
[0060] The system compares the syntactic feature vector sequentially with the matching conditions of each rule in the rule set. The matching process can employ methods such as node path alignment, subtree similarity calculation, or pattern hashing to ensure that semantically equivalent but syntactically different SQL structures can still be recognized. If a rule's matching condition is found to be completely identical to the current syntactic feature vector or reaches a set similarity threshold, then the rule is considered to have been matched.
[0061] The system generates matching results based on the matched rule entries and their associated information (including rule number, strategy category, action type, and applicable context constraints). These matching results not only indicate the strategy type triggered by the current SQL statement but also serve as input for subsequent strategy determination entity generation and action type identification, thereby achieving organic linkage between the AST structure, rule base, and strategy execution logic.
[0062] S104. Generate a strategy determination entity based on the matching result. The strategy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field. The system constructs a strategy to determine entities based on the matching results. Each entity contains multiple field names, including: Action type identifier field: Indicates the corresponding policy execution path (allow, deny, or rewrite); Rule identification field: Stores the triggering rule number or hash identifier; Raw instruction text field: Stores the unmodified SQL string; Rewrite instruction text field: When the policy includes a rewrite operation, store the rewritten SQL text.
[0063] This entity serves as the direct driving force for FSM state transitions and subsequent execution actions.
[0064] See Figure 2 In an optional embodiment, step S104 can be implemented as follows: S1041, Receive the policy matching output from the pattern mapping operation; After receiving the output from the pattern mapping engine, the policy determination module within the proxy server extracts the matching result set corresponding to this SQL parsing. This matching result set may include multiple hit entries, each containing information such as rule number, matching condition, matching weight, and action indicator.
[0065] For example, when a DROP TABLE or UPDATE statement without a WHERE condition appears in the AST structure, the system may hit multiple risk rules. During this process, the policy determination module will perform preliminary filtering on the matching results, such as removing rules with low priority, insufficient confidence, or that do not match the current context.
[0066] S1042. Traverse all matching rules in the output of the strategy; The system enters a traversal loop, parsing each matching rule item by item. During traversal, the processing order can be dynamically adjusted based on rule priority level or policy category (such as security policy, performance optimization policy, access control policy) to ensure that higher priority rules take effect first. In some implementations, a rule index table can be generated during traversal to track the source of the final decision in the case of multiple matches.
[0067] S1043. For each hit rule, read and record the rule identifier to form a rule identification domain; The system extracts a unique rule identifier (such as a rule ID or policy fingerprint) from the matched rule entries. Record it in the current policy determination buffer to form the rule identification field.
[0068] For example: If the hit rule is a security denial type (such as prohibiting deletion of system tables), its rule identification field can be SEC_DENY_001; If the hit rule is a query rewrite class (such as rewriting SELECT...), (Rewritten as a specific field list), its rule identification field can be REWRITE_OPT_002.
[0069] The domain identified by this rule can serve as a basis for tracing in subsequent decision-making and error response generation processes.
[0070] S1044. Extract the action indication information related to the hit rule and map it to an action type, wherein the action type includes a first type, a second type and a third type; In this embodiment, each policy rule contains a corresponding "action indicator", such as "ALLOW", "DENY", "REWRITE", etc. The system maps the action to the value of the field corresponding to the internally standardized action type identifier based on the content of this field.
[0071] For example: "ALLOW" → Type 1; "DENY" → Second type; "REWRITE" → Third type.
[0072] The mapping result is stored in the policy determination entity as an action type identifier field. It will then directly drive the finite state machine (FSM) to perform state transitions.
[0073] S1045. Extract the original SQL string from the query context entity and assign it to the original instruction text field; In this step, the proxy server accesses the query context entity (i.e., the context structure generated in S102 above) and extracts the original SQL text content from it.
[0074] For example, if the client request is DELETE FROM users, this SQL text will be fully assigned to the original instruction text field of the policy decision entity. This operation ensures that the policy decision is accurately linked to the actual content of the original request, providing the original basis for generating error responses or rewriting instructions.
[0075] S1046. Determine whether the instruction needs to be rewritten based on the policy matching rules and context information; The system determines whether to trigger SQL rewriting logic based on the rule type and matching conditions.
[0076] The judgment conditions may include, but are not limited to: The action type that hits the rule is identified as type three; The current SQL statement is an unsafe operation, but it can be executed after semantic correction. There are alternative execution options in the current context (such as view replacement or field whitelist correction).
[0077] For example, when the rule REWRITE_OPT_002 is hit, the system will analyze the SQL based on the context structure to see if it involves sensitive fields. If there are risky fields, a rewrite branch will be triggered to reconstruct the SQL into a safe equivalent query.
[0078] One specific implementation of this step includes: receiving a rule identification field and an original instruction text field from a policy determination entity; performing a semantic scan on the original instruction text field to identify target instruction nodes involving sensitive tables, controlled columns, or restricted operations; comparing the target instruction nodes with policy matching rules and determining whether a rewrite condition is triggered, obtaining a first judgment result; obtaining metadata information from the query context entity; verifying the metadata information with the context constraints in the policy matching rules to determine whether the rewrite trigger condition is met, obtaining a second judgment result; determining whether a rewrite instruction needs to be generated based on the first judgment result and the second judgment result; if a rewrite instruction needs to be generated, calling the rewrite rule set to convert the original instruction text field to form a rewritten SQL string, and assigning it to the rewrite instruction text field of the policy determination entity.
[0079] In this embodiment, the process of "determining whether to rewrite the instruction based on the policy matching rules and context information" can be executed by the policy rewriting determination module within the proxy server. This module first receives the rule identification field and the original instruction text field from the policy determination entity to obtain the policy rules matched by the current request and the SQL text to be analyzed. A semantic scan operation is performed on the SQL instructions in the original instruction text field to identify target instruction nodes involving sensitive tables, controlled columns, or restricted operation types. During the semantic scan, the system can perform hierarchical analysis based on the node hierarchical information in the AST structure to accurately locate potential high-risk operations, such as deleting system tables, performing unconstrained updates to sensitive fields, or accessing wildcards in query statements.
[0080] After identifying the target instruction nodes, the system compares these nodes with the pattern descriptions defined in the policy matching rules. Through structured matching calculations, it determines whether a rewriting condition has been triggered, thus obtaining the first judgment result. The system extracts metadata information from the query context entity, including the username initiating the request, the client IP address, the target database name, and the current connection context. This metadata is then compared and verified against the context constraints set in the policy rules to determine whether the current operation falls within the allowed rewriting execution scope, thus obtaining the second judgment result.
[0081] The system performs a comprehensive decision-making operation on the first and second judgment results. If both meet the preset rewriting conditions, the current SQL instruction is considered to require rewriting. At this time, the proxy server calls the rewriting rule set to perform semantic reconstruction and instruction transformation on the original SQL text. Specifically, the rewriting can be implemented by replacing the target table with a proxy view, adding filtering conditions to limit the range of the result set, adjusting operation fields to avoid access to sensitive data, or executing other security-enhancing instruction replacements. After the rewriting is completed, the system writes the generated SQL string into the rewriting instruction text field of the policy judgment entity and updates the corresponding status flag to indicate that the rewriting process has been successfully completed.
[0082] S1047. If necessary, generate the rewritten SQL string and assign it to the rewrite instruction text field; When the judgment result is "rewrite is required", the system calls the semantic rewrite module. Perform semantic reconstruction or syntactic replacement on the AST structure.
[0083] For example: SELECT FROM user_info is rewritten as SELECT id, name, email FROM user_info; Rewrite UPDATE orders as conditional UPDATE orders WHERE status='pending'.
[0084] The rewritten SQL text is serialized into a string and written into the rewrite instruction text field in the policy determination entity for subsequent use by the second target data unit generation module.
[0085] S1048. Construct the rule identification field, action type identifier field, original instruction text field and rewritten instruction text field into a strategy determination entity.
[0086] The system encapsulates the aforementioned fields into a unified data structure object, namely the policy determination entity.
[0087] This entity not only contains field values, but can also include extended metadata, such as: Determine the timestamp; Scope of application of the rules; Decision source module identifier; Verify signatures, etc.
[0088] Once the policy decision entity is formed, it will be registered in the FSM control environment for subsequent state transitions and action branch decisions.
[0089] S1049. If not required, end the process.
[0090] If it is determined that no rewriting is required and the action type is identified as the first or second type, the system will directly end the process, submit the policy determination entity to the FSM, and enter the corresponding state transition process (i.e., allow or deny).
[0091] S105. When the action type identifier field is the first type, the FSM is transferred from the command query state to the waiting result state. S106. Based on the original instruction text field, determine the corresponding first target data unit from the cached protocol context, and pass the byte sequence of the first target data unit to the target database. When the action type identifier field is the first type, it means that the current query command conforms to the release policy.
[0092] The FSM control module transitions the status from command query to waiting for result and calls the protocol context cache module to retrieve the first target data unit corresponding to the SQL from the cache. The agent system verifies the data unit, and after confirming that the message sequence number and packet length fields are correct, The byte sequence is passed directly to the target database intact. In this path, the proxy system only performs logical-level state maintenance and does not rewrite the data content, ensuring minimal response latency.
[0093] S107. When the action type identifier field is the second type, the FSM is switched back from the command query state to the command state. S108. Read the rule recognition field, generate an error response packet based on the reading result and the original instruction text field, and return it to the client; When the action type identifier field is type 2, it indicates that the current request triggers the security policy and needs to be rejected. The FSM transitions the state from the command query state back to the command state so that the agent can enter a new request listening cycle. The system first reads the rule identification field in the policy decision entity to determine the corresponding rejection reason category (e.g., insufficient permissions, restricted table access, dangerous operation, etc.). An error response packet is generated based on the rule identification field and the original command text field. One implementation includes: Determine the protocol structure type of the error response packet (e.g., MySQL ERR_Packet); Write the standard error code (e.g., 1142) and error description (e.g., “SELECT command denied”) according to the error type. Call the protocol encapsulation module to assemble the above fields into a binary sequence that conforms to the protocol specification; The FSM control module sends the response packet to the client via the client session channel; After sending is complete, update the FSM's status record to maintain state consistency.
[0094] Upon receiving this response packet, the client will treat it as a standard exception response from the database. This logically completes a "request-rejection" closed loop.
[0095] In an optional embodiment, the step includes: parsing a unique identifier for the rule identification domain from the policy determination entity and retrieving the corresponding rule metadata from the policy rule base; extracting anomaly classification information, error code templates, and response text templates from the rule metadata; extracting key semantic fragments based on the original instruction text domain; performing variable mapping between the key semantic fragments and the response text templates to generate error description text with semantic relevance; determining the protocol structure of the error response packet based on the anomaly classification information; writing the generated error description text and corresponding error codes into the error information payload area according to the protocol structure; and encapsulating the constructed error response packet and sending it to the client.
[0096] Specifically, the system first parses the unique identifier of the rule identification domain from the policy determination entity, and then retrieves the rule metadata corresponding to that identifier from the policy rule base. This rule metadata typically includes anomaly classification information, error code templates, response text templates, and protocol constraint parameters, which guide the structure and content of subsequent error packet generation.
[0097] The system extracts exception classification information, error code templates, and response text templates from the rule metadata. The exception classification information identifies the protocol layer type of the error response packet, such as syntax exception, access restriction, data privilege escalation, or system exception. The error code template is used to generate corresponding standardized error codes. The response text template is a predefined interpolable text structure used to construct error information content that conforms to semantic relationships.
[0098] Furthermore, key semantic fragments are extracted from the original instruction text field. These fragments may include the table name, field name, operation type, or specific conditional expression that was denied access, reflecting the specific contextual semantics that caused the error. After extraction, the system performs semantic mapping and replacement between the key semantic fragments and parameter variables in the response template, thereby generating error description text with contextual semantic relevance. This ensures that the returned information accurately indicates the source of the problem while avoiding the leakage of sensitive system structure information.
[0099] The system determines the protocol structure type of the error response packet based on the anomaly classification information. For example, when using the MySQL protocol, it corresponds to the ERR_PACKET format, and when using the PostgreSQL protocol, it corresponds to the ErrorResponse format. Next, according to the determined protocol structure, the system writes the error description text and the corresponding error code into the error information payload area, and simultaneously calculates the data length field and the checksum number to ensure data integrity and protocol consistency.
[0100] By invoking the proxy server's protocol encapsulation module, the constructed error response packet is encapsulated into a binary sequence conforming to the target database communication protocol specification. The encapsulation process may include operations such as header splicing, payload alignment, checksum writing, and buffer flushing.
[0101] After encapsulation, the error response packet is sent to the corresponding session channel of the client via the control module of the Finite State Machine (FSM), and the state transition information is written back to the command state upon successful transmission. Through the synergistic effect of the above steps, a complete closed-loop processing flow is achieved, from rule identification to error generation, protocol encapsulation, and state transition.
[0102] Below is a pseudocode example of error response packet generation and encapsulation logic: / / Module: ErrorResponseBuilder / / Function: Based on the policy, determine the entity, generate and encapsulate an error response package function buildErrorResponse(strategyDecisionEntity): # Step 1: Analyze the core fields in the entity to determine the strategy rule_id = strategyDecisionEntity.getField("Rule identification field") original_sql = strategyDecisionEntity.getField("Original command text field") # Step 2: Retrieve the corresponding rule metadata rule_metadata = RuleRepository.fetchMetadata(rule_id) if rule_metadata is null: raise Exception("Rule metadata not found") exception_type = rule_metadata.get("Exception classification information") error_code_tpl = rule_metadata.get("error code template") response_template = rule_metadata.get("response text template") # Step 3: Extract key semantic fragments from the SQL sql_segments = SemanticScanner.extractKeywords(original_sql) # Example: Extracting table name, fields, operators, conditions, etc. # For example, SELECT FROM user_info → {TABLE: user_info, OP: SELECT} # Step 4: Semantic Mapping of Copy Templates error_text = TemplateEngine.fill(response_template, sql_segments) # Example: Template "Deny access to table {TABLE}" → "Deny access to table user_info" # Step 5: Construct the error response packet payload structure payload = ProtocolPayload() payload.writeErrorCode(generateErrorCode(error_code_tpl, rule_id)) payload.writeErrorText(error_text) payload.updateLength() payload.calcChecksum() # Step 6: Determine the protocol structure type protocol_type = ProtocolSelector.determineByException(exception_type) header = ProtocolHeaderFactory.create(protocol_type) header.attachPayload(payload) # Step 7: Call the protocol encapsulation module to perform binary serialization binary_packet = ProtocolEncoder.encode(header) S109. When the action type identifier field is the third type, the FSM is transferred from the command query state to the waiting result state. S110. Based on the rewrite instruction text field, generate a second target data unit and send the second target data unit to the target database.
[0103] When the action type identifier field is of type 3, it means that the current request needs to be rewritten in SQL.
[0104] The FSM state machine control module transitions the state from a command query state to a wait-for-result state. The system generates a second target data unit based on the SQL content in the rewritten instruction text field. One implementation method includes: Encode the rewritten SQL text into a byte array; Reconstruct the query command packet header according to the database protocol format; Calculate the length field and sequence number field of the new packet to ensure protocol consistency; The data transmission module is invoked to send the second target data unit to the target database.
[0105] In some implementations, the agent can record the correspondence between the original SQL and the rewritten SQL in parallel, so as to trace the behavior in subsequent auditing and log analysis.
[0106] See Figure 3 In an optional embodiment, this step can be implemented as follows: S1101. Parse the rewrite instruction text field in the policy determination entity and extract the rewrite instruction text and its associated session context information. In this step, the rewrite instruction text field is parsed from the policy determination entity, and the rewrite instruction text and its associated session context information are extracted. The rewrite instruction text field is typically generated by the policy engine after a rule is hit, and contains semantically rewritten SQL strings or other query instruction forms. Meanwhile, the session context information may include the current connection identifier (Connection ID), character set type (e.g., UTF-8 or GBK), transaction status identifier, target database identifier, and network sequence number status. During the parsing phase, the system loads this context information into an in-memory structure to ensure consistency and correctness in subsequent encapsulation processing at the protocol level. For example, when the proxy server detects that the rewritten SQL contains a new table alias, it can automatically adjust the semantic index table in the context cache structure at this stage.
[0107] S1102. Perform syntax reconstruction on the rewritten instruction text to obtain a byte sequence fragment that conforms to the target paradigm; The rewritten instruction text undergoes a syntax reconstruction operation to obtain a byte sequence fragment conforming to the target schema. Syntax reconstruction includes: re-tokenizing the lexical layer of the rewritten SQL statement; performing position mapping on parameter placeholders; performing normalization validation on keywords (e.g., forcing them to uppercase or using Unicode escape sequences); and rearranging structural fragments (such as the SELECT-FROM-WHERE hierarchy) to ensure the rewritten SQL conforms to the target database's syntax rules. After syntax reconstruction, the system serializes the result into a byte array, forming a payload fragment that can directly participate in protocol encapsulation.
[0108] S1103. Copy the protocol header information of the first target data unit, and update the length field, sequence number field and instruction type identifier field therein; The system copies the protocol header information of the first target data unit (i.e., the original query packet) to maintain the integrity of the session data structure. This protocol header typically contains a packet length field, a sequence number field, and a command identifier field (such as Command Type). After copying, the system dynamically updates the length field according to the length of the rewritten instruction, resets the sequence number field according to the current FSM state machine's send counter, and modifies the command type identifier field according to the type of SQL being rewritten (e.g., SELECT rewritten as SHOW, UPDATE rewritten as INSERT, etc.).
[0109] S1104. Embed the byte sequence fragment into the updated protocol header information to obtain the second target data unit; In this embodiment, the byte sequence fragment generated in step S1102 is embedded into the updated protocol header structure to form the second target data unit. This process involves memory concatenation and checksum calculation operations. Specifically, the system first pre-allocates contiguous storage space in the buffer to store the updated header and byte sequence fragment; then, it performs offset calculation and writing operations to fill the byte stream sequentially to the target offset address; finally, the protocol calculation module calculates the CRC checksum of the data unit.
[0110] S1105. Inject the second target data unit into the outbound transmission queue of the FSM; The generated second target data unit is injected into the FSM's outbound transmission queue. The FSM operates internally in an event-driven manner within the proxy server, and its outbound queue temporarily stores all command packets to be sent to the backend database. Once the modified data unit is injected into the queue, the FSM registers a status monitoring flag for it to track the data unit's transmission status, acknowledgments, and potential retransmission events.
[0111] S1106. Determine the target connection channel through the FSM, and send the second target data unit to the target database through the target connection channel.
[0112] The FSM determines the target connection channel (e.g., MySQL Session Socket or PostgreSQL Channel) based on the current connection context information and sends the second target data unit to the target database through this channel. After transmission, the FSM updates its internal state transition table, changing the current state from "command query state" to "waiting for result state" to enter the response receiving phase. During this process, the FSM can also update the retransmission window and throughput statistics based on channel feedback.
[0113] Below is a sample of structured pseudocode implementation: / / Parse and rewrite information and session context rewriteSQL= strategyEntity.rewrite_text sessionCtx= strategyEntity.session_context / / Syntax refactoring to generate protocol payload payload = serialize(SQLParser.parse(rewriteSQL)) / / Construct a new protocol packet header = clone(sessionCtx.lastPacket.header) header.update(length=payload.size + HEADER_SIZE, seq=FSM.nextSeq(sessionCtx.id), cmd=detectCommandType(rewriteSQL)) newPacket = assemble(header, payload) / / Inject into the outbound queue and send FSM.enqueue(sessionCtx.id, newPacket) FSM.send(sessionCtx.id, newPacket) In this example, the entity is determined by the strategyEntity.getField() parsing strategy, and the rewrite instruction text and its associated session context information are extracted.
[0114] The SQLParser.parse() function is called to perform syntax parsing on the rewritten instruction text and generate the corresponding syntax tree. Then, SyntaxRebuilder.normalize() is used to perform lexical normalization and structured reconstruction to eliminate semantic ambiguity. Finally, Serializer.toBytes() is used to serialize the reconstructed syntax object into a byte sequence that conforms to the database communication protocol format.
[0115] Call computeCRC() to perform cyclic redundancy check on the encapsulated data packet; The generated protocol packets are injected into the outbound transmission queue of the FSM using FSM.enqueueOutbound(); The target database connection channel is resolved using FSM.resolveChannel(); If the connection is available, real-time transmission is performed via FSM.send(); if the connection is temporarily unavailable, a retransmission waiting state is entered to ensure the integrity and recoverability of data transmission.
[0116] After completing the outbound data transmission, the FSM performs a state transition operation through FSM.updateState(), switching the state machine from the "command query state" to the "waiting for result state".
[0117] It should be understood that the foregoing method embodiments are only used to illustrate the implementation ideas and processes of the technical solutions of this application, and do not constitute the only limitation on the implementation methods. After reading the above content, those skilled in the art can implement the corresponding functions based on the same design concept through software, hardware, or a combination of software and hardware. To further illustrate the technical solutions of this application, the device structure capable of executing the foregoing methods will be described below in conjunction with device embodiments. It should be noted that the device embodiments and method embodiments correspond to each other, and their module division can correspond one-to-one with the method steps, or can be recombined, split, or implemented in an equivalent manner according to specific implementation needs.
[0118] The foregoing embodiments have described in detail the embodiments of the methods provided in this application. The embodiments of the apparatus and systems provided in this application are described below: See Figure 4 This application provides an embodiment of a data processing apparatus for relational databases, which includes: The listening unit 401 is used to listen to the data payload initiated by the client through the finite state machine (FSM) deployed in the proxy server, and decode the first target data unit contained therein to obtain the SQL string according to the pre-negotiation method; The AST structure construction unit 402 is used to establish a semantic binding between the SQL string and the session context information to generate a query context entity, and to parse the SQL text in it into an AST structure through an SQL syntax parser. The mapping operation unit 403 is used to perform pattern mapping operation based on the AST structure and the preset strategy rule base to obtain the matching result; The matching result generation unit 404 is used to generate a strategy determination entity based on the matching result. The strategy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field. The first migration unit 405 is used to transfer the FSM from the command query state to the waiting result state when the action type identifier field is the first type; The first processing unit 406 is used to determine the corresponding first target data unit from the cached protocol context based on the original instruction text field, and to pass through the byte sequence of the first target data unit to the target database. The second migration unit 407 is used to migrate the FSM from the command query state back to the command state when the action type identifier field is the second type. The second processing unit 408 is used to read the rule recognition field, generate an error response packet based on the reading result and the original instruction text field, and return it to the client. The third migration unit 409 is used to transfer the FSM from the command query state to the waiting result state when the action type identifier field is the third type. The third processing unit 410 is used to generate a second target data unit based on the rewrite instruction text field, and send the second target data unit to the target database.
[0119] Optionally, the matching result generation unit 404 is specifically used for: Receive the policy matching output from the pattern mapping operation; Iterate through all matching rules in the output of the strategy; For each hit rule, the rule identifier is read and recorded to form a rule identification domain; Extract the action indication information related to the hit rule and map it to action type, wherein the action type includes a first type, a second type and a third type; Extract the raw SQL string from the query context entity and assign it to the raw instruction text field; Determine whether the instruction needs to be rewritten based on the strategy matching rules and context information; If necessary, generate the rewritten SQL string and assign it to the rewrite instruction text field; The rule identification field, action type identifier field, original instruction text field, and rewritten instruction text field are constructed as a policy determination entity.
[0120] Optionally, the matching result generation unit 404 is specifically used for: Receive the rule identification field and the original instruction text field from the policy determination entity; Perform a semantic scan on the original instruction text field and identify target instruction nodes involving sensitive tables, controlled columns, or restricted operations; The target instruction node is compared with the policy matching rules, and it is determined whether the rewriting condition is triggered to obtain the first judgment result; Obtain the metadata information from the query context entity; The metadata information is verified against the context constraints in the policy matching rules to determine whether the rewrite triggering condition is met, and a second judgment result is obtained. Based on the first judgment result and the second judgment result, determine whether it is necessary to generate a rewrite instruction; If a rewrite instruction needs to be generated, the rewrite rule set is invoked to transform the original instruction text field, forming a rewritten SQL string, which is then assigned to the rewrite instruction text field of the policy determination entity.
[0121] Optionally, the AST structure building block 402 is specifically used for: Obtain the current session identifier from the FSM and read the corresponding session context information; The session context information is semantically mapped to the SQL string to form a semantic binding pair that contains the correspondence between query statements and environment parameters; Generate a query context entity based on the semantic binding; The SQL string is input into the SQL syntax parser, and the SQL syntax parser identifies the structural information in the SQL string. According to predefined syntax rules, the structural information is constructed into an AST node set, where each node corresponds to an independent syntactic unit in an SQL statement; The set of AST nodes is organized into an AST structure.
[0122] Optionally, the mapping operation unit 403 is specifically used for: Read the AST structure and its associated context meta-information from the query context entity; Traverse each node of the AST structure and extract structural information related to SQL semantics; A syntactic feature vector is generated based on the hierarchical relationship between each node, and the syntactic feature vector is used to characterize the semantic pattern features of the current AST structure. Load a set of rules from a preset policy rule base, the set of rules including at least pattern rules based on string matching and AST pattern rules based on structure matching; The syntax feature vector is compared sequentially with the matching conditions of each rule in the rule set. If the syntax feature vector matches the matching conditions of a rule, then the rule is determined to be matched. Matching results are generated based on the hit rules and their associated information.
[0123] Optionally, the second processing unit 408 is specifically used for: The unique identifier of the rule identification domain is parsed from the policy determination entity, and the corresponding rule metadata is retrieved from the policy rule base; Extract anomaly classification information, error code templates, and response text templates from the rule metadata; Key semantic fragments are extracted based on the original instruction text field; The key semantic fragments are mapped to the response text template to generate error description text with semantic relevance; Determine the protocol structure of the error response packet based on the anomaly classification information; According to the protocol structure, the generated error description text and the corresponding error code are written into the error information payload area; The constructed error response package is encapsulated and sent to the client.
[0124] Optionally, the third processing unit 410 is specifically used for: The rewrite instruction text field in the parsing policy determination entity is used to extract the rewrite instruction text and its associated session context information. Perform syntactic reconstruction on the rewritten instruction text to obtain a byte sequence fragment that conforms to the target paradigm; Copy the protocol header information of the first target data unit, and update the length field, sequence number field and instruction type identifier field therein; The byte sequence fragment is embedded into the updated protocol header information to obtain the second target data unit; The second target data unit is injected into the outbound transmission queue of the FSM; The target connection channel is determined by the FSM, and the second target data unit is sent to the target database through the target connection channel.
[0125] Please see Figure 5 This application also provides a data processing system for relational databases, including: Processor 501, memory 502, input / output unit 503, bus 504; The processor 501 is connected to the memory 502, the input / output unit 503, and the bus 504; The memory 502 stores a program, and the processor 501 calls the program to execute any of the methods described above.
[0126] This application also relates to a computer-readable storage medium on which a program is stored, which, when run on a computer, causes the computer to perform any of the methods described above.
[0127] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0128] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some interfaces, and may be electrical, mechanical, or other forms.
[0129] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0130] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0131] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
Claims
1. A data processing method for relational databases, characterized in that, The method includes: The finite state machine (FSM) deployed within the proxy server listens for data payloads initiated by the client and decodes the first target data unit contained therein to obtain the SQL string according to the pre-negotiation method. The SQL string is semantically bound to the session context information to generate a query context entity, and the SQL text in it is parsed into an AST structure by an SQL syntax parser. Based on the AST structure and the preset policy rule base, a pattern mapping operation is performed to obtain the matching result; A strategy determination entity is generated based on the matching result. The strategy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field. When the action type identifier field is the first type, the FSM is transferred from the command query state to the waiting result state; Based on the original instruction text field, the corresponding first target data unit is determined from the cached protocol context, and the byte sequence of the first target data unit is transparently transmitted to the target database; When the action type identifier field is the second type, the FSM will be transitioned from the command query state back to the command state; Read the rule recognition field, and based on the reading result, generate an error response packet content by combining it with the original instruction text field, and return it to the client; When the action type identifier field is the third type, the FSM will be transferred from the command query state to the waiting result state; Based on the rewrite instruction text field, a second target data unit is generated and sent to the target database.
2. The data processing method for relational databases according to claim 1, characterized in that, A policy determination entity is generated based on the matching result. The policy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field, including: Receive the policy matching output from the pattern mapping operation; Iterate through all matching rules in the output of the strategy; For each hit rule, the rule identifier is read and recorded to form a rule identification domain; Extract the action indication information related to the hit rule and map it to action type, wherein the action type includes a first type, a second type and a third type; Extract the raw SQL string from the query context entity and assign it to the raw instruction text field; Determine whether the instruction needs to be rewritten based on the strategy matching rules and context information; If necessary, generate the rewritten SQL string and assign it to the rewrite instruction text field; The rule identification field, action type identifier field, original instruction text field, and rewritten instruction text field are constructed as a policy determination entity.
3. The data processing method for relational databases according to claim 2, characterized in that, The step of determining whether the instruction needs to be rewritten based on the policy matching rules and context information includes: Receive the rule identification field and the original instruction text field from the policy determination entity; Perform a semantic scan on the original instruction text field and identify target instruction nodes involving sensitive tables, controlled columns, or restricted operations; The target instruction node is compared with the policy matching rules, and it is determined whether the rewriting condition is triggered to obtain the first judgment result; Obtain the metadata information from the query context entity; The metadata information is verified against the context constraints in the policy matching rules to determine whether the rewrite triggering condition is met, and a second judgment result is obtained. Based on the first judgment result and the second judgment result, determine whether it is necessary to generate a rewrite instruction; If a rewrite instruction needs to be generated, the rewrite rule set is invoked to transform the original instruction text field, forming a rewritten SQL string, which is then assigned to the rewrite instruction text field of the policy determination entity.
4. The data processing method for relational databases according to claim 1, characterized in that, The step of establishing a semantic binding between the SQL string and session context information to generate a query context entity, and parsing the SQL text in the SQL string into an AST structure using an SQL syntax parser, includes: Obtain the current session identifier from the FSM and read the corresponding session context information; The session context information is semantically mapped to the SQL string to form a semantic binding pair that contains the correspondence between query statements and environment parameters; Generate a query context entity based on the semantic binding; The SQL string is input into the SQL syntax parser, and the SQL syntax parser identifies the structural information in the SQL string. According to predefined syntax rules, the structural information is constructed into an AST node set, where each node corresponds to an independent syntactic unit in an SQL statement; The set of AST nodes is organized into an AST structure.
5. The data processing method for relational databases according to claim 4, characterized in that, The process of performing pattern mapping operations based on the AST structure and a preset policy rule base to obtain matching results includes: Read the AST structure and its associated context meta-information from the query context entity; Traverse each node of the AST structure and extract structural information related to SQL semantics; A syntactic feature vector is generated based on the hierarchical relationship between each node, and the syntactic feature vector is used to characterize the semantic pattern features of the current AST structure. Load a set of rules from a preset policy rule base, the set of rules including at least pattern rules based on string matching and AST pattern rules based on structure matching; The syntax feature vector is compared sequentially with the matching conditions of each rule in the rule set. If the syntax feature vector matches the matching conditions of a rule, then the rule is determined to be matched. Matching results are generated based on the hit rules and their associated information.
6. The data processing method for relational databases according to claim 1, characterized in that, The process of reading the rule identification field, generating an error response packet based on the reading result and the original instruction text field, and returning it to the client includes: The unique identifier of the rule identification domain is parsed from the policy determination entity, and the corresponding rule metadata is retrieved from the policy rule base; Extract anomaly classification information, error code templates, and response text templates from the rule metadata; Key semantic fragments are extracted based on the original instruction text field; The key semantic fragments are mapped to the response text template to generate error description text with semantic relevance; Determine the protocol structure of the error response packet based on the anomaly classification information; According to the protocol structure, the generated error description text and the corresponding error code are written into the error information payload area; The constructed error response package is encapsulated and sent to the client.
7. The data processing method for relational databases according to claim 1, characterized in that, The step of generating the second target data unit based on the rewritten instruction text field includes: The rewrite instruction text field in the parsing policy determination entity is used to extract the rewrite instruction text and its associated session context information. Perform syntactic reconstruction on the rewritten instruction text to obtain a byte sequence fragment that conforms to the target paradigm; Copy the protocol header information of the first target data unit, and update the length field, sequence number field and instruction type identifier field therein; The byte sequence fragment is embedded into the updated protocol header information to obtain the second target data unit; The second target data unit is injected into the outbound transmission queue of the FSM; The target connection channel is determined by the FSM, and the second target data unit is sent to the target database through the target connection channel.
8. A data processing device for relational databases, characterized in that, The device includes: The listening unit is used to listen to the data payload initiated by the client through the finite state machine (FSM) deployed in the proxy server, and decode the first target data unit contained therein to obtain the SQL string according to the pre-negotiation method; An AST structure building unit is used to establish a semantic binding between the SQL string and the session context information to generate a query context entity, and to parse the SQL text in the SQL text into an AST structure through an SQL syntax parser. The mapping operation unit is used to perform pattern mapping operations based on the AST structure and the preset policy rule base to obtain the matching result; A matching result generation unit is used to generate a strategy determination entity based on the matching result. The strategy determination entity includes at least an action type identifier field, a rule identification field, an original instruction text field, and a rewritten instruction text field. The first migration unit is used to transfer the FSM from the command query state to the waiting result state when the action type identifier field is the first type. The first processing unit is configured to determine the corresponding first target data unit from the cached protocol context based on the original instruction text field, and to pass through the byte sequence of the first target data unit to the target database. The second migration unit is used to migrate the FSM from the command query state back to the command state when the action type identifier field is the second type. The second processing unit is used to read the rule recognition field, generate an error response packet based on the reading result and the original instruction text field, and return it to the client. The third migration unit is used to transfer the FSM from the command query state to the waiting result state when the action type identifier field is the third type. The third processing unit is used to generate a second target data unit based on the rewrite instruction text field, and send the second target data unit to the target database.
9. A data processing system for relational databases, characterized in that, The system includes: Processor, memory, input / output units, and bus; The processor is connected to the memory, the input / output unit, and the bus; The memory stores a program, which the processor invokes to perform the method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium contains a program that, when executed on a computer, performs the method as described in any one of claims 1 to 7.