Data processing method, data processing device, and computer-readable storage medium
By integrating core metadata of tenants, databases, tables, columns, partitions, and statistics into the data lake architecture, and optimizing metadata access using a unified rules engine, the response latency and throughput issues of the metadata management system are resolved, achieving efficient request-level isolation and security control.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: JUHAOKAN TECH CO LTD
- Filing Date: 2026-01-26
- Publication Date: 2026-06-19

Figure CN122240657A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of big data technology, and in particular to a data processing method, a data processing device, and a computer-readable storage medium.
Background Technology
[0002] With the widespread adoption of data lake architectures, performance bottlenecks in metadata management systems have become increasingly prominent. Traditional systems use a three-tier architecture that relies on more than 200 distributed gRPC remote procedure call interfaces and more than 10 independent data models. As a result, each Structured Query Language (SQL) query passes through complex lexical analysis and syntax parsing and triggers multiple serial and parallel metadata retrieval operations. Under this design, every query execution involves a large number of network round trips and database queries, which significantly increases response latency and sharply reduces throughput. Meanwhile, because there is no unified caching mechanism or tenant context management, the system cannot effectively achieve request-level isolation and permission control. In high-concurrency scenarios, resource contention is intense and scalability is severely limited.
[0003] Therefore, how to reduce the frequency of database interactions by optimizing metadata access strategies has become a key technical problem that urgently needs to be solved.
Summary of the Invention
[0004] To address the aforementioned technical problems, this disclosure provides a data processing method, a data processing apparatus, and a computer-readable storage medium.
[0005] In a first aspect, this disclosure provides a data processing device, comprising: a communicator configured to receive query information sent by a preset account, the query information including tenant context and structured query language, wherein the tenant context includes at least one of a tenant identifier, user identity, security level, and optimization preferences; and a controller configured to: parse the query information and generate an actual syntax tree corresponding to the query information, wherein the actual syntax tree is generated based on table reference relationships, tenant context, and security tags in the query information, the security tags include at least tenant filtering conditions, and each node in the actual syntax tree contains tenant semantic information; generate at least one structured execution request lexical unit based on the actual syntax tree; obtain metadata corresponding to each execution request lexical unit in a unified rule engine through a target interface, wherein the unified rule engine includes at least core metadata corresponding to tenant, database, table, column, partition, and statistical information; determine at least one physical execution plan based on the metadata; and control execution nodes to run according to a target execution plan to obtain query results corresponding to the query information, wherein the target execution plan is any one of the physical execution plans.
[0006] In a second aspect, this disclosure provides a data processing method, comprising: receiving query information sent by a preset account, the query information including tenant context and structured query language, wherein the tenant context includes at least one of a tenant identifier, user identity, security level, and optimization preferences; parsing the query information to generate an actual syntax tree corresponding to the query information, wherein the actual syntax tree is generated based on table reference relationships, tenant context, and security tags in the query information, the security tags include at least tenant filtering conditions, and each node in the actual syntax tree contains tenant semantic information; generating at least one structured execution request lexical unit based on the actual syntax tree; obtaining the metadata corresponding to each execution request lexical unit in a unified rule engine through a target interface, wherein the unified rule engine includes at least core metadata corresponding to tenant, database, table, column, partition, and statistical information respectively; determining at least one physical execution plan based on the metadata; and controlling the execution node to run according to a target execution plan to obtain the query result corresponding to the query information, wherein the target execution plan is any one of the physical execution plans.
[0007] In a third aspect, this disclosure provides a computer-readable storage medium storing a computer program which, when executed by a controller, implements any data processing method provided in the second aspect.
[0008] In a fourth aspect, this disclosure provides a computer program product that, when run on a computer, causes the computer to perform any data processing method provided in the second aspect.
[0009] It should be noted that the aforementioned computer instructions may be stored, in whole or in part, on the first computer-readable storage medium. The first computer-readable storage medium may be encapsulated together with the controller of the data processing device, or it may be encapsulated separately from the controller of the data processing device; this disclosure does not impose any limitations on this.
[0010] For the second, third, and fourth aspects of this disclosure, reference may be made to the detailed description of the first aspect; for their beneficial effects, reference may be made to the analysis of the beneficial effects of the first aspect, which is not repeated here.
[0011] In this disclosure, the names of the aforementioned data processing devices do not limit the devices or functional modules themselves. In actual implementation, these devices or functional modules may appear under other names. As long as the functions of each device or functional module are similar to those of this disclosure, they fall within the scope of this disclosure and its equivalents.
[0012] These or other aspects of this disclosure will become more readily apparent in the following description.
[0013] Compared with the prior art, the technical solution provided in this disclosure has the following advantages. The data processing device provided in this disclosure includes a communicator configured to receive query information containing tenant context and structured query language sent by a preset account, and a controller configured to: parse the query information and generate an actual syntax tree corresponding to the query information, wherein the actual syntax tree is generated based on table reference relationships, tenant context, and security tags in the query information, the security tags include at least tenant filtering conditions, and each node in the actual syntax tree contains tenant semantic information; generate at least one structured execution request lexical unit based on the actual syntax tree; obtain the metadata corresponding to each execution request lexical unit in a unified rule engine through a target interface, wherein the unified rule engine includes at least core metadata corresponding to tenant, database, table, column, partition, and statistical information respectively; determine at least one physical execution plan based on the metadata; and control the execution node to run according to the target execution plan to obtain the query results corresponding to the query information. In other words, the data processing device provided in this disclosure consolidates in advance the core metadata for tenants, databases, tables, columns, partitions, and statistics, previously scattered across separate databases, into a unified rule engine. Because tenant context is injected into the query information, all of this core metadata can then be retrieved from the unified rule engine in a single call through the target interface. This reduces the number of network calls, improves the user experience, and addresses the problem of reducing the frequency of database interactions by optimizing metadata access strategies.
Attached Figure Description
[0014] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.
[0015] To more clearly illustrate the technical solutions in the embodiments of this disclosure or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0016] Figure 1 is a first flowchart of the data processing method provided in embodiments of this application;
Figure 2 is a second flowchart of the data processing method provided in embodiments of this application;
Figure 3 is a third flowchart of the data processing method provided in embodiments of this application;
Figure 4 is a fourth flowchart of the data processing method provided in embodiments of this application;
Figure 5 is a fifth flowchart of the data processing method provided in embodiments of this application;
Figure 6 is a schematic structural diagram of the data processing device provided in embodiments of this application;
Figure 7 is a schematic diagram of a chip system provided in an embodiment of this application.
Detailed Implementation
[0017] To better understand the above-mentioned objectives, features, and advantages of this disclosure, the solutions disclosed herein will be further described below. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other.
[0018] Numerous specific details are set forth in the following description in order to provide a full understanding of this disclosure, but this disclosure may also be implemented in other ways different from those described herein; obviously, the embodiments in the specification are only some, and not all, of the embodiments of this disclosure.
[0019] The data processing device provided in this application can take various forms, such as a television, a smart television, a laser projection device, a monitor, an electronic bulletin board, or an electronic table. Figure 1 and Figure 2 show one specific implementation of the data processing device of this application.
[0020] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0021] In some examples, the tenant identifier in this disclosure is a unique identifier that distinguishes different tenants and is used to isolate the scope of data and resource access between tenants.
[0022] In some examples, the user identity in this disclosure is an individual identifier of the system user, containing authentication information (such as account and password) and user type attributes, used to determine the user's relationship with the organization and the scope of permissions. Based on the association method with the organization, user identities can be divided into: Internal Members: These are employees within the organization, authenticated through the organization's managed identity system (such as Microsoft Entra ID), and possess high default permissions; External Guests: These are external users such as consultants and suppliers, who log in through external accounts or social identifiers, and their permissions are typically limited to access to specific resources.
[0023] In some examples, the organizational information in this disclosure is a structured management unit within the tenant, typically corresponding to a department, team, or business unit of an enterprise, used for organizational structure management and permission inheritance.
[0024] In some examples, the gRPC interface in this disclosure is an interface of gRPC, a high-performance remote procedure call (RPC) framework based on the HTTP/2 protocol.
[0025] The data processing device provided in this disclosure can be a server. When the server executes the data processing method provided in this disclosure, the execution subject may specifically be the server's processor.
[0026] In the following embodiments, the execution subject of the data processing method provided in the embodiments of this disclosure is the aforementioned server, which will be used as an example to illustrate the method of the embodiments of this application.
[0027] This application provides a data processing method. As shown in Figure 1, the data processing method may include S11 to S16.
[0028] S11. Receive query information sent by a preset account, which includes tenant context and structured query language; wherein the tenant context includes at least one of tenant identifier, user identity, security level, and optimization preferences.
[0029] In some examples, the server establishes communication connections with one or more clients. When a user needs to send a query containing SQL to the server, they can log in to the corresponding pre-defined account on the client. In this way, the user can send a query containing SQL to the server through the client. Thus, the SQL request received by the server is no longer isolated text, but a semantic unit carrying a complete context. Tenant ID, user identity, security level, optimization preferences, and other information are automatically injected when the request enters the system. For example, the same SQL query, `SELECT * FROM db.sales WHERE amount>1000`, if it comes from tenant `org_123`, will be automatically marked as a request requiring advanced security controls and performance optimization. The processing of tenant context input provides a rich semantic foundation for all subsequent stages.
[0030] In some examples, when a client submits an instruction containing SQL, the tenant context (tenant identifier (e.g., TENANT_ID), user identity, organization information, etc.) is automatically injected. For instance, when the client receives an instruction containing SQL input from a user, it first verifies the user's legitimacy through authentication (e.g., verifying a token or credentials), and performs permission verification based on the user's identity, tenant identifier (TENANT_ID), etc., to ensure the user has the permission to execute the target operation (e.g., a query operation). After verification, the client obtains the tenant context corresponding to the preset account based on the currently logged-in preset account. Then, the client injects the tenant context into the instruction containing SQL, generating query information containing both the tenant context and the SQL. It can be seen that the data processing method provided in this disclosure embodiment achieves native multi-tenant isolation and request-level security control through preprocessing such as authentication, permission verification, and context injection. It also supports automatic routing of requests to the corresponding tenant queue in high-concurrency scenarios, laying the foundation for subsequent complete data isolation and ensuring that each request clearly defines the tenant boundary and carries the necessary tenant context information from the entry point.
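The entry-point steps described above (authentication, permission verification, then tenant context injection) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the in-memory `ACCOUNTS` and `PERMISSIONS` tables, the token comparison, and the field names of `TenantContext` are hypothetical stand-ins for a real identity system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str                  # e.g. "org_123"
    user_identity: str              # hypothetical user name
    security_level: str = "normal"
    optimization_preference: str = "default"


@dataclass(frozen=True)
class QueryInfo:
    context: TenantContext          # injected at the entry point
    sql: str                        # the raw SQL text


class AuthError(Exception):
    pass


# Hypothetical stand-ins for the account store and the permission table.
ACCOUNTS = {
    "alice": {"token": "t-alice",
              "context": TenantContext("org_123", "alice", "high", "advance")},
}
PERMISSIONS = {("org_123", "alice"): {"SELECT"}}


def build_query_info(account: str, token: str, sql: str) -> QueryInfo:
    record = ACCOUNTS.get(account)
    # 1. Authentication: the presented token must match the logged-in account.
    if record is None or record["token"] != token:
        raise AuthError("authentication failed")
    ctx = record["context"]
    # 2. Permission check on the operation type (the first SQL keyword).
    operation = sql.strip().split()[0].upper()
    if operation not in PERMISSIONS.get((ctx.tenant_id, ctx.user_identity), set()):
        raise AuthError(f"{operation} not permitted for tenant {ctx.tenant_id}")
    # 3. Context injection: bind the tenant context to the SQL text.
    return QueryInfo(context=ctx, sql=sql)


q = build_query_info("alice", "t-alice", "SELECT * FROM db.sales WHERE amount>1000")
print(q.context.tenant_id, q.context.security_level)  # org_123 high
```

Making both dataclasses frozen keeps the tenant boundary immutable once the request has entered the system, which fits the goal that every request carries its tenant context unchanged from the entry point onward.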
[0031] S12. Parse the query information and generate the actual syntax tree corresponding to the query information, wherein the actual syntax tree is generated based on the table reference relationships, tenant context, and security tags in the query information, the security tags include at least the tenant filtering conditions, and each node in the actual syntax tree contains tenant semantic information.
[0032] In some examples, when the server parses the query information and generates the actual syntax tree corresponding to the query information, the lexical analyzer scans the query information and generates at least one lexical unit; the syntax analyzer generates the actual syntax tree based on the at least one lexical unit.
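A minimal sketch of the lexical-analyzer and syntax-analyzer split, assuming a toy grammar that only accepts `SELECT <columns | *> FROM <table> [WHERE <column> <op> <integer>]`. Node kinds reuse the `TOK_TENANT_*` names from this description; the `TOK_TENANT_FILTER` and `TOK_TENANT_CMP` node kinds and the `tenant_id` filter column are hypothetical. Every node carries the tenant identifier, and the tenant filter (the security tag) is always attached to the WHERE node.

```python
import re
from dataclasses import dataclass, field

# Integer literal | identifier (dots allowed, e.g. db.sales) | operator/punctuation.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_][\w.]*)|(>=|<=|<>|[*,=<>]))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}


def tokenize(sql):
    """Lexical analyzer: turn SQL text into (kind, text) lexical units."""
    tokens, pos, sql = [], 0, sql.strip()
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {sql[pos]!r}")
        number, word, op = m.groups()
        if number is not None:
            tokens.append(("INT", number))
        elif word is not None and word.upper() in KEYWORDS:
            tokens.append(("KW", word.upper()))
        elif word is not None:
            tokens.append(("IDENT", word))
        else:
            tokens.append(("OP", op))
        pos = m.end()
    return tokens


@dataclass
class Node:
    kind: str                                   # e.g. "TOK_TENANT_QUERY"
    value: str = ""
    tenant_id: str = ""                         # tenant semantic info on every node
    children: list = field(default_factory=list)


def parse(sql, tenant_id):
    """Syntax analyzer: build a tenant-annotated tree from the lexical units."""
    toks, i = tokenize(sql), 0

    def node(kind, value="", *kids):
        return Node(kind, value, tenant_id, list(kids))

    def peek(tok):
        return i < len(toks) and toks[i] == tok

    def expect(kind, value=None):
        nonlocal i
        if i >= len(toks) or toks[i][0] != kind or (value and toks[i][1] != value):
            raise SyntaxError(f"expected {value or kind} at token {i}")
        i += 1
        return toks[i - 1][1]

    expect("KW", "SELECT")
    cols = []
    while True:
        if peek(("OP", "*")):
            i += 1
            cols.append(node("TOK_TENANT_ALLCOLREF"))
        else:
            cols.append(node("TOK_TENANT_COLREF", expect("IDENT")))
        if not peek(("OP", ",")):
            break
        i += 1
    expect("KW", "FROM")
    table = expect("IDENT")
    # Security tag: the tenant filter is added even when the query has no WHERE.
    conds = [node("TOK_TENANT_FILTER", f"tenant_id = '{tenant_id}'")]
    if i < len(toks):
        expect("KW", "WHERE")
        col, op, lit = expect("IDENT"), expect("OP"), expect("INT")
        conds.insert(0, node("TOK_TENANT_CMP", op, node("TOK_TENANT_COLREF", col),
                             node("TOK_TENANT_INT_LITERAL", lit)))
    if i != len(toks):
        raise SyntaxError("unexpected trailing tokens")
    return node("TOK_TENANT_QUERY", "",
                node("TOK_TENANT_SELECT", "", *cols),
                node("TOK_TENANT_FROM", "", node("TOK_TENANT_TABREF", table)),
                node("TOK_TENANT_WHERE", "", *conds))


tree = parse("SELECT * FROM db.sales WHERE amount>1000", "org_123")
print([c.value for c in tree.children[2].children])  # ['>', "tenant_id = 'org_123'"]
```

A production parser would of course use a full SQL grammar; the sketch only shows where the tenant information enters the tree.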
[0033] S13. Based on the actual syntax tree, generate at least one structured execution request lexical unit.
[0034] In some examples, at least one structured execution request lexical unit is generated based on the root node (e.g., TOK_TENANT_QUERY) and the condition nodes of the actual syntax tree. The execution request lexical unit includes at least one of the following tokens (example values in parentheses), listed in the order in which they appear:
- tenant ID token (e.g., TENANT_ID=org_123), security level token (e.g., security_level=high), and optimization level token (e.g., optimization_level=advance);
- FROM clause identifier token (e.g., TOK_TENANT_FROM), automatic tenant filtering token (e.g., auto_TENANT_filter=true), and partition pruning token (e.g., partition_pruning=auto);
- table reference identifier token (e.g., TOK_TENANT_TABREF), unified metadata reference token (e.g., unified_metadata_ref='Table'), and cache key token (e.g., cache_key=TENANT:org_123);
- table name identifier token (e.g., TOK_TENANT_TABNAME), table name token (e.g., db.sales), tenant ID binding token (e.g., TENANT_ID binding=org_123), and security context token (e.g., security_context={read:true});
- column reference identifier token (e.g., TOK_TENANT_COLREF), column name token (e.g., amount), data type token (e.g., data_type=bigint), statistics token (e.g., statistics=[ndv:1000, min:0, max:10000]), and compression token (e.g., compression=zstd);
- integer literal identifier token (e.g., TOK_TENANT_INT_LITERAL), integer value token (e.g., 1000), constant folding token (e.g., constant_folding=true), and type inference token (e.g., type_inference=bigint);
- WHERE clause identifier token (e.g., TOK_TENANT_WHERE), selectivity estimation token (e.g., selectivity_estimate=0.15), and statistics-based estimation token (e.g., statistics_based=true);
- SELECT clause identifier token (e.g., TOK_TENANT_SELECT), column pruning hint token (e.g., column_pruning_hint=true), and projection pushdown token (e.g., projection_pushdown=enable);
- column expression identifier token (e.g., TOK_TENANT_SELEXPR) and selected columns token (e.g., selected_columns=['auto-derive']);
- all-column reference identifier token (e.g., TOK_TENANT_ALLCOLREF), resolved columns token (e.g., resolved_columns=[column statistics pre-binding]), and access control token (e.g., access_control=TENANT_scope).
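One way to picture how an execution request lexical unit is produced from the annotated tree is a pre-order walk that emits each node's identifier token followed by its `key=value` tokens. The nested-dict tree shape and the `name` attribute key below are assumptions for illustration; the token names and example values are taken from the list above.

```python
from typing import Iterator

# A tenant-annotated syntax tree as nested dicts (hypothetical shape): each node
# has a "kind", optional "attrs" (its semantic tokens) and optional "children".
TREE = {
    "kind": "TOK_TENANT_QUERY",
    "attrs": {"TENANT_ID": "org_123", "security_level": "high",
              "optimization_level": "advance"},
    "children": [
        {"kind": "TOK_TENANT_FROM",
         "attrs": {"auto_TENANT_filter": "true", "partition_pruning": "auto"},
         "children": [
             {"kind": "TOK_TENANT_TABREF",
              "attrs": {"cache_key": "TENANT:org_123"},
              "children": [{"kind": "TOK_TENANT_TABNAME",
                            "attrs": {"name": "db.sales"}}]}]},
        {"kind": "TOK_TENANT_WHERE",
         "attrs": {"selectivity_estimate": "0.15", "statistics_based": "true"}},
    ],
}


def exec_request_tokens(node: dict) -> Iterator[str]:
    """Pre-order walk: the node's identifier token, then its attribute tokens."""
    yield node["kind"]
    for key, value in node.get("attrs", {}).items():
        yield f"{key}={value}"
    for child in node.get("children", []):
        yield from exec_request_tokens(child)


print(list(exec_request_tokens(TREE))[:4])
# ['TOK_TENANT_QUERY', 'TENANT_ID=org_123', 'security_level=high', 'optimization_level=advance']
```

Pre-order traversal keeps each identifier token immediately ahead of the tokens that describe it, so a consumer can rebuild the node grouping from the flat sequence.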
[0035] S14. Obtain the metadata corresponding to each execution request lexical unit in the unified rule engine through the target interface; wherein, the unified rule engine contains at least the core metadata corresponding to tenant, database, table, column, partition and statistical information respectively.
[0036] In some examples, the target interface can be a gRPC interface.
[0037] In some examples, when the server obtains the metadata corresponding to each execution request lexical unit in the unified rule engine through the target interface, it can send retrieval information to the unified rule engine through the target interface, and then: query the tenant model based on the tenant identifier to determine the validity of the tenant identifier, the resource quotas and permissions, and the optimization preferences; query the database model based on the database name to determine the database identifier; query the table model based on the table name to determine the table information; query the column model based on the table identifier in the table definition to determine the column information of each column in the corresponding table; query the partition model based on the table identifier to determine the partition information of the corresponding table; and query the statistics model based on the table identifier to determine the statistical information of the corresponding table. The metadata corresponding to each execution request lexical unit then includes one or more of: the database identifier, the table information, the column information of each column, the partition information, the statistical information, and the validity of the tenant identifier, resource quotas and permissions, and optimization preferences.
As can be seen, the data processing method provided in this embodiment of the present disclosure significantly reduces the number of network calls by obtaining complete metadata at once through a unified target interface, while integrating permission verification and statistical information acquisition, thereby improving overall efficiency.
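The six lookups above can be pictured with a small in-memory stand-in for the unified rule engine. Everything here is hypothetical: the `UnifiedRuleEngine` class, its dictionary layout, and the sample IDs; the Python method `get_table_metadata` only mirrors the name of the call described in this disclosure and is not a gRPC stub. The point of the sketch is that one call resolves tenant, database, table, column, partition, and statistics metadata together instead of issuing six separate remote requests.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    tenant: dict          # validity, quotas/permissions, optimization preference
    database_id: int
    table: dict
    columns: list
    partitions: list
    statistics: dict


class UnifiedRuleEngine:
    """Hypothetical in-memory stand-in holding the six core models side by side."""

    def __init__(self):
        self.tenants = {"org_123": {"valid": True, "quota_gb": 500,
                                    "permissions": {"read"}, "optimization": "advance"}}
        self.databases = {("org_123", "db"): 7}                  # (tenant, name) -> id
        self.tables = {(7, "sales"): {"table_id": 42, "format": "parquet"}}
        self.columns = {42: [{"name": "amount", "type": "bigint"}]}
        self.partitions = {42: [{"key": "dt", "value": "2026-01-01"}]}
        self.statistics = {42: {"amount": {"ndv": 1000, "min": 0, "max": 10000}}}

    def get_table_metadata(self, tenant_id: str, db: str, table: str) -> TableMetadata:
        tenant = self.tenants.get(tenant_id)                     # tenant model
        if not tenant or not tenant["valid"]:
            raise PermissionError(f"unknown or invalid tenant {tenant_id}")
        db_id = self.databases[(tenant_id, db)]                  # database model
        tbl = self.tables[(db_id, table)]                        # table model
        tid = tbl["table_id"]
        return TableMetadata(tenant, db_id, tbl,
                             self.columns.get(tid, []),          # column model
                             self.partitions.get(tid, []),       # partition model
                             self.statistics.get(tid, {}))       # statistics model


md = UnifiedRuleEngine().get_table_metadata("org_123", "db", "sales")
print(md.database_id, md.columns[0]["type"])  # 7 bigint
```

Scoping the database key by tenant identifier is one simple way to keep one tenant's lookups from ever resolving another tenant's objects.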
[0038] In some examples, the data processing method provided in this disclosure uses a two-tier caching mechanism consisting of a hot storage cache (high-frequency data held in memory) and a cold storage cache (compressed full data). When the server needs to retrieve metadata, it checks the caches first. Because the hot storage cache holds frequently accessed data in memory, the server searches it first; if the required metadata is found there, it is returned immediately, which improves retrieval efficiency. If the data is not found in the hot storage cache, the server then searches the cold storage cache, which holds the full data set; although this data set is large, compression keeps it manageable while still providing complete data when needed. If neither cache contains the required metadata, the server retrieves the metadata corresponding to each execution request lexical unit from the unified rule engine through the target interface.
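The hot-then-cold-then-engine lookup order can be sketched as a two-tier cache. Assumptions, since the description does not specify them: the hot tier is an LRU with a fixed capacity, the cold tier keeps zlib-compressed JSON of every entry ever loaded, and `loader` is a placeholder for the call to the unified rule engine.

```python
import json
import zlib
from collections import OrderedDict


class TieredMetadataCache:
    """Hot tier: small LRU of decoded objects. Cold tier: compressed copies of all entries."""

    def __init__(self, loader, hot_capacity=2):
        self.loader = loader              # fallback, e.g. a unified rule engine call
        self.hot = OrderedDict()
        self.cold = {}
        self.hot_capacity = hot_capacity
        self.loads = 0                    # number of fallbacks to the loader

    def _promote(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        if key in self.hot:                                   # 1. hot tier
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:                                  # 2. cold tier
            value = json.loads(zlib.decompress(self.cold[key]))
        else:                                                 # 3. unified rule engine
            value = self.loader(key)
            self.loads += 1
            self.cold[key] = zlib.compress(json.dumps(value).encode())
        self._promote(key, value)
        return value


cache = TieredMetadataCache(lambda k: {"table": k, "columns": ["amount"]}, hot_capacity=1)
cache.get("org_123/db/sales")
cache.get("org_123/db/orders")         # evicts sales from the hot tier
cache.get("org_123/db/sales")          # served from the cold tier, no new load
print(cache.loads)                     # 2
```

Cache keys in this sketch include the tenant identifier (as in `org_123/db/sales`) so that entries belonging to different tenants can never collide.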
[0039] In some examples, the process by which the server obtains the metadata corresponding to each execution request lexical unit in the unified rule engine through the target interface is divided into a unified syntax and semantic processing stage and a unified data acquisition stage. The unified syntax and semantic processing stage includes: first, initializing the tenant context, for example by extracting the tenant identifier (e.g., TENANT_ID=org_123) from the tenant context; then, performing intelligent table reference identification based on the tenant identifier to obtain the table reference relationships corresponding to that tenant identifier; next, performing unified parsing of the unified rule engine, that is, analyzing and interpreting its structure, parameters, and logic so that it can be understood and used correctly to process data; and finally, pre-binding unified metadata, that is, binding the relevant metadata (such as data definitions, structures, and attributes) in advance so that subsequent data acquisition and processing are more efficient and accurate.
[0040] The unified data acquisition stage includes: first, constructing a unified request (TENANT_ID, db, table, include_all), that is, a request in a uniform format built from the previously extracted tenant information (TENANT_ID) and the database (db) and table (table) to be operated on, where `include_all` may indicate that all relevant metadata is requested; then, making a single gRPC call to `get_table_metadata()` to retrieve the table's metadata, such as column names, data types, and constraints; next, performing unified processing on the server side (tenant routing + cache checking + unified rule engine query), which includes routing the request to the correct processing module based on the tenant information, checking whether cached data can be returned directly to improve response speed, and otherwise querying the unified rule engine to retrieve data that meets the conditions from data sources such as databases; and finally, returning a unified metadata object (TableMetadata) to the client for use by subsequent business logic.
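A sketch of the unified request and the server-side unified processing (tenant routing, cache check, engine query). Assumptions: the `UnifiedRequest` fields mirror the tuple (TENANT_ID, db, table, include_all); routing uses a stable SHA-256 hash over a list of hypothetical shard names; the cache is a plain dict; and `engine_lookup` is a placeholder for the unified rule engine query. The method name mirrors `get_table_metadata()` but this is plain Python, not generated gRPC code.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedRequest:
    tenant_id: str
    db: str
    table: str
    include_all: bool = True


class MetadataGateway:
    def __init__(self, shards, engine_lookup):
        self.shards = shards                  # hypothetical processing-module names
        self.engine_lookup = engine_lookup    # stand-in for the unified rule engine
        self.cache = {}
        self.engine_calls = 0

    def route(self, tenant_id: str) -> str:
        # Stable hash, so a given tenant always lands on the same module.
        digest = hashlib.sha256(tenant_id.encode()).digest()
        return self.shards[digest[0] % len(self.shards)]

    def get_table_metadata(self, req: UnifiedRequest) -> dict:
        shard = self.route(req.tenant_id)                       # tenant routing
        key = (req.tenant_id, req.db, req.table, req.include_all)
        if key not in self.cache:                               # cache check
            self.engine_calls += 1
            self.cache[key] = self.engine_lookup(req)           # engine query
        return {"shard": shard, **self.cache[key]}              # unified object


gw = MetadataGateway(["shard-a", "shard-b"],
                     lambda r: {"table": f"{r.db}.{r.table}", "columns": ["amount"]})
req = UnifiedRequest("org_123", "db", "sales")
gw.get_table_metadata(req)
gw.get_table_metadata(req)
print(gw.engine_calls)  # 1
```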
[0041] In this way, the metadata corresponding to each execution request lexical unit is obtained.
[0042] As can be seen, the data processing method provided in this disclosure abandons the original rule system built on more than 10 scattered models and replaces it with a unified rule engine built on 6 unified models. This is not merely a reduction in the number of models but a fundamental change in how rules are organized. The unified rule engine understands tenant context and dynamically injects security controls and optimization hints during parsing. For example, when parsing a table reference, it not only identifies the syntax structure but also automatically adds tenant filtering conditions, pre-binds metadata references, and collects optimization statistics. Bringing semantic information into the syntax analysis stage departs from the strict separation of stages in traditional compilers and requires the unified rule engine to have some semantic understanding. The generated abstract syntax tree (AST) also changes qualitatively: each node becomes a rich semantic carrier. The query root node carries the tenant ID and security level, the table reference node is pre-bound with a unified metadata object and cache key, the column reference node integrates data type and statistics, and even literal nodes carry type inference and constant folding markers.
[0043] S15. Based on metadata, determine at least one physical execution plan.
[0044] In some examples, determining at least one physical execution plan based on the metadata includes: parsing the execution request tokens under the TOK_TENANT_QUERY root node to extract key information such as the tenant identifier (e.g., TENANT_ID=org_123), security level (e.g., security_level=high), and optimization level (e.g., optimization_level=advance), while integrating the tenant context (e.g., data isolation policies and permission scope). For example, tenant filtering conditions are automatically added to queries based on auto_TENANT_filter=true to ensure data access isolation. A logical execution plan is then constructed from the parsed metadata, including elements such as table references (TOK_TENANT_TABREF), column references (TOK_TENANT_COLREF), and filtering conditions (TOK_TENANT_WHERE). Logical optimization is performed using statistical information (such as statistics=[ndv:1000, min:0, max:10000]); for example, irrelevant columns can be pruned when column_pruning_hint=true, or the evaluation order of conditions can be adjusted based on selectivity_estimate=0.15 to improve query efficiency. Physical operators are selected based on the logical plan and the system resource status (such as CPU, memory, and storage distribution). For example, for scan operators, if the table is partitioned (partition_pruning=auto), partitions are pruned based on the tenant ID or query conditions to reduce the data scan range; for join operators, projection_pushdown=enable pushes projection operations down to the storage layer to reduce data transfer volume; for aggregation operators, constant_folding=true pre-computes constant expressions to reduce runtime overhead. At the same time, resource allocation is dynamically adjusted based on tenant performance requirements (such as a response time SLA). For example, more computing resources can be allocated to high-priority tenants (security_level=high), or resource contention can be avoided through load balancing algorithms (such as LRM dynamic load balancing). The optimized physical operators are organized into an executable plan according to their execution order, including the data flow between operators, parallelism configuration (such as asynchronous parallel scheduling), and fault tolerance mechanisms (such as task timeout alerts). Finally, metadata verification (such as the access_control=TENANT_scope permission check) ensures that the plan complies with the tenant's security policy, producing at least one directly executable physical execution plan.
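Two of the optimizations mentioned above, partition pruning and selectivity-based ordering of conditions, can be sketched in a minimal form. Assumptions: each partition records the tenant that owns it and the min/max of the `amount` column (hypothetical layout); selectivity is the estimated fraction of rows that pass a predicate, as in `selectivity_estimate=0.15`, so a lower value means a more selective predicate that should run first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    tenant_id: str
    min_amount: int
    max_amount: int


@dataclass(frozen=True)
class Predicate:
    text: str
    selectivity: float    # estimated fraction of rows that pass


def prune_partitions(partitions, tenant_id, lower_bound):
    """Keep this tenant's partitions that can contain rows with amount > lower_bound."""
    return [p for p in partitions
            if p.tenant_id == tenant_id and p.max_amount > lower_bound]


def order_predicates(predicates):
    """Evaluate the most selective (lowest pass fraction) predicate first."""
    return sorted(predicates, key=lambda p: p.selectivity)


parts = [Partition("p0", "org_123", 0, 900),
         Partition("p1", "org_123", 500, 10000),
         Partition("p2", "org_456", 0, 10000)]
print([p.name for p in prune_partitions(parts, "org_123", 1000)])  # ['p1']
```

Here `p0` is skipped because its maximum `amount` cannot satisfy `amount > 1000`, and `p2` is skipped because it belongs to another tenant, so tenant isolation and scan reduction happen in the same step.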
[0045] In some examples, when determining at least one physical execution plan based on metadata, the metadata corresponding to each execution request lexical unit can be input into a simulation model for simulation to obtain at least one physical execution plan. The training process of the simulation model includes: Obtain the first training sample data and the first labeling result of the first training sample data; wherein, the first training sample data includes one or more historical data, the first labeling result includes one or more physical execution plans corresponding to each historical data, and each historical data includes metadata corresponding to each execution request lexical unit corresponding to a query information.
[0046] The first training sample data is input into the first neural network model for learning, and the first prediction result of the first neural network model on the first training sample data is obtained.
[0047] Based on the first prediction result and the first labeling result, the network parameters of the first neural network model are adjusted until the network parameters of the first neural network model converge to obtain the simulation model.
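The training procedure in [0045] to [0047] (predict on the training samples, compare with the labels, adjust parameters until convergence) is sketched below with a deliberately simple substitute: logistic regression, which is equivalent to a single-layer network with a sigmoid output, rather than the first neural network model of this disclosure. The feature encoding (whether the table is partitioned, and the filter selectivity), the binary label (1 = partition-pruned scan plan, 0 = full scan plan), and all hyperparameters are hypothetical.

```python
import math


def _score(w, b, x):
    return b + sum(wi * xi for wi, xi in zip(w, x))


def train_plan_classifier(samples, labels, lr=0.5, l2=0.01, tol=1e-6, max_iters=10_000):
    """Batch gradient descent on L2-regularized logistic loss.

    Stops once the largest parameter update falls below `tol` (treated here as
    convergence) or after `max_iters` iterations.
    """
    n, d = len(samples), len(samples[0])
    w, b = [0.0] * d, 0.0
    for _ in range(max_iters):
        grad_w, grad_b = [l2 * wi for wi in w], 0.0
        for x, y in zip(samples, labels):
            p = 1.0 / (1.0 + math.exp(-_score(w, b, x)))   # prediction result
            err = p - y                                     # vs. labeling result
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
        if max(abs(lr * g) for g in grad_w + [grad_b]) < tol:
            break
    return w, b


def predict(w, b, x):
    return 1 if _score(w, b, x) > 0 else 0


# Hypothetical features: [table is partitioned (0/1), filter selectivity].
SAMPLES = [[1, 0.10], [1, 0.20], [0, 0.90], [0, 0.80]]
LABELS = [1, 1, 0, 0]          # 1 = partition-pruned scan, 0 = full scan
w, b = train_plan_classifier(SAMPLES, LABELS)
print(predict(w, b, [1, 0.15]), predict(w, b, [0, 0.85]))
```

The L2 term keeps the weights bounded on this linearly separable toy data, so the update size actually shrinks below the tolerance instead of the weights growing without limit.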
[0048] S16. Control the execution node to run according to the target execution plan and obtain the query results corresponding to the query information, wherein the target execution plan is any one of the physical execution plans.
[0049] In some examples, when the control execution node runs according to the target execution plan and obtains the query results corresponding to the query information, the expected cost of each physical execution plan can be obtained first; based on the expected cost, the target execution plan is determined as the physical execution plan corresponding to the minimum expected cost; the control execution node runs according to the physical execution plan corresponding to the minimum expected cost and obtains the query results corresponding to the query information.
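The selection rule in [0049] is an arg-min over the candidate plans followed by dispatch to the execution node. A minimal sketch follows. The function name `run_cheapest_plan` and the `expected_cost` and `execute` callables are hypothetical placeholders for the cost model and the execution node.

```python
from typing import Callable, Sequence, TypeVar

Plan = TypeVar("Plan")


def run_cheapest_plan(plans: Sequence[Plan],
                      expected_cost: Callable[[Plan], float],
                      execute: Callable[[Plan], object]):
    """Pick the plan with the minimum expected cost and hand it to the execution node."""
    if not plans:
        raise ValueError("no candidate physical execution plans")
    target = min(plans, key=expected_cost)   # on ties, min() keeps the first candidate
    return target, execute(target)
```

Usage: `run_cheapest_plan(plans, cost_model.estimate, executor.run)` returns the chosen plan together with its query result. Here `cost_model` and `executor` are assumed objects, not part of the disclosure.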
[0050] In some examples, the execution node is controlled to run according to the target execution plan to obtain the query results corresponding to the query information, as follows. First, SQL parsing completes and table references are identified. The input SQL statement is parsed to find the tables it references (for example, the table reference db.sales), which establishes the data objects that subsequent operations act on. Next, a unified metadata query request is sent to the metadata service gateway to obtain the metadata of the referenced tables, including the table structure and column information, which supports the subsequent optimization. The metadata service gateway acts as the intermediate layer for metadata queries and receives and processes the request. The request is then parsed further and context information is constructed, so that subsequent operations rely on accurate information. Finally, data execution model mapping and physical execution plan generation are performed: the output of request parsing and context construction is input into a cost-based optimizer (CBO). The CBO is the core of the entire process. It performs cost optimization analysis based on the acquired metadata and other information. It evaluates the cost of different query patterns and selects the optimal query pattern in each of three dimensions: (table + storage + database), (column + statistics), and (partition + partition key).
In the (table + storage + database) dimension, the optimal query pattern is chosen by considering the table's storage method and the characteristics of its database. In the (column + statistics) dimension, it is chosen based on column statistics such as the number of distinct values and the data distribution. In the (partition + partition key) dimension, it is chosen based on the table's partitioning and partition key settings. Next, unified rule engine matching queries are performed, and each selected optimal query pattern drives batch queries in the corresponding engine. For the (table + storage + database) pattern, the table storage unified rule engine batch-queries the storage-related information of the tables to provide data for subsequent SQL generation. For the (column + statistics) pattern, the column statistics unified rule engine batch-queries the statistical information of the columns. For the (partition + partition key) pattern, the partition unified rule engine batch-queries the partition-related information of the tables. Database execution and result processing are then performed on the results of these batch queries.
This proceeds in four steps. First, the optimized batch queries are executed: the generated optimized SQL statements are sent to the database for execution and data retrieval. Second, a unified result assembler assembles the query results returned by the database into a consistent format. Third, the assembled result data is enhanced according to the permission model so that it meets the permission requirements. Fourth, extended attribute models are integrated into the result data to enrich its content. Results are then returned and optimization feedback is provided, based on the output of database execution and result processing. The simplified model data produced by the preceding steps is returned to the caller. The query engine generates the target execution plan, that is, the final physical execution plan based on the entire process. Finally, statistical information gathered during execution is fed back to the CBO for reference in subsequent optimizations, so that the optimization strategy keeps improving.
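The result-processing chain in [0050] runs in a fixed order: assemble, apply permissions, merge extended attributes, then feed statistics back to the CBO. It can be sketched as a pipeline of small functions. All names are hypothetical, as are the `grants` mapping (tenant to permitted operations), the `extended` mapping (table id to extra attributes), and the feedback record format. The sketch only illustrates that ordering.

```python
def assemble(rows, columns):
    """Unified result assembler: turn raw DB tuples into uniformly keyed records."""
    return [dict(zip(columns, row)) for row in rows]


def enhance_with_permissions(records, tenant_id, grants):
    """Keep only the tenant's records and attach its permitted operations."""
    ops = sorted(grants.get(tenant_id, ()))
    return [{**r, "permissions": ops} for r in records if r.get("tenant_id") == tenant_id]


def merge_extended_attributes(records, extended):
    """Merge extended attributes keyed by table id into each record."""
    return [{**r, **extended.get(r["table_id"], {})} for r in records]


def process_results(rows, columns, tenant_id, grants, extended, cbo_feedback, elapsed_ms):
    records = assemble(rows, columns)
    records = enhance_with_permissions(records, tenant_id, grants)
    records = merge_extended_attributes(records, extended)
    # Feed execution statistics back to the CBO for later optimizations.
    cbo_feedback.append({"rows_returned": len(records), "elapsed_ms": elapsed_ms})
    return records
```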
[0051] In some examples, based on the CBO-optimized execution plan, the system generates three intelligent batch queries: the table storage unified rule engine retrieves database, table, and storage information through a single JOIN query; the column statistics unified rule engine retrieves column definitions and related statistics through left outer join optimization; and the partition unified rule engine retrieves partition and partition key information through partition query optimization. These batch queries use an intelligent SQL generator to create optimized SQL statements, ensuring that data from multiple streamlined models can be retrieved in a single database query.
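Two of the three batch queries in [0051] are illustrated below using Python's built-in `sqlite3` module. One JOIN returns database, table, and storage information for many tables in a single round trip, using a batched `IN` list. One LEFT OUTER JOIN returns column definitions together with any available statistics. The schema (`dbs`, `tbls`, `sds`, `cols`, `col_stats`), the sample rows, and the function names are hypothetical. The disclosure does not specify its storage schema or the SQL its generator emits.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dbs  (db_id INTEGER PRIMARY KEY, name TEXT, tenant_id TEXT);
CREATE TABLE sds  (sd_id INTEGER PRIMARY KEY, location TEXT, input_format TEXT);
CREATE TABLE tbls (tbl_id INTEGER PRIMARY KEY, db_id INTEGER, name TEXT, sd_id INTEGER);
CREATE TABLE cols (tbl_id INTEGER, name TEXT, type TEXT);
CREATE TABLE col_stats (tbl_id INTEGER, col_name TEXT, ndv INTEGER, num_nulls INTEGER);
"""


def demo_catalog():
    """Build a small in-memory catalog in which two tenants both own db.sales."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dbs VALUES (?, ?, ?)", [(1, "db", "t1"), (2, "db", "t2")])
    conn.executemany("INSERT INTO sds VALUES (?, ?, ?)",
                     [(100, "/lake/t1/sales", "parquet"), (101, "/lake/t2/sales", "orc")])
    conn.executemany("INSERT INTO tbls VALUES (?, ?, ?, ?)",
                     [(10, 1, "sales", 100), (11, 2, "sales", 101)])
    conn.executemany("INSERT INTO cols VALUES (?, ?, ?)",
                     [(10, "amount", "double"), (10, "region", "string")])
    conn.execute("INSERT INTO col_stats VALUES (10, 'amount', 500, 0)")
    return conn


def fetch_tables(conn, tenant_id, qualified_names):
    """Table storage batch: database, table and storage info in one JOIN round trip."""
    if not qualified_names:
        return []
    marks = ",".join("?" for _ in qualified_names)
    sql = f"""
        SELECT d.name, t.name, t.tbl_id, s.location, s.input_format
        FROM tbls t
        JOIN dbs d ON d.db_id = t.db_id
        JOIN sds s ON s.sd_id = t.sd_id
        WHERE d.tenant_id = ? AND (d.name || '.' || t.name) IN ({marks})
        ORDER BY t.tbl_id
    """
    return conn.execute(sql, [tenant_id, *qualified_names]).fetchall()


def fetch_columns_with_stats(conn, tbl_ids):
    """Column statistics batch: LEFT OUTER JOIN keeps columns that lack statistics."""
    if not tbl_ids:
        return []
    marks = ",".join("?" for _ in tbl_ids)
    sql = f"""
        SELECT c.tbl_id, c.name, c.type, st.ndv, st.num_nulls
        FROM cols c
        LEFT OUTER JOIN col_stats st ON st.tbl_id = c.tbl_id AND st.col_name = c.name
        WHERE c.tbl_id IN ({marks})
        ORDER BY c.tbl_id, c.name
    """
    return conn.execute(sql, list(tbl_ids)).fetchall()
```

The tenant filter sits inside the same statement, so tenant isolation costs no extra round trip. The LEFT OUTER JOIN returns `None` statistics for columns that have never been analyzed rather than dropping them.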
[0052] In some examples, statistical results included in the physical execution plans (such as selectivity, null value ratio, data update frequency, etc.) can be input into the CBO to obtain the expected cost for each physical execution plan. Then, the physical execution plan corresponding to the lowest expected cost is selected as the target execution plan.
[0053] In some examples, when the CBO performs cost optimization analysis on the physical execution plans, the expected cost of each physical execution plan can be obtained from the historical statistics, server cache status, and system load associated with that plan. The cache status includes any of the following: cache hit rate (the proportion of requests that hit the cache), cache utilization (the proportion of used cache to total cache), and the cache refresh/invalidation policy. The system load includes at least one of the following: load average (the average number of active processes per unit of time), the number of processes in the run queue (tasks waiting for the CPU), and CPU utilization. CPU utilization covers user-mode time (the proportion of time the CPU spends executing user-space application code), kernel-mode time (the proportion of time spent executing operating system kernel code), idle time (the proportion of time the CPU is idle with no tasks to execute), and I/O wait time (the proportion of time the CPU is idle while waiting for I/O operations to complete).
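The sketch below shows how historical statistics, cache status, and system load from [0053] could combine into one expected-cost number. The disclosure does not give a formula. The weights `w_row` and `w_trip`, the load-pressure term, and the choice to use only load average and I/O wait from the listed load metrics are all hypothetical. In the disclosure these weights would be learned by the cost model rather than fixed by hand.

```python
from dataclasses import dataclass


@dataclass
class PlanStats:
    est_rows: float         # row count from historical statistics
    selectivity: float      # fraction of rows the plan expects to touch
    cache_hit_rate: float   # 0..1, share of reads served from cache
    round_trips: int        # database interactions the plan needs


@dataclass
class SystemLoad:
    load_average: float     # e.g. 1-minute load average
    cpu_iowait: float       # 0..1, share of CPU time waiting on I/O


def expected_cost(plan, load, cores, w_row=0.001, w_trip=5.0):
    """Hypothetical cost: uncached row work plus round trips, scaled by load pressure."""
    uncached_rows = plan.est_rows * plan.selectivity * (1.0 - plan.cache_hit_rate)
    base = w_row * uncached_rows + w_trip * plan.round_trips
    # Pressure grows once runnable work exceeds the core count, and with I/O wait.
    pressure = 1.0 + max(0.0, load.load_average / cores - 1.0) + load.cpu_iowait
    return base * pressure
```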
[0054] In some examples, the core of the CBO implementation is a data-driven intelligent decision-making layer, which transforms the old architecture's "procedural chained calls" into "declarative global optimization." Instead of relying on preset static rules, it builds a dynamic cost model by continuously collecting statistics such as historical query response times, real-time system load (CPU, memory, network), and multi-level cache hit rates. For example, the server continuously collects key feature data and uses it as model input. This data includes historical query response times, CPU utilization, memory usage, network bandwidth usage, and the hit rates of multi-level caches (such as the L1, L2, and L3 caches and the application-level cache). It reflects the system's resource consumption when processing queries. High response times usually indicate more computation or I/O overhead, low cache hit rates increase backend load, and high CPU or memory utilization directly reflects hardware resource pressure. This feature data is then correlated with actual costs (such as server billing, energy consumption, and latency losses). A machine learning algorithm (such as linear regression or random forests) is used to train a cost model that predicts the cost of the current operation from the real-time system state. The cost model continuously learns from new historical data and dynamically adjusts the weights of its features. This enables accurate cost assessment and prediction under different load scenarios and provides data-driven decision support for resource scheduling, query optimization, and capacity planning. Subsequently, for each metadata request (such as a query on the db.sales table), the CBO uses this cost model to estimate the costs of multiple candidate physical execution plans (for example, full fetch, on-demand loading, and merging different batches of queries).
It automatically selects the option with the lowest estimated cost; for example, it determines whether it's more efficient to package table, column, and partition information into a single batch query or split it into two. This decision-making process is adaptive; the server continuously calibrates the cost calculation parameters of the cost model based on feedback from actual execution results, thus becoming increasingly intelligent with use. It can be seen that, using the data processing method provided in this embodiment, under the scheduling of the CBO, the vast majority of calls can be intelligently merged into 1-3 highly optimized batch SQL queries (using JOIN and batch IN queries) on the server side, reducing the number of database interactions by 80%-90%. At the same time, CBO can make more accurate use of the cache, predict and prefetch data, reducing the average response time from hundreds of milliseconds to tens of milliseconds and increasing throughput by orders of magnitude.
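The adaptive behavior in [0054] can be illustrated with a linear cost model recalibrated online after each executed query. That behavior covers three things: predicting cost, choosing the cheapest batching strategy, and recalibrating from actual execution feedback. Online stochastic gradient descent on squared error is one possible technique and is assumed here, not stated in the disclosure. The class name, the two features (database round trips, thousands of rows fetched), and the candidate strategy names are hypothetical.

```python
class OnlineCostModel:
    """Linear cost model recalibrated by SGD after every executed query (a sketch)."""

    def __init__(self, n_features, lr=0.02):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def feedback(self, x, actual_cost):
        """One SGD step on the squared error between predicted and observed cost."""
        err = self.predict(x) - actual_cost
        self.b -= self.lr * err
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        return err

    def choose(self, candidates):
        """Return the name of the candidate strategy with the lowest predicted cost."""
        return min(candidates, key=lambda name: self.predict(candidates[name]))
```

Usage: after each query, call `feedback(features, observed_cost)`. Before the next request, call `choose({"single_batch": [...], "split_in_two": [...]})` to decide whether to package table, column, and partition information into one batch query or split it.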
[0055] The data processing method provided in this disclosure involves the following steps. A server receives query information from a preset account, containing a tenant context and a structured query language. The query information is parsed to generate an actual syntax tree corresponding to the query information. At least one structured execution request lexical unit is generated based on the actual syntax tree. Metadata corresponding to each execution request lexical unit is obtained from a unified rule engine through a target interface. At least one physical execution plan is determined based on the metadata. The execution node is controlled to run according to the target execution plan to obtain the query results corresponding to the query information. The data processing method provided in this disclosure therefore pre-integrates the core metadata of tenants, databases, tables, columns, partitions, and statistical information into a unified rule engine; this metadata was previously scattered across separate databases. Then, by injecting the tenant context into the query information, the core metadata for tenants, databases, tables, columns, partitions, and statistical information can be retrieved from the unified rule engine in a single pass through the target interface. This reduces the number of network calls and improves the user experience.
[0056] In some feasible examples, as shown in Figure 2 in conjunction with Figure 1, S12 above can be implemented through the following S120-S123.
[0057] S120. The query information is split to obtain at least one keyword lexical unit; S121. Based on each keyword lexical unit, generate at least one combination sequence; S122. Based on the syntax rules of the structured query language, perform structural validity verification on each combined sequence and construct a theoretical syntax tree; S123. Rewrite the query logic based on the theoretical syntax tree and the tenant context to generate the actual syntax tree corresponding to the query information.
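Steps S120 to S123 can be illustrated on a deliberately tiny SQL subset: tokenize, group tokens into per-clause sequences, validate the sequences against a toy `SELECT ... FROM ... [WHERE a op b AND ...]` grammar, then rewrite the tree by injecting a tenant filter. This is a sketch only. The grammar, the token regex, the dictionary tree shape, and the `tenant_id` column name are hypothetical. A real implementation would use a full SQL parser.

```python
import re

# One token per match: number, quoted string, identifier (may contain dots), or operator.
TOKEN = re.compile(r"\s*(?:(\d+)|('(?:[^']|'')*')|([A-Za-z_][\w.]*)|(>=|<=|<>|[=<>,*]))")
CLAUSES = ("SELECT", "FROM", "WHERE")
KEYWORDS = set(CLAUSES) | {"AND"}


def tokenize(sql):
    """S120: split the query into keyword lexical units."""
    sql = sql.strip().rstrip(";").strip()
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN.match(sql, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tok = m.group(m.lastindex)
        tokens.append(tok.upper() if tok.upper() in KEYWORDS else tok)
        pos = m.end()
    return tokens


def split_clauses(tokens):
    """S121: combine lexical units into one sequence per clause."""
    clauses, current = {}, None
    for tok in tokens:
        if tok in CLAUSES:
            if tok in clauses:
                raise SyntaxError(f"duplicate {tok} clause")
            current = tok
            clauses[current] = []
        elif current is None:
            raise SyntaxError("query must start with SELECT")
        else:
            clauses[current].append(tok)
    return clauses


def build_theoretical_tree(clauses):
    """S122: check the sequences against a toy SELECT grammar and build a tree."""
    if list(clauses)[:2] != ["SELECT", "FROM"] or not clauses["SELECT"] \
            or len(clauses["FROM"]) != 1:
        raise SyntaxError("expected SELECT <columns> FROM <table> [WHERE ...]")
    conditions = []
    where = clauses.get("WHERE")
    if where is not None:
        group = []
        for tok in where + ["AND"]:            # sentinel AND flushes the last condition
            if tok == "AND":
                if len(group) != 3:
                    raise SyntaxError(f"malformed condition: {group}")
                conditions.append(tuple(group))
                group = []
            else:
                group.append(tok)
    columns = [t for t in clauses["SELECT"] if t != ","]
    return {"columns": columns, "table": clauses["FROM"][0], "conditions": conditions}


def rewrite_for_tenant(tree, tenant_id):
    """S123: inject the tenant filter so every later stage sees tenant semantics."""
    literal = "'" + tenant_id.replace("'", "''") + "'"
    return {**tree, "tenant": tenant_id,
            "conditions": [("tenant_id", "=", literal)] + tree["conditions"]}
```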
[0058] The data processing method provided in this disclosure involves a server receiving query information from a preset account, including a tenant context and a structured query language (SQL). The query information is split to obtain at least one keyword lexical unit. Based on each keyword lexical unit, at least one combined sequence is generated. The combined sequences are structurally validated against the SQL syntax rules to construct a theoretical syntax tree. The query logic is rewritten based on the theoretical syntax tree and the tenant context to generate an actual syntax tree corresponding to the query information. Based on the actual syntax tree, at least one structured execution request lexical unit is generated. Metadata corresponding to each execution request lexical unit is obtained from a unified rule engine via a target interface. Based on the metadata, at least one physical execution plan is determined. The execution node is controlled to run according to the target execution plan to obtain the query results corresponding to the query information. The data processing method provided in this disclosure therefore pre-integrates the core metadata of tenants, databases, tables, columns, partitions, and statistical information into a unified rule engine; this metadata was previously scattered across separate databases. Then, by injecting the tenant context into the query information, this core metadata can be retrieved from the unified rule engine in a single pass via the target interface. This reduces the number of network calls and improves the user experience.
[0059] In some feasible examples, as shown in Figure 3 in conjunction with Figure 1, S13 above can be implemented through the following S130.
[0060] S130. Based on the root node and condition node of the actual syntax tree, generate at least one structured execution request lexical unit.
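Step S130 can be sketched as a walk over the root node's children. The root node contributes the table references, and a condition node contributes the injected tenant filter. The `Node` and `RequestUnit` types, the node kinds, and the fallback database name `"default"` for unqualified tables are hypothetical. The disclosure does not define the tree's data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # "select", "table" or "condition"
    value: tuple = ()
    children: list = field(default_factory=list)


@dataclass(frozen=True)
class RequestUnit:
    tenant_id: str
    db_name: str
    table_name: str


def request_units(root):
    """S130: derive request units from the root node and its condition nodes."""
    if root.kind != "select":
        raise ValueError("root node must be a select node")
    # The tenant comes from the tenant filter that S123 injected as a condition node.
    tenant_ids = {
        c.value[2].strip("'")
        for c in root.children
        if c.kind == "condition" and c.value[:2] == ("tenant_id", "=")
    }
    if len(tenant_ids) != 1:
        raise ValueError("actual syntax tree must carry exactly one tenant filter")
    tenant_id = tenant_ids.pop()
    units = []
    for child in root.children:
        if child.kind == "table":
            db_name, _, table_name = child.value[0].rpartition(".")
            unit = RequestUnit(tenant_id, db_name or "default", table_name)
            if unit not in units:               # deduplicate, keep first-seen order
                units.append(unit)
    return units
```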
[0061] The data processing method provided in this disclosure involves the following steps. A server receives query information from a preset account, containing a tenant context and a structured query language. The query information is parsed to generate an actual syntax tree corresponding to the query information. At least one structured execution request lexical unit is generated based on the root node and condition nodes of the actual syntax tree. Metadata corresponding to each execution request lexical unit is obtained from a unified rule engine through a target interface. At least one physical execution plan is determined based on the metadata. The execution node is controlled to run according to the target execution plan to obtain the query results corresponding to the query information. The data processing method provided in this disclosure therefore pre-integrates the core metadata of tenants, databases, tables, columns, partitions, and statistical information into a unified rule engine; this metadata was previously scattered across separate databases. Then, by injecting the tenant context into the query information, this core metadata can be retrieved from the unified rule engine in a single pass through the target interface. This reduces the number of network calls and improves the user experience.
[0062] In some feasible examples, the unified rule engine includes six models. A tenant model handles tenant identification verification and quota management. A database model provides unified management of database-level information. A table model manages table information. A column model manages column definitions and data types in tables. A partition model manages table partition information. A statistical model collects and manages statistical information. The execution request lexical unit includes at least one of the following: tenant identifier, database name, table name, and table identifier. As shown in Figure 4 in conjunction with Figure 1, S14 above can be implemented through the following S140-S147.
[0063] S140. Send acquisition information to the unified rule engine through the target interface. The acquisition information includes one or more of the following: tenant identifier, database name, table name, and tenant filtering conditions; the tenant filtering conditions include at least the table identifier. S141. Query the tenant model based on the tenant identifier to determine the validity of the tenant identifier, its resource quotas and permissions, and its optimization preferences. The resource quotas and permissions include at least one of a list of accessible databases and operation permissions. The optimization preferences include at least one of a routing policy and a default storage format. S142. Query the database model based on the database name to determine the database identifier. S143. Query the table model based on the table name to determine the table information, which includes at least one of the following: table definition, storage description, serialization information, and table-level attributes and parameters. S144. Query the column model based on the table identifier in the table definition to determine the column information of each column in the table corresponding to the table identifier. The column information includes at least one of the following: column name, data type, length, and precision. S145. Query the partition model based on the table identifier to determine the partition information of the partitions of the table corresponding to the table identifier. The partition information includes any of the following: partition key, partition type, partition name, partition value, storage path, number of files, and file size. S146. Query the statistical model based on the table identifier to determine the statistical information of the table corresponding to the table identifier. The statistical information includes at least one of the number of rows and the per-column information of each column. The per-column information includes at least one of the maximum value, minimum value, number of distinct values, and number of null values. S147. Determine the metadata corresponding to each execution request lexical unit. This metadata includes one or more of the following: the database identifier of the database, the table information, the column information of each column in the table corresponding to the table identifier, the partition information of the partitions of that table, the statistical information of that table, and the validity, resource quotas and permissions, and optimization preferences of the tenant identifier.
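The lookup chain S141 to S147 can be sketched with in-memory dictionaries standing in for the six models. The sketch shows which model answers which part of the metadata and where tenant validation happens. The class name, the key layouts, and the record fields are hypothetical. In the disclosure, these lookups are answered by the unified rule engine behind a single target interface rather than by separate dictionary reads.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedRuleEngine:
    """In-memory stand-in for the six models; every field layout is hypothetical."""
    tenants: dict = field(default_factory=dict)     # tenant_id -> tenant record
    databases: dict = field(default_factory=dict)   # (tenant_id, db_name) -> db_id
    tables: dict = field(default_factory=dict)      # (db_id, table_name) -> table info
    columns: dict = field(default_factory=dict)     # tbl_id -> list of column info
    partitions: dict = field(default_factory=dict)  # tbl_id -> list of partition info
    stats: dict = field(default_factory=dict)       # tbl_id -> table statistics

    def lookup(self, tenant_id, db_name, table_name):
        tenant = self.tenants.get(tenant_id)                       # S141
        if tenant is None or not tenant["active"]:
            raise PermissionError(f"invalid tenant {tenant_id!r}")
        if db_name not in tenant["databases"]:
            raise PermissionError(f"{tenant_id} has no access to {db_name}")
        db_id = self.databases[(tenant_id, db_name)]               # S142
        table = self.tables[(db_id, table_name)]                   # S143
        tbl_id = table["tbl_id"]
        return {                                                   # S147
            "tenant": {k: tenant[k] for k in ("databases", "operations", "preferences")},
            "db_id": db_id,
            "table": table,
            "columns": self.columns.get(tbl_id, []),              # S144
            "partitions": self.partitions.get(tbl_id, []),        # S145
            "stats": self.stats.get(tbl_id, {}),                  # S146
        }
```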
[0064] The data processing method provided in this disclosure includes the following steps. A server receives query information containing a tenant context and a structured query language sent by a preset account. The query information is parsed to generate an actual syntax tree corresponding to the query information. At least one structured execution request lexical unit is generated based on the actual syntax tree. Acquisition information is sent to the unified rule engine through the target interface. The tenant model is queried based on the tenant identifier to determine the validity, resource quotas and permissions, and optimization preferences of the tenant identifier. The database model is queried based on the database name to determine the database identifier of the database. The table model is queried based on the table name to determine the table information. The column model is queried based on the table identifier in the table definition to determine the column information of each column in the table corresponding to the table identifier. The partition model is queried based on the table identifier to determine the partition information of the partitions of that table. The statistical model is queried based on the table identifier to determine the statistical information of that table. The metadata corresponding to each execution request lexical unit is then determined; it includes one or more of the database identifier, the table information, the column information, the partition information, the statistical information, and the validity, resource quotas and permissions, and optimization preferences of the tenant identifier. At least one physical execution plan is determined based on the metadata. The execution node is controlled to run according to the target execution plan to obtain the query results corresponding to the query information. The data processing method provided in this disclosure therefore pre-integrates the core metadata of tenants, databases, tables, columns, partitions, and statistical information into a unified rule engine; this metadata was previously scattered across separate databases. Then, by injecting the tenant context into the query information, this core metadata can be retrieved from the unified rule engine in a single pass through the target interface. This reduces the number of network calls and improves the user experience.
[0065] In some feasible examples, as shown in Figure 5 in conjunction with Figure 1, S16 above can be implemented through the following S160-S162.
[0066] S160. Obtain the expected cost of each physical execution plan. In some examples, the results associated with a physical execution plan (such as response time, CPU utilization, and cache hit rate) can be input into a cost model for cost estimation to obtain the plan's expected cost. The cost model is trained as follows. Training sample data and a labeling result of the training sample data are obtained. The training sample data includes the response time, CPU utilization, memory usage, network bandwidth usage, and multi-level cache hit rates (for example, of the L1, L2, and L3 caches and the application-layer cache) corresponding to historical query information. The labeling result includes the actual cost (such as server billing, energy consumption, and latency loss) corresponding to each piece of historical query information.
[0067] The training sample data is input into a machine learning model for learning, and the model's prediction results on the training sample data are obtained.
[0068] Based on the prediction results and the labeling result, the parameters of the machine learning model are adjusted until the model converges, thereby obtaining the cost model.
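The cost model in [0066] to [0068] maps five historical features to an observed cost: response time, CPU utilization, memory usage, network bandwidth usage, and cache hit rate. Linear regression, one of the algorithms named in [0054], can express it. The sketch below fits that regression. It solves the least-squares problem in closed form through the normal equations instead of adjusting parameters iteratively until convergence. Both approaches reach the same minimizer for linear regression. The function names and the optional `ridge` term are hypothetical.

```python
def _solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(row) + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: add samples or a ridge term")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_cost_model(samples, actual_costs, ridge=0.0):
    """Least-squares linear regression via the normal equations (X^T X + ridge*I) w = X^T y.

    Each sample is [response_time, cpu_util, mem_usage, net_bandwidth, cache_hit_rate];
    each cost is the labelled actual cost of that historical query.
    """
    rows = [[1.0, *x] for x in samples]               # leading 1.0 is the bias input
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
          for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, actual_costs)) for i in range(k)]
    return _solve(A, rhs)                              # [bias, w_1, ..., w_k]


def predict_cost(params, x):
    return params[0] + sum(w * v for w, v in zip(params[1:], x))
```

The expected cost of a plan is then `predict_cost(params, features_of_plan)`, and S161 picks the plan with the smallest value.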
[0069] S161. Based on the expected cost, determine the target execution plan as the physical execution plan corresponding to the minimum expected cost; S162. Control the execution node to run according to the physical execution plan corresponding to the minimum expected cost, and obtain the query results corresponding to the query information.
[0070] The data processing method provided in this disclosure involves a server receiving query information from a preset account, including tenant context and a structured query language; parsing the query information to generate an actual syntax tree corresponding to the query information; generating at least one structured execution request lexical unit based on the actual syntax tree; obtaining metadata corresponding to each execution request lexical unit from a unified rule engine through a target interface; determining at least one physical execution plan based on the metadata; obtaining the expected cost of each physical execution plan; determining the target execution plan as the physical execution plan corresponding to the minimum expected cost based on the expected cost; and controlling the execution node to run according to the physical execution plan corresponding to the minimum expected cost to obtain the query result corresponding to the query information. It can be seen that the data processing device provided in this disclosure pre-integrates the scattered databases of core metadata corresponding to tenants, databases, tables, columns, partitions, and statistical information into a unified rule engine. Then, by injecting tenant context into the query information, the core metadata corresponding to tenants, databases, tables, columns, partitions, and statistical information can be searched in the unified rule engine at once through the target interface, reducing the number of network calls and improving the user experience.
[0071] The foregoing mainly describes the solutions provided by the embodiments of this application from a methodological perspective. To achieve the above functions, it includes corresponding hardware structures and / or software modules for executing each function. Those skilled in the art should readily recognize that, based on the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein, this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0072] This application embodiment can divide the data processing device into functional modules according to the above method example. For example, each function can be divided into its own functional modules, or two or more functions can be integrated into one processing unit. The integrated modules can be implemented in hardware or as software functional modules. It should be noted that the module division in this application embodiment is illustrative and only represents one logical functional division; other division methods may be used in actual implementation.
[0073] like Figure 6 As shown in the diagram, an embodiment of this application provides a schematic diagram of a data processing device. It includes a communicator 101 and a controller 102.
[0074] The communicator 101 is configured to receive query information sent by a preset account, which includes tenant context and structured query language; wherein the tenant context includes at least one of tenant identifier, user identity, security level, and optimization preference; Controller 102 is configured as follows: The query information is parsed to generate the actual syntax tree corresponding to the query information; the actual syntax tree is generated based on the table reference relationship, tenant context and security flag in the query information. The security flag includes at least the tenant filtering conditions, and each node in the actual syntax tree contains tenant semantic information. Based on the actual syntax tree, generate at least one structured execution request lexical unit; The metadata corresponding to each execution request lexical unit is obtained from the unified rule engine through the target interface; the unified rule engine contains at least the core metadata corresponding to tenant, database, table, column, partition, and statistical information respectively; Based on metadata, determine at least one physical execution plan; The control execution node runs according to the target execution plan to obtain the query results corresponding to the query information; the target execution plan includes any item in the physical execution plan.
[0075] In some implementable examples, controller 102 is further configured to: When parsing the query information and generating the actual syntax tree corresponding to the query information, The query information is split to obtain at least one keyword lexical unit; Based on each keyword lexical unit, generate at least one combined sequence; Based on the syntax rules of the structured query language, the structural validity of each combined sequence is verified, and a theoretical syntax tree is constructed. The query logic is rewritten based on the theoretical syntax tree and the tenant context to generate the actual syntax tree corresponding to the query information.
[0076] In some implementable examples, when controller 102 executes the generation of at least one structured execution request lexical unit based on the actual syntax tree, it is further configured to: Based on the root node and condition node of the actual syntax tree, at least one structured execution request lexical unit is generated.
[0077] In some feasible examples, the unified rules engine includes: a tenant model for tenant identification verification and quota management; a database model for unified management of database-level information; a table model for managing table information; a column model for managing column definitions and data types in tables; a partition model for managing table partition information; and a statistical model for collecting and managing statistical information. The execution request lexical unit includes at least one of the following: tenant identifier, database name, table name, and table identifier. When controller 102 retrieves the metadata corresponding to each execution request lexical unit in the unified rule engine through the target interface, it is further configured to: The system sends information to the unified rules engine through the target interface; the information includes one or more of the following: tenant identifier, database name, table name, and tenant filtering conditions. The tenant filtering conditions include at least the table identifier. Based on the tenant identifier, a query is performed in the tenant model to determine the validity of the tenant identifier, resource quotas and permissions, and optimization preferences; wherein, resource quotas and permissions include at least one of the following: a list of accessible databases and operation permissions, and optimization preferences include at least one of the following: routing policy and default storage format; The database identifier is determined by querying the database model based on the database name. The table information is determined by querying the table model based on the table name; the table information includes at least one of the following: table definition, storage description, serialization information, table-level attributes and parameters. 
Based on the table identifier in the table definition, a query is performed in the column model to determine the column information of each column in the table corresponding to the table identifier; wherein, the column information includes at least one of the following: column name, data type, length, and precision; Based on the table identifier, a query is performed in the partition model to determine the partition information of the partition where the table corresponding to the table identifier is located; wherein, the partition information indicates any one of the following: partition key, partition type, partition name, value, storage path, number of files, and file size; Based on the table identifier, a query is performed in the statistical model to determine the statistical information of the table corresponding to the table identifier; wherein, the statistical information includes at least one of the following: the number of rows and the actual information of each column, and the actual information includes at least one of the following: the maximum value, the minimum value, the number of distinct values, and the number of null values; The metadata corresponding to each execution request lexical unit includes one or more of the following: database identifier of the database, table information, column information of each column in the table corresponding to the table identifier, partition information of the partition where the table corresponding to the table identifier is located, statistical information of the table corresponding to the table identifier, validity of the tenant identifier, resource quotas and permissions, and optimization preferences.
[0078] In some feasible examples, when controlling the execution node to run according to the target execution plan to obtain the query results corresponding to the query information, the controller 102 is further configured to: obtain the expected cost of each physical execution plan; determine, based on the expected costs, the physical execution plan with the minimum expected cost as the target execution plan; and control the execution node to run according to that physical execution plan to obtain the query results corresponding to the query information.
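Choosing the target plan as described above amounts to taking the minimum over the candidates' expected costs. The sketch below assumes each candidate carries a precomputed cost; the names `PhysicalPlan`, `estimate_scan_cost`, and `choose_target_plan` are hypothetical. The disclosure does not say how expected cost is computed, so `estimate_scan_cost` is only a toy assumption (cost proportional to the fraction of partitions read) included to make the comparison concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPlan:
    """Hypothetical candidate physical execution plan with its expected cost."""
    name: str
    expected_cost: float


def estimate_scan_cost(row_count: int, partitions_read: int, total_partitions: int) -> float:
    """Toy cost model (assumption): cost is proportional to the fraction of partitions read."""
    if total_partitions <= 0:
        return float(row_count)  # unpartitioned table: assume a full scan
    return row_count * partitions_read / total_partitions


def choose_target_plan(plans: list[PhysicalPlan]) -> PhysicalPlan:
    """Return the plan with the minimum expected cost; on ties, the first such plan."""
    if not plans:
        raise ValueError("at least one physical execution plan is required")
    return min(plans, key=lambda plan: plan.expected_cost)


plans = [
    PhysicalPlan("full_scan", estimate_scan_cost(1_000_000, 30, 30)),
    PhysicalPlan("partition_pruned_scan", estimate_scan_cost(1_000_000, 3, 30)),
]
target = choose_target_plan(plans)  # partition_pruned_scan (expected cost 100000.0)
```

In this toy setting, the row counts would come from the statistical model and the partition counts from the partition model, which is one plausible reason the metadata in [0077] feeds plan selection.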
[0079] For all relevant details of each step involved in the above method embodiments, refer to the functional descriptions of the corresponding functional modules; they are not repeated here.
[0080] Of course, the data processing device provided in the embodiments of this application includes, but is not limited to, the modules described above. For example, the data processing device may further include a memory 103, which may be used to store the program code of the data processing device as well as data generated by the data processing device during operation, such as data in write requests.
[0081] As shown in Figure 7, an embodiment of this application further provides a chip system that can be applied to the data processing device in the foregoing embodiments. The chip system includes at least one processor 1501 and at least one interface circuit 1502. The processor 1501 may be the processor in the aforementioned data processing device. The processor 1501 and the interface circuit 1502 are interconnected via a line. The processor 1501 can receive and execute computer instructions from the memory of the aforementioned data processing device through the interface circuit 1502. When the computer instructions are executed by the processor 1501, the data processing device performs the steps executed by the data processing device in the foregoing embodiments. Of course, the chip system may also include other discrete devices, which is not specifically limited in the embodiments of this application.
[0082] This application also provides a computer-readable storage medium for storing computer instructions for operating the aforementioned data processing device.
[0083] The above description is merely a specific embodiment of this disclosure, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this disclosure. Therefore, this disclosure is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A data processing device, characterized in that it comprises: a communicator configured to receive query information sent by a preset account, the query information including a tenant context and a structured query language statement, wherein the tenant context includes at least one of a tenant identifier, a user identity, a security level, and an optimization preference; and a controller configured to: parse the query information to generate an actual syntax tree corresponding to the query information, wherein the actual syntax tree is generated based on the table reference relationships in the query information, the tenant context, and a security tag, the security tag includes at least a tenant filtering condition, and each node in the actual syntax tree contains tenant semantic information; generate at least one structured execution request lexical unit based on the actual syntax tree; obtain, through a target interface, the metadata corresponding to each execution request lexical unit from a unified rules engine, wherein the unified rules engine includes at least the core metadata corresponding respectively to tenants, databases, tables, columns, partitions, and statistical information; determine at least one physical execution plan based on the metadata; and control an execution node to run according to a target execution plan to obtain the query results corresponding to the query information, wherein the target execution plan is any one of the at least one physical execution plan.
2. The data processing device according to claim 1, characterized in that, when parsing the query information to generate the actual syntax tree corresponding to the query information, the controller is further configured to: split the query information to obtain at least one keyword lexical unit; generate at least one combined sequence based on the keyword lexical units; verify the structural validity of each combined sequence based on the syntax rules of the structured query language and construct a theoretical syntax tree; and rewrite the query logic based on the theoretical syntax tree and the tenant context to generate the actual syntax tree corresponding to the query information.
3. The data processing device according to claim 1, characterized in that, when generating the at least one structured execution request lexical unit based on the actual syntax tree, the controller is further configured to generate the at least one structured execution request lexical unit based on the root node and the condition node of the actual syntax tree.
4. The data processing device according to claim 1, characterized in that the unified rules engine includes: a tenant model for tenant identifier verification and quota management; a database model for unified management of database-level information; a table model for managing table information; a column model for managing the definitions and data types of the columns in a table; a partition model for managing the partition information of a table; and a statistical model for collecting and managing statistical information; the execution request lexical unit includes at least one of a tenant identifier, a database name, a table name, and a table identifier; and, when obtaining, through the target interface, the metadata corresponding to each execution request lexical unit from the unified rules engine, the controller is further configured to: send information to the unified rules engine through the target interface, the information including one or more of the tenant identifier, the database name, the table name, and the tenant filtering condition, wherein the tenant filtering condition includes at least the table identifier; query the tenant model based on the tenant identifier to determine the validity of the tenant identifier, the resource quotas and permissions, and the optimization preferences, wherein the resource quotas and permissions include at least one of a list of accessible databases and operation permissions, and the optimization preferences include at least one of a routing policy and a default storage format; query the database model based on the database name to determine the database identifier of the database; query the table model based on the table name to determine the table information, wherein the table information includes at least one of a table definition, a storage description, serialization information, and table-level attributes and parameters; query the column model based on the table identifier in the table definition to determine the column information of each column in the table corresponding to the table identifier, wherein the column information includes at least one of a column name, a data type, a length, and a precision; query the partition model based on the table identifier to determine the partition information of the partition in which the table corresponding to the table identifier is located, wherein the partition information indicates any one of a partition key, a partition type, a partition name, a value, a storage path, a number of files, and a file size; and query the statistical model based on the table identifier to determine the statistical information of the table corresponding to the table identifier, wherein the statistical information includes at least one of the number of rows and the actual information of each column, and the actual information includes at least one of a maximum value, a minimum value, a number of distinct values, and a number of null values; wherein the metadata corresponding to each execution request lexical unit includes one or more of the database identifier of the database, the table information, the column information of each column in the table corresponding to the table identifier, the partition information of the partition in which that table is located, the statistical information of that table, and the validity, resource quotas and permissions, and optimization preferences of the tenant identifier.
5. The data processing device according to claim 1, characterized in that, when controlling the execution node to run according to the target execution plan to obtain the query results corresponding to the query information, the controller is further configured to: obtain the expected cost of each of the physical execution plans; determine, based on the expected costs, the physical execution plan with the minimum expected cost as the target execution plan; and control the execution node to run according to that physical execution plan to obtain the query results corresponding to the query information.
6. A data processing method, characterized in that it comprises: receiving query information sent by a preset account, the query information including a tenant context and a structured query language statement, wherein the tenant context includes at least one of a tenant identifier, a user identity, a security level, and an optimization preference; parsing the query information to generate an actual syntax tree corresponding to the query information, wherein the actual syntax tree is generated based on the table reference relationships in the query information, the tenant context, and a security tag, the security tag includes at least a tenant filtering condition, and each node in the actual syntax tree contains tenant semantic information; generating at least one structured execution request lexical unit based on the actual syntax tree; obtaining, through a target interface, the metadata corresponding to each execution request lexical unit from a unified rules engine, wherein the unified rules engine includes at least the core metadata corresponding respectively to tenants, databases, tables, columns, partitions, and statistical information; determining at least one physical execution plan based on the metadata; and controlling an execution node to run according to a target execution plan to obtain the query results corresponding to the query information, wherein the target execution plan is any one of the at least one physical execution plan.
7. The data processing method according to claim 6, characterized in that parsing the query information to generate the actual syntax tree corresponding to the query information comprises: splitting the query information to obtain at least one keyword lexical unit; generating at least one combined sequence based on the keyword lexical units; verifying the structural validity of each combined sequence based on the syntax rules of the structured query language and constructing a theoretical syntax tree; and rewriting the query logic based on the theoretical syntax tree and the tenant context to generate the actual syntax tree corresponding to the query information.
8. The data processing method according to claim 6, characterized in that generating the at least one structured execution request lexical unit based on the actual syntax tree comprises: generating the at least one structured execution request lexical unit based on the root node and the condition node of the actual syntax tree.
9. The data processing method according to claim 6, characterized in that the unified rules engine includes: a tenant model for tenant identifier verification and quota management; a database model for unified management of database-level information; a table model for managing table information; a column model for managing the definitions and data types of the columns in a table; a partition model for managing the partition information of a table; and a statistical model for collecting and managing statistical information; the execution request lexical unit includes at least one of a tenant identifier, a database name, a table name, and a table identifier; and obtaining, through the target interface, the metadata corresponding to each execution request lexical unit from the unified rules engine comprises: sending information to the unified rules engine through the target interface, the information including one or more of the tenant identifier, the database name, the table name, and the tenant filtering condition, wherein the tenant filtering condition includes at least the table identifier; querying the tenant model based on the tenant identifier to determine the validity of the tenant identifier, the resource quotas and permissions, and the optimization preferences, wherein the resource quotas and permissions include at least one of a list of accessible databases and operation permissions, and the optimization preferences include at least one of a routing policy and a default storage format; querying the database model based on the database name to determine the database identifier of the database; querying the table model based on the table name to determine the table information, wherein the table information includes at least one of a table definition, a storage description, serialization information, and table-level attributes and parameters; querying the column model based on the table identifier in the table definition to determine the column information of each column in the table corresponding to the table identifier, wherein the column information includes at least one of a column name, a data type, a length, and a precision; querying the partition model based on the table identifier to determine the partition information of the partition in which the table corresponding to the table identifier is located, wherein the partition information indicates any one of a partition key, a partition type, a partition name, a value, a storage path, a number of files, and a file size; and querying the statistical model based on the table identifier to determine the statistical information of the table corresponding to the table identifier, wherein the statistical information includes at least one of the number of rows and the actual information of each column, and the actual information includes at least one of a maximum value, a minimum value, a number of distinct values, and a number of null values; wherein the metadata corresponding to each execution request lexical unit includes one or more of the database identifier of the database, the table information, the column information of each column in the table corresponding to the table identifier, the partition information of the partition in which that table is located, the statistical information of that table, and the validity, resource quotas and permissions, and optimization preferences of the tenant identifier.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the data processing method according to any one of claims 6 to 9.
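Claims 2, 3, 7, and 8 describe splitting the query into keyword lexical units, checking it against SQL syntax rules to build a theoretical syntax tree, rewriting it with the tenant context, and deriving execution request lexical units from the root and condition nodes. The sketch below illustrates that flow for a deliberately tiny SQL subset (`SELECT ... FROM db.table [WHERE ...]`) and is a hypothetical toy, not the claimed parser. It collapses the combined-sequence step into a single pattern check, models only one root node whose `tenant_id` field stands in for per-node tenant semantic information, and injects the tenant filtering condition as a `tenant_id = '<id>'` predicate, where the column name `tenant_id` is an assumption. All function and class names are illustrative.

```python
import re
from dataclasses import dataclass, field, replace
from typing import Optional

# Quoted strings, comparison operators, punctuation, and (optionally dotted) identifiers.
# Characters outside these classes (e.g. parentheses, ';') are silently skipped in this toy.
TOKEN_RE = re.compile(r"'[^']*'|<=|>=|<>|[=<>*,]|\w+(?:\.\w+)?")


@dataclass(frozen=True)
class SelectNode:
    """Root node of a toy syntax tree (hypothetical)."""
    columns: list
    database: str
    table: str
    conditions: list = field(default_factory=list)  # condition node: predicates joined by AND
    tenant_id: Optional[str] = None                  # stands in for tenant semantic information


def parse(sql: str) -> SelectNode:
    """Split into keyword lexical units, check structure, and build the theoretical tree."""
    tokens = TOKEN_RE.findall(sql)
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise SyntaxError("expected SELECT ... FROM ...")
    from_idx = upper.index("FROM")
    columns = [t for t in tokens[1:from_idx] if t != ","]
    if not columns or from_idx + 1 >= len(tokens) or "." not in tokens[from_idx + 1]:
        raise SyntaxError("expected a column list and db.table after FROM")
    database, table = tokens[from_idx + 1].split(".", 1)
    rest = tokens[from_idx + 2:]
    conditions = []
    if rest:
        if upper[from_idx + 2] != "WHERE" or len(rest) < 2:
            raise SyntaxError("only an optional WHERE clause may follow the table")
        conditions.append(" ".join(rest[1:]))
    return SelectNode(columns, database, table, conditions)


def rewrite_with_tenant(tree: SelectNode, tenant_context: dict) -> SelectNode:
    """Attach tenant semantics and inject the tenant filtering condition (security tag)."""
    tenant_id = tenant_context["tenant_id"]
    if not re.fullmatch(r"\w+", tenant_id):  # guard the literal; real systems bind parameters
        raise ValueError("tenant_id must be a simple identifier")
    tenant_filter = f"tenant_id = '{tenant_id}'"  # assumed tenant column name
    return replace(tree, conditions=tree.conditions + [tenant_filter], tenant_id=tenant_id)


def execution_request_units(tree: SelectNode) -> list:
    """Derive execution request lexical units from the root node and the condition node."""
    return [{
        "tenant_id": tree.tenant_id,       # from the root node's tenant semantics
        "database": tree.database,         # from the root node
        "table": tree.table,               # from the root node
        "filters": list(tree.conditions),  # from the condition node
    }]
```

For example, parsing `SELECT id, amount FROM sales.orders WHERE amount > 10` and rewriting it with the tenant context `{"tenant_id": "t1"}` yields the conditions `["amount > 10", "tenant_id = 't1'"]`. A production system would use a full SQL parser and parameterized predicates instead of string formatting.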