Data processing method, apparatus and device

By constructing a secondary index table for the target data table, using a non-primary key field as the primary key, and combining the primary key fields of the target index table and the target data table for query processing, the problem of global scanning caused by the inability to index the primary key is solved, thereby improving data query efficiency and enabling faster response.

CN116628010B · Active · Publication Date: 2026-06-12 · ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Filing Date
2023-05-26
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

When the primary key cannot be used for indexing, a global scan of the data table leads to low data query efficiency, especially when dealing with large amounts of data, making it difficult to meet the need for fast queries.

Method used

By constructing a secondary index table for the target data table, using a non-primary key field as the primary key, and combining the primary key fields of the target index table and the target data table for query processing, full table scans are avoided.

Benefits of technology

It improves data query efficiency, enabling minute-level queries of massive amounts of data, and meets the rapid response needs of scenarios such as security tracing.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of the present specification provide a data processing method, device and equipment, wherein the method comprises: receiving a data query request for a target data table, and performing field extraction processing on a data query statement carried in the data query request to obtain a target field corresponding to the data query request; in a case where it is determined that the data query statement cannot be queried according to a primary key field in the target data table, obtaining a non-primary key field in the target data table contained in the target field, obtaining a target index table corresponding to the non-primary key field in the target data table in a secondary index table corresponding to the target data table, performing query processing on the target index table based on the data query statement and the primary key field in the target index table to obtain a first query result, and performing query processing on the target data table based on the data query statement, the first query result and the primary key field in the target data table to obtain a data query result corresponding to the data query request.

Description

Technical Field

[0001] This document relates to the field of data processing technology, and in particular to a data processing method, apparatus and device.

Background Technology

[0002] With the rapid development of computer technology, enterprises are providing users with an increasing variety and number of application services. Consequently, the volume of user data is growing daily, and data structures are becoming increasingly complex. Improving data query efficiency has become a growing concern for business processes. When querying data, indexing using the primary key can avoid global table scans and thus accelerate data retrieval.

[0003] However, when the primary key cannot be used for indexing (such as when the query filter does not contain the primary key and the database does not support creating indexes for an already created table), a global scan of the table is required. This can lead to low data query efficiency when the table contains a large amount of data. Therefore, a solution that can improve data query efficiency is needed.

Summary of the Invention

[0004] The purpose of the embodiments in this specification is to provide a data processing method, apparatus, and device to provide a solution that can improve data query efficiency.

[0005] To achieve the above technical solution, the embodiments in this specification are implemented as follows:

[0006] In a first aspect, embodiments of this specification provide a data processing method, comprising: receiving a data query request for a target data table, and performing field extraction processing on a data query statement carried in the data query request to obtain a target field corresponding to the data query request; if, based on the data query statement, it is determined that a query cannot be performed based on a primary key field in the target data table, obtaining a non-primary key field in the target data table contained in the target field; obtaining a target index table in a secondary index table corresponding to the target data table that corresponds to the non-primary key field in the target data table, wherein the primary key field of the secondary index table is a non-primary key field in the target data table; performing query processing on the target index table based on the data query statement and the primary key field in the target index table to obtain a first query result, and performing query processing on the target data table based on the data query statement, the first query result, and the primary key field in the target data table to obtain a data query result corresponding to the data query request.

[0007] In a second aspect, embodiments of this specification provide a data processing apparatus, comprising: a request receiving module, configured to receive a data query request for a target data table, and perform field extraction processing on the data query statement carried in the data query request to obtain a target field corresponding to the data query request; a first acquisition module, configured to acquire a non-primary key field in the target data table included in the target field when it is determined, based on the data query statement, that a query cannot be performed based on the primary key field of the target data table; a second acquisition module, configured to acquire a target index table in a secondary index table corresponding to the target data table that corresponds to the non-primary key field in the target data table, wherein the primary key field of the secondary index table is a non-primary key field in the target data table; and a result determining module, configured to perform query processing on the target index table based on the data query statement and the primary key field in the target index table to obtain a first query result, and perform query processing on the target data table based on the data query statement, the first query result, and the primary key field in the target data table to obtain a data query result corresponding to the data query request.

[0008] In a third aspect, embodiments of this specification provide a data processing device, the data processing device comprising: a processor; and a memory arranged to store computer-executable instructions, wherein the executable instructions, when executed, cause the processor to: receive a data query request for a target data table, and perform field extraction processing on a data query statement carried in the data query request to obtain a target field corresponding to the data query request; if, based on the data query statement, it is determined that a query cannot be performed based on a primary key field in the target data table, obtain a non-primary key field in the target data table contained in the target field; obtain a target index table in a secondary index table corresponding to the target data table that corresponds to the non-primary key field in the target data table, wherein the primary key field of the secondary index table is a non-primary key field in the target data table; perform query processing on the target index table based on the data query statement and the primary key field in the target index table to obtain a first query result, and perform query processing on the target data table based on the data query statement, the first query result, and the primary key field in the target data table to obtain a data query result corresponding to the data query request.

[0009] In a fourth aspect, embodiments of this specification provide a storage medium for storing computer-executable instructions. When executed, these instructions implement the following process: receiving a data query request for a target data table, and performing field extraction processing on the data query statement carried in the data query request to obtain a target field corresponding to the data query request; if, based on the data query statement, it is determined that a query cannot be performed using the primary key field of the target data table, obtaining a non-primary key field from the target data table contained in the target field; obtaining a target index table in a secondary index table corresponding to the target data table that corresponds to the non-primary key field in the target data table, wherein the primary key field of the secondary index table is a non-primary key field in the target data table; performing query processing on the target index table based on the data query statement and the primary key field in the target index table to obtain a first query result; and performing query processing on the target data table based on the data query statement, the first query result, and the primary key field in the target data table to obtain a data query result corresponding to the data query request.

Attached Figure Description

[0010] To more clearly illustrate the technical solutions in the embodiments or prior art of this specification, the drawings used in the description of the embodiments or prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 is a schematic diagram of a data processing system described in this specification;

[0012] Figure 2A is a flowchart illustrating an embodiment of a data processing method described in this specification;

[0013] Figure 2B is a schematic diagram illustrating the processing procedure of one data processing method described in this specification;

[0014] Figure 3 is a schematic diagram of a data query process described in this specification;

[0015] Figure 4 is a schematic diagram illustrating the processing procedure of another data processing method described in this specification;

[0016] Figure 5 is a schematic diagram illustrating the processing procedure of another data processing method described in this specification;

[0017] Figure 6 is a schematic diagram of the structure of an embodiment of a data processing device according to this specification;

[0018] Figure 7 is a schematic diagram of the structure of a data processing device described in this specification.

Detailed Implementation

[0019] This specification provides a data processing method, apparatus, and device through its embodiments.

[0020] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this specification, and not all embodiments. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this specification.

[0023] The technical solutions in this specification can be applied to a data processing system. For example, as shown in Figure 1, the data processing system can include terminal devices and servers. The servers can be independent servers or server clusters composed of multiple servers. The terminal devices can be devices such as personal computers, or mobile terminal devices such as mobile phones and tablets.

[0024] The data processing system may include n terminal devices and m servers, where n and m are positive integers greater than or equal to 1. The terminal devices can be used to collect data samples. For example, the terminal devices can acquire corresponding data samples for different anomaly detection scenarios. For example, for the data anomaly detection scenario of a question-and-answer system, the terminal devices can collect user feedback information on the dialogue as data samples. For the data anomaly detection scenario of a preset business, the terminal devices can collect business data corresponding to the preset business (such as data required to execute the preset business) as data samples, etc.

[0025] Terminal devices can send collected data samples to any server in the data processing system. The server can preprocess the received data samples, generate a data table based on the preprocessed data samples, and store the generated data table in a preset database. Preprocessing operations can include text conversion preprocessing (e.g., converting audio data to text data) and text format conversion processing (e.g., converting English text to Chinese text).

[0026] Furthermore, the server can generate corresponding data tables based on local log data and store them in a pre-defined database. Taking a business scenario of tracing the source of a data leak as an example, when tracing the source of data, the server needs to perform filtering, retrieval, statistics, and analysis on existing clues in log data such as network traffic and operational behavior to locate the source of the leak. Because this type of data increases dramatically every day and spans a long storage period, a data warehouse can be used for data storage and management.

[0027] Taking a big data computing service platform (such as MaxCompute) that primarily serves the storage and computation of batch structured data, providing solutions for massive data warehouses and analytical modeling services for big data as an example, since log data is stored in the big data computing service platform, tracing the source of data breaches inevitably involves querying and retrieving the massive log data on the platform. The datasets to be queried are typically in the tens or hundreds of billions, and for long-term queries, the scale can reach trillions. For such large volumes of data, directly querying the log dataset through the big data computing service platform is time-consuming and cannot meet the rapid response requirements of security tracing. Therefore, a solution is needed to accelerate the data query process, thereby providing rapid query and analysis capabilities for security tracing scenarios.

[0028] Since the big data computing service platform does not support creating indexes for existing data tables, the query process cannot be accelerated directly by creating indexes. However, when a data table is created on the big data computing service platform, the platform supports specifying which columns of the table serve as the primary key for bucketing (for example, through the `clustered by` or `range clustered by` clause) and how fields are sorted within each bucket (for example, through the `sorted by` clause).

[0029] A column or combination of columns specified in this way when the data table is created can serve as the primary key. When the query filter includes the primary key and the operators applied to the primary key satisfy the index usage conditions, the big data computing service platform can index on the primary key, thereby avoiding a full table scan and speeding up queries. However, such a primary key can only be specified at table creation time. It cannot be added or removed later as query needs change, and only one primary key can be specified. This approach therefore accelerates only a limited set of scenarios: queries whose filter includes the primary key, with the primary key conforming to the leftmost prefix matching rule.
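The leftmost prefix matching rule mentioned above can be illustrated with a minimal Python sketch. The function name and the column names (`id`, `ts`, `cid`) are hypothetical and chosen only for illustration; the sketch does not reproduce the platform's actual planner logic.

```python
def usable_key_prefix(key_columns, filter_fields):
    """Return the longest leading run of key columns present in the filter.

    Assumption: the primary key is an ordered list of columns, and it can
    only narrow a scan when the filter covers its leading column(s).
    """
    fields = set(filter_fields)
    prefix = []
    for column in key_columns:
        if column not in fields:
            break  # a gap ends the usable prefix
        prefix.append(column)
    return prefix


# The key (id, ts) helps a filter on id, but not a filter on ts alone.
with_leading = usable_key_prefix(["id", "ts"], ["id", "cid"])
without_leading = usable_key_prefix(["id", "ts"], ["ts"])
```

An empty result corresponds to the case where the primary key cannot be used and a full table scan would otherwise be required.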

[0030] To address the need for rapid query and analysis of massive datasets, indexing can be performed using the primary key fields in both the secondary index table and the target data table. This avoids a full table scan of the target data table, improving query efficiency and enabling minute-level queries on massive datasets, especially when the target data table contains a large amount of data.

[0031] Based on the above data processing system architecture, the data processing methods in the following embodiments can be implemented.

[0032] Example 1

[0033] As shown in Figure 2A and Figure 2B, embodiments of this specification provide a data processing method. The execution subject of this method can be a server, which can be a standalone server or a server cluster composed of multiple servers. The method specifically includes the following steps:

[0034] In S202, a data query request for the target data table is received, and the data query statement carried in the data query request is processed to extract the fields to obtain the target fields corresponding to the data query request.

[0035] The target data table can be a data table corresponding to a preset user and / or a preset business. For example, the target data table can store user data of the preset user, such as user information, device information, application information, etc., or it can store business data required to perform the preset business, such as business data required to perform resource transfer business. Alternatively, the target data table can be a data table generated based on log data of a server. The data query statement can be any query statement that can be executed by any server, such as an SQL statement. The target field can be one or more fields contained in the target data table. For example, assuming the target data table contains field 1, field 2, and field 3, and the data query statement is used to query the target data table based on field 1, then the target field can be field 1.

[0036] In practice, with the rapid development of computer technology, enterprises are providing users with an increasing variety and number of application services. Consequently, the volume of user data is growing daily, and the data structure is becoming increasingly complex. Improving data query efficiency has become a growing concern for business processing parties. When querying data, primary keys can be used for indexing to avoid global table scans and accelerate query efficiency. However, when primary keys cannot be used for indexing (e.g., the query filter does not contain a primary key and the database does not support creating indexes for existing tables), a global table scan is necessary. This leads to low query efficiency, especially with large amounts of data in the table. Therefore, a solution to improve data query efficiency is needed. This specification provides a technical solution to address the above problems, as detailed below.

[0037] Taking the business processing scenario of tracing the source of data leakage as an example, the server can receive data query requests for a target data table. The target data table can be a data table related to the data leakage scenario. For example, if the server detects that user 1's privacy data has been leaked, the server can obtain the data table related to user 1 and determine the obtained data table as the target data table.

[0038] Alternatively, the server can also receive data query requests for a target data table sent by a pre-defined administrator. The target data table is the data table determined by the pre-defined administrator based on the data leakage scenario.

[0039] A data query request can carry a data query statement. The server can extract fields from the data query statement to obtain the target fields corresponding to the data query request. For example, the server can determine the target fields corresponding to the data query request based on the filter condition expression in the data query statement.

[0040] For example, taking an SQL query statement as an example, the data query statement obtained by the server could be:

[0041] SELECT field1

[0042] FROM data table 1

[0043] WHERE field2 = 1.

[0044] In this context, data table 1 is the target data table, and field 2 = 1 is the filter condition expression in the data query statement. The fields obtained by the server in the field extraction process of the data query statement can be the fields included in the filter condition expression, that is, the extracted target field is field 2.

[0045] Furthermore, the above example uses SQL statements as the data query statement to illustrate how to obtain the target field. In actual application scenarios, there are many other ways to obtain the target field, which may vary depending on the specific application scenario. This specification does not impose any specific limitations on this method.

[0046] In S204, if it is determined, based on the data query statement, that a query cannot be performed based on the primary key field of the target data table, the non-primary key fields in the target data table contained in the target field are obtained.

[0047] The primary key field can be one field or a combination of multiple fields. The value of the primary key field can uniquely identify each row in the data table, and the primary key field can be used to enforce the entity integrity of the data table.

[0048] In practice, on big data computing service platforms that process data in buckets, the fields (columns) specified by the clustered by (or range clustered by) and sorted by clauses when a data table is created can be treated as the primary key fields of that data table, although the values of these primary key fields are not unique.

[0049] For example, a big data computing service platform may store Table 1. When creating Table 1, it can be bucketed using the id field. Therefore, the id field can be used as the primary key field of Table 1.

[0050] Since big data computing service platforms do not support creating indexes on existing data tables, when it is impossible to query based on the primary key field of the target data table (such as when the filter condition expression of the data query statement does not contain the primary key field), a full table scan of the target data table is required, which greatly reduces data query efficiency.

[0051] Therefore, if the server determines, based on the data query statement, that it cannot query based on the primary key field of the target data table, it can obtain the non-primary key fields of the target data table contained in the target field. The non-primary key fields are the fields in the target data table other than the primary key fields.

[0052] In S206, retrieve the target index table corresponding to the non-primary key fields in the target data table from the secondary index table corresponding to the target data table.

[0053] The primary key field of the secondary index table can be a non-primary key field in the target data table, and the server can asynchronously synchronize data from the target data table to the secondary index table.

[0054] In implementation, for periodically scheduled partitioned data tables requiring query acceleration, the server can construct secondary index tables based on one or more non-primary key fields in the data table whose query frequency is higher than a preset frequency. A constructed secondary index table takes a non-primary key field, or a combination of non-primary key fields, of the target data table as its primary key field, and stores the primary key field of the target data table as a regular column. The scheduling period and partition fields can be consistent with those of the target data table. The server can construct multiple different secondary index tables for a single target data table to accommodate different query filtering conditions.
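The construction described here can be sketched in Python over in-memory rows. This is an illustrative sketch only: the function names, the `query_counts` statistics, and the sample rows are assumptions, and a real deployment would build the index table with the platform's own table-creation statements.

```python
def pick_index_fields(query_counts, primary_key_fields, min_count):
    """Choose non-primary-key fields queried more often than a preset frequency."""
    return [field for field, count in query_counts.items()
            if count > min_count and field not in primary_key_fields]


def build_secondary_index(rows, index_fields, primary_key_fields):
    """Build one index table keyed by the chosen non-primary-key fields.

    Each index row lists the index fields first (the index table's primary
    key) and then the target table's primary key as regular columns. Rows
    are sorted by the index fields so lookups need not scan everything.
    """
    columns = list(index_fields) + list(primary_key_fields)
    index_rows = sorted(tuple(row[c] for c in columns) for row in rows)
    return columns, index_rows


# Hypothetical target table: id is the primary key, cid is frequently queried.
target_rows = [
    {"id": 1, "cid": 7, "info": "a"},
    {"id": 2, "cid": 3, "info": "b"},
    {"id": 3, "cid": 7, "info": "c"},
]
fields = pick_index_fields({"cid": 120, "info": 2, "id": 500}, ["id"], min_count=100)
columns, index_rows = build_secondary_index(target_rows, fields, ["id"])
```

Calling `build_secondary_index` once per chosen field (or field combination) yields the multiple index tables mentioned above.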

[0055] Through periodic synchronization tasks, the server can also synchronize incremental partition data of the target data table to the corresponding partitions of the secondary index table, and clear the data of partitions in the secondary index table that have exceeded their life cycle, so as to achieve data synchronization and maintenance.
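A periodic synchronization task of this kind might look like the following Python sketch. The dictionary-based partition model, the `keep` retention count, and the date-style partition names are all assumptions made for illustration; the specification does not prescribe how partitions or life cycles are represented.

```python
def sync_index_partitions(target_partitions, index_partitions, build_rows, keep):
    """Copy new partitions into the index table and expire old ones.

    target_partitions: dict of partition name -> rows of the target table
    index_partitions:  dict of partition name -> index rows (updated in place)
    build_rows:        callable that turns target rows into index rows
    keep:              number of most recent partitions to retain (>= 1)
    """
    # Incremental step: only partitions missing from the index are built.
    for name in sorted(target_partitions):
        if name not in index_partitions:
            index_partitions[name] = build_rows(target_partitions[name])
    # Life cycle step: partition names are assumed to sort chronologically.
    for name in sorted(index_partitions)[:-keep]:
        del index_partitions[name]
    return index_partitions


target = {
    "20230524": [{"id": 1, "cid": 5}],
    "20230525": [{"id": 2, "cid": 4}],
    "20230526": [{"id": 3, "cid": 9}],
}
index = {"20230523": [(1, 0)], "20230524": [(5, 1)]}
to_index_rows = lambda rows: sorted((r["cid"], r["id"]) for r in rows)
sync_index_partitions(target, index, to_index_rows, keep=3)
```

After the call, the index holds the three most recent partitions, and the partition that was already present is left untouched.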

[0056] The server can determine the target index table for the secondary index based on the non-primary key fields in the target data table contained in the target field. For example, assuming the target data table contains fields 1, 2, and 3, where field 1 is the primary key field, the secondary index tables constructed by the server for this target data table can include secondary index table 1 and secondary index table 2. Secondary index table 1 can contain fields 1 and 2, where field 2 can be the primary key field of secondary index table 1. Secondary index table 2 can contain fields 1 and 3, where field 3 can be the primary key field of secondary index table 2.

[0057] If the target field contains field 3, which is a non-primary key field in the target data table, then the secondary index table 2 can be determined as the target index table. That is, the server can determine the secondary index table with a high degree of matching with the data query statement from the secondary index table corresponding to the target data table as the target index table.
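One possible way to pick the target index table is sketched below in Python. The scoring rule (count the leading key fields of each index table that appear among the non-primary key target fields) is an assumption; the specification only says that a table with a high degree of matching is chosen. The table and field names follow the example above.

```python
def select_index_table(index_tables, target_fields, primary_key_fields):
    """Return the name of the best-matching secondary index table, or None.

    index_tables: dict of table name -> ordered list of that index table's
                  primary key fields (non-primary-key fields of the target).
    """
    non_pk_fields = [f for f in target_fields if f not in primary_key_fields]
    best_name, best_score = None, 0
    for name, key_fields in index_tables.items():
        score = 0
        for field in key_fields:
            if field not in non_pk_fields:
                break  # only a leading run of key fields counts
            score += 1
        if score > best_score:
            best_name, best_score = name, score
    return best_name


index_tables = {
    "secondary_index_table_1": ["field2"],
    "secondary_index_table_2": ["field3"],
}
chosen = select_index_table(index_tables, ["field3"], ["field1"])
```

When no index table matches any target field, the function returns `None`, and the caller would have to fall back to another query path.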

[0058] In S208, based on the data query statement and the primary key field of the target index table, the target index table is queried to obtain the first query result. Then, based on the data query statement, the first query result, and the primary key field of the target data table, the target data table is queried to obtain the data query result corresponding to the data query request.

[0059] In implementation, assume the data query statement is an SQL statement:

[0060] SELECT info

[0061] FROM target

[0062] WHERE cid=1.

[0063] Here, target is the target data table. As shown in Figure 3, the target data table can contain the fields id, cid, and info, where the id field can be the primary key field of the target data table. The target index table corresponding to the target data table can contain the fields cid and id, where the cid field can be the primary key field of the target index table.

[0064] The server can index the primary key field (i.e., the cid field) of the target index table to perform query processing on the target index table and obtain the first query result. Then, it can index the primary key field (i.e., the id field) of the target data table to perform query processing on the target data table and obtain the data query result corresponding to the data query request.

[0065] In this way, the server can avoid a full table scan by using two indexes, which can improve data query efficiency when the amount of data in the table to be queried is large.
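The two-step lookup can be tried out with a small Python sketch that uses the standard-library `sqlite3` module as a stand-in for the big data computing service platform. The table name `target_idx`, the sample rows, and the function name are assumptions for illustration; SQLite's primary key indexes merely play the role of the platform's primary key indexing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, cid INTEGER, info TEXT);
    -- Index table: cid leads the primary key, id is carried along.
    CREATE TABLE target_idx (cid INTEGER, id INTEGER, PRIMARY KEY (cid, id));
""")
conn.executemany("INSERT INTO target VALUES (?, ?, ?)",
                 [(1, 7, "a"), (2, 1, "b"), (3, 1, "c")])
conn.execute("INSERT INTO target_idx SELECT cid, id FROM target")


def query_info_by_cid(conn, cid):
    # Step 1: query the index table by its primary key (cid).
    # The matching ids are the first query result.
    ids = [row[0] for row in
           conn.execute("SELECT id FROM target_idx WHERE cid = ?", (cid,))]
    if not ids:
        return []
    # Step 2: query the target table by its primary key (id),
    # keeping the original filter condition.
    placeholders = ", ".join("?" for _ in ids)
    sql = (f"SELECT info FROM target "
           f"WHERE id IN ({placeholders}) AND cid = ? ORDER BY id")
    return [row[0] for row in conn.execute(sql, ids + [cid])]


result = query_info_by_cid(conn, 1)
```

Both statements filter on a primary key, so neither needs to scan its table in full.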

[0066] This specification provides a data processing method that receives a data query request for a target data table, extracts fields from the data query statement carried in the data query request to obtain the target field corresponding to the data query request, and, if it is determined that a query cannot be performed based on the primary key field of the target data table based on the data query statement, obtains the non-primary key field of the target data table contained in the target field, obtains the target index table corresponding to the non-primary key field of the target data table from the secondary index table corresponding to the target data table, wherein the primary key field of the secondary index table is a non-primary key field of the target data table, performs query processing on the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and performs query processing on the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request. Since the primary key field of a data table is usually an auto-incrementing field, meaning that the primary key field of a data table usually does not contain business information, the server can create a secondary index table corresponding to the target data table through the non-primary key field of the target data table, and perform data queries in combination with the secondary index table. This fully utilizes the index acceleration feature of the primary key field of the data table, avoids a full table scan of the data table, and improves data query efficiency.

[0067] Example 2

[0068] This specification provides a data processing method, the execution subject of which can be a server, wherein the server can be a standalone server or a server cluster composed of multiple servers. The method specifically includes the following steps:

[0069] In S202, a data query request for the target data table is received.

[0070] In S402, retrieve the filter condition expression from the data query statement.

[0071] In implementation, the server can extract the filter condition expression from the data query statement, remove all "logical OR" operators and their left and right conditions from the filter condition expression, and then extract the field names in the filter condition expression that meet the following conditions:

[0072] 1. The field name is the left-hand or right-hand value of the preset relational operator, and the preset relational operator is "=", ">", "<", ">=", "<=", "IN", or one of the other relational operators that can effectively use the secondary index table;

[0073] 2. When the field name is the left-hand (right-hand) value of the preset relational operator, its corresponding right-hand (left-hand) value does not contain any field name.

[0074] The extracted field names are the target fields.
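The removal of "logical OR" operators together with their left and right conditions can be sketched in Python as follows. The sketch assumes a flat filter expression with no parentheses, no BETWEEN ... AND, and no string literals containing " and " or " or "; the function name is hypothetical.

```python
import re

# Splits on AND / OR and keeps the connector thanks to the capture group.
CONNECTOR = re.compile(r"\s+(AND|OR)\s+", re.IGNORECASE)


def drop_or_conditions(expression):
    """Return the conditions that are not directly joined by a logical OR."""
    parts = CONNECTOR.split(expression.strip())
    conditions = parts[0::2]                       # even slots: conditions
    connectors = [c.upper() for c in parts[1::2]]  # odd slots: AND / OR
    kept = []
    for i, condition in enumerate(conditions):
        left_is_or = i > 0 and connectors[i - 1] == "OR"
        right_is_or = i < len(connectors) and connectors[i] == "OR"
        if not (left_is_or or right_is_or):
            kept.append(condition.strip())
    return kept


remaining = drop_or_conditions("a = 1 AND b = 2 OR c = 3 AND d > 4")
```

Only the conditions that survive this step would then be checked against the two field-name rules listed above.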

[0075] Taking SQL statements as an example, the expression after WHERE in an SQL statement can be used as the filtering condition expression in that data query statement. For example, suppose the data query statement is:

[0076] SELECT field1

[0077] FROM data table 1

[0078] WHERE field2 = 1 and field3 > 5.

[0079] Therefore, the filter condition expression obtained by the server is field2 = 1 and field3 > 5.

[0080] The above method for obtaining the filter condition expression is one optional, implementable approach. In actual application scenarios, many other approaches are possible and may vary with the scenario. The embodiments of this specification do not specifically limit this.
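For SQL statements, one way to take "the expression after WHERE" is a regular expression, as in this minimal Python sketch. It assumes a single, simple SELECT statement without subqueries; the function name and the table name `data_table_1` are hypothetical, and a production system would more likely rely on a proper SQL parser.

```python
import re

WHERE_CLAUSE = re.compile(
    r"\bWHERE\b(.*?)(?:\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)


def extract_filter_expression(sql):
    """Return the text after WHERE (up to GROUP BY / ORDER BY / LIMIT), or None."""
    match = WHERE_CLAUSE.search(sql)
    return match.group(1).strip() if match else None


statement = """
SELECT field1
FROM data_table_1
WHERE field2 = 1 and field3 > 5
"""
filter_expression = extract_filter_expression(statement)
```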

[0081] In S404, each preset relational operator in the filter condition expression that has a field on one side and a non-field on the other side is determined as a target operator.

[0082] The preset relational operators can include relational operators that can be used effectively, such as "=", ">", "<", ">=", "<=", "IN", etc.

[0083] In implementation, for example, suppose the filter condition expression is field2 = 1 and field3 > 5. The preset relational operators include "=" and ">". One side of "=" is field2 and the other side is the number 1, and one side of ">" is field3 and the other side is the number 5. Therefore, "=" and ">" are the target operators.

[0084] In S406, the field corresponding to the target operator is determined as the target field corresponding to the data query request.

[0085] In implementation, for example, assuming the filtering condition expression is field2 = 1 and field3 > 5, and the target operators are "=" and ">", then the server can determine field2 corresponding to "=" and field3 corresponding to ">" as the target fields.
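Steps S402 to S406 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a flat filter expression whose conditions are joined by AND, discards any condition that mentions OR as a whole, and recognises only numbers, single-quoted strings and parenthesised IN-lists as non-field values. The function name `extract_target_fields` and the regular expressions are hypothetical; a production system would use a real SQL parser.

```python
import re

_IDENT = r"[A-Za-z_][A-Za-z_0-9]*"
# Non-field values recognised by this sketch: numbers, quoted strings, IN-lists.
_LITERAL = r"(?:\d+(?:\.\d+)?|'[^']*'|\([^()]*\))"
# Preset relational operators; two-character operators are listed first.
_OP = r"(?:>=|<=|=|>|<|\bIN\b)"

_FIELD_LEFT = re.compile(
    rf"^\s*(?P<field>{_IDENT})\s*{_OP}\s*{_LITERAL}\s*$", re.IGNORECASE)
_FIELD_RIGHT = re.compile(
    rf"^\s*{_LITERAL}\s*{_OP}\s*(?P<field>{_IDENT})\s*$", re.IGNORECASE)


def extract_target_fields(filter_expression):
    """Return fields compared with a non-field value by a preset operator."""
    target_fields = []
    for condition in re.split(r"\s+AND\s+", filter_expression, flags=re.IGNORECASE):
        if re.search(r"\bOR\b", condition, flags=re.IGNORECASE):
            continue  # drop "logical OR" operators together with their operands
        match = _FIELD_LEFT.match(condition) or _FIELD_RIGHT.match(condition)
        if match and match.group("field") not in target_fields:
            target_fields.append(match.group("field"))
    return target_fields


example_fields = extract_target_fields("field2 = 1 and field3 > 5")
```

For the expression used above, `example_fields` holds field2 and field3. A condition such as field2 = field3 is skipped because neither side is a non-field value.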

[0086] In S408, if the target field does not contain the primary key field in the target data table, it is determined that a query cannot be performed based on the primary key field in the target data table; or if the filter condition expression in the data query statement corresponding to the primary key field in the target data table is a preset condition expression, it is determined that a query cannot be performed based on the primary key field in the target data table.

[0087] Among them, the preset condition expressions include, but are not limited to, fuzzy search expressions and not-equal-to-search expressions.

[0088] In practice, if the extracted target field does not contain the primary key field of the target data table (i.e., the data query request targets non-primary key fields of the target data table), the server cannot perform an index query based on the primary key field of the target data table. Alternatively, if the filter condition expression in the data query statement corresponding to the primary key field of the target data table is a fuzzy search expression or a not-equal-to search expression, the server also cannot query based on the primary key field of the target data table.
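The decision in S408 can be sketched as a small predicate. The sketch assumes that the conditions mentioning a primary key field are available as raw strings and that fuzzy and not-equal-to searches are spelled LIKE, != or <>. The name `cannot_use_primary_key` and its parameters are hypothetical.

```python
import re

# "Preset condition expressions" in this sketch: fuzzy search (LIKE)
# and not-equal-to search (!= or <>).
_PRESET_CONDITION = re.compile(r"\bLIKE\b|!=|<>", re.IGNORECASE)


def cannot_use_primary_key(target_fields, primary_key_fields, pk_conditions):
    """Return True when the query cannot be served through the primary key.

    target_fields: field names extracted from the filter expression.
    primary_key_fields: primary key field names of the target data table.
    pk_conditions: raw condition strings that mention a primary key field.
    """
    # Case 1: no primary key field appears among the target fields.
    if not set(target_fields) & set(primary_key_fields):
        return True
    # Case 2: a primary key field is filtered by a preset condition expression.
    return any(_PRESET_CONDITION.search(cond) for cond in pk_conditions)
```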

[0089] In S204, if it is determined that a query cannot be performed based on the primary key field of the target data table based on the data query statement, the non-primary key fields in the target data table contained in the target field are obtained.

[0090] In S410, retrieve the non-primary key fields from the target data table.

[0091] In S412, based on the business processing requirements corresponding to the target data table, the non-primary key fields in the target data table are filtered to obtain the target non-primary key fields.

[0092] In implementation, the server can obtain historical data query requests for the target data table and determine the business processing requirements for the target data table based on the historical data query requests. Based on the business processing requirements, the server can perform field filtering on the non-primary key fields in the target data table to obtain the target non-primary key fields. This allows the secondary index table built based on the highly distinguishable target non-primary key fields to be used for the business processing requirements of the target data table.

[0093] For example, the server can obtain the historical query frequency of non-primary key fields in the target data table based on historical data query requests, and the server can determine non-primary key fields with query frequencies higher than a preset frequency threshold as target non-primary key fields.
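As a rough illustration of this frequency-based selection, the following Python sketch counts how often each non-primary key field appears in historical filter conditions and keeps those above a threshold. It assumes the history is already available as lists of field names, one list per historical query. `select_target_fields` and the sample data are hypothetical.

```python
from collections import Counter


def select_target_fields(history, non_primary_key_fields, frequency_threshold):
    """Return non-primary key fields queried more often than the threshold."""
    counts = Counter(
        field
        for query_fields in history
        for field in query_fields
        if field in non_primary_key_fields
    )
    return sorted(f for f, n in counts.items() if n > frequency_threshold)


# Hypothetical history: each entry lists the fields one query filtered on.
history = [["cid"], ["cid", "device"], ["cid"], ["name"], ["id"]]
selected = select_target_fields(history, {"cid", "device", "name"}, 1)
```

Here only cid is selected: it appears three times, while device and name appear once each and id is excluded because it is not a non-primary key field.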

[0094] The method for determining the target non-primary key field described above is one optional, implementable method. In actual application scenarios, many different methods are possible and may vary with the scenario. The embodiments of this specification do not specifically limit this.

[0095] Since the fields in the filter condition expressions are usually highly distinguishable fields such as identity information and device information, by creating a secondary index table in advance for commonly used filter fields in the main table, and combining the secondary index table to perform join queries or subqueries, we can make full use of the primary key field index acceleration feature of the big data computing service platform, avoid full table scans, and achieve query acceleration.

[0096] In S414, the target non-primary key field is determined as the primary key field of the secondary index table, and the primary key field of the target data table is determined as the non-primary key field of the secondary index table, thus obtaining the secondary index table corresponding to the target data table.

[0097] In implementation, the target field can contain multiple non-primary key fields of the target data table, and multiple secondary index tables can be built from them. That is, the server can combine multiple non-primary key fields of the target data table and determine the combined non-primary key fields as the (composite) primary key field of a secondary index table.

[0098] For example, suppose the target data table contains fields 1, 2, and 3, where field 1 is the primary key field and fields 2 and 3 are non-primary key fields. If the server determines that the target non-primary key fields include fields 2 and 3, then a secondary index table 1 can be built based on fields 1 and 2, a secondary index table 2 can be built based on fields 1 and 3, and a secondary index table 3 can be built based on fields 1, 2, and 3. That is, in secondary index table 1, field 2 is the primary key field and field 1 is a non-primary key field; in secondary index table 2, field 3 is the primary key field and field 1 is a non-primary key field; in secondary index table 3, fields 2 and 3 can be composite primary key fields, and field 1 is a non-primary key field.
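The three secondary index tables in this example can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for the big data computing service platform. One deviation is an assumption of the sketch, not part of the source: SQLite requires primary keys to be unique, so the primary key field of the target data table (field 1) is appended to each index table key instead of being kept as a plain non-primary key column. The helper `build_secondary_index` and the table names are hypothetical.

```python
import sqlite3


def build_secondary_index(conn, target, target_pk, index_fields):
    """Create and fill a secondary index table keyed by index_fields."""
    name = f"{target}_idx_" + "_".join(index_fields)
    columns = ", ".join([*index_fields, target_pk])
    # Assumption: target_pk is part of the key only to keep SQLite keys unique.
    conn.execute(
        f"CREATE TABLE {name} ({columns}, PRIMARY KEY ({columns})) WITHOUT ROWID")
    conn.execute(f"INSERT INTO {name} SELECT {columns} FROM {target}")
    return name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (field1 INTEGER PRIMARY KEY, field2, field3)")
conn.executemany("INSERT INTO target VALUES (?, ?, ?)",
                 [(1, "a", "x"), (2, "a", "y"), (3, "b", "x")])

index_1 = build_secondary_index(conn, "target", "field1", ["field2"])
index_2 = build_secondary_index(conn, "target", "field1", ["field3"])
index_3 = build_secondary_index(conn, "target", "field1", ["field2", "field3"])
```

Each index table maps values of the chosen non-primary key fields back to field 1, so a lookup by field 2 or field 3 yields the primary key values needed to query the target data table.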

[0099] In S416, based on the leftmost prefix matching rule, the multiple non-primary key fields in the target data table are combined to obtain multiple field combinations.

[0100] The field combination can include one or more non-primary key fields from the target data table. The leftmost prefix matching rule means that when a composite index containing multiple fields is created on a data table, when querying the data table, the matching can be performed from left to right according to the field order in the defined index until a non-equality query is encountered.

[0101] In implementation, assuming that the non-primary key fields in the target data table include field 1, field 2 and field 3, then, based on the leftmost prefix matching rule, these multiple non-primary key fields are combined to obtain field combinations that can include field combination 1: (field 1), field combination 2: (field 1, field 2) and field combination 3: (field 1, field 2, field 3).
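The field combinations in this example are simply the leftmost prefixes of the ordered field list. A minimal Python sketch (the function name is hypothetical):

```python
def leftmost_prefix_combinations(fields):
    """Return every leftmost prefix of the ordered field list as a tuple."""
    return [tuple(fields[:n]) for n in range(1, len(fields) + 1)]


combinations = leftmost_prefix_combinations(["field1", "field2", "field3"])
```

For three fields this yields (field1), (field1, field2) and (field1, field2, field3), matching field combinations 1 to 3 above.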

[0102] In S418, the field combination and the primary key field of the secondary index table are matched to obtain the matching result. Based on the matching result, the target index table corresponding to the non-primary key field in the secondary index table of the target data table is determined.

[0103] In this context, the primary key field of the secondary index table can be a non-primary key field from the target data table.

[0104] In implementation, it is assumed that the field combinations include the above field combination 1, field combination 2 and field combination 3, and the secondary index tables include secondary index table 1 and secondary index table 2. The primary key field of secondary index table 1 is field 1, and the primary key fields of secondary index table 2 include field 1 and field 2. The server can obtain the matching degree between each field combination and each secondary index table, and determine the target index table based on the matching degree and the number of fields contained in the field combination.

[0105] If the matching degree between field combination 2 and the primary key fields of secondary index table 2 is 100%, and the matching degree between field combination 1 and the primary key field of secondary index table 1 is also 100%, then, because field combination 1 contains 1 field and field combination 2 contains 2 fields, secondary index table 2 can be determined as the target index table.
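The selection in S418 can be sketched as follows. The sketch assumes that only a 100% match (the field combination equals the index table's ordered primary key fields) qualifies, and that among qualifying tables the one matched by the longest field combination wins. How partial matching degrees are scored is not specified in the source and is not modelled here. `choose_target_index` is a hypothetical name.

```python
def choose_target_index(field_combinations, index_tables):
    """Pick the index table fully matched by the longest field combination.

    index_tables maps an index table name to its ordered primary key fields.
    Returns None when no index table is fully matched.
    """
    best_name, best_size = None, 0
    for combination in field_combinations:
        for name, key_fields in index_tables.items():
            fully_matched = tuple(key_fields) == tuple(combination)
            if fully_matched and len(combination) > best_size:
                best_name, best_size = name, len(combination)
    return best_name


field_combinations = [
    ("field1",),
    ("field1", "field2"),
    ("field1", "field2", "field3"),
]
index_tables = {
    "secondary_index_table_1": ("field1",),
    "secondary_index_table_2": ("field1", "field2"),
}
target_index = choose_target_index(field_combinations, index_tables)
```

With the data from the example, `target_index` is secondary_index_table_2, since it is fully matched by the two-field combination.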

[0106] In S420, based on the data query statement and the primary key field in the secondary index table, the number of data items in the target index table corresponding to the data query statement is determined.

[0107] In implementation, the part of the filtering condition expression of the data query statement that can be directly retrieved through the target index table can be extracted. That is, the server can determine the total number of rows that this filtering condition expression can hit in the target index table as the number of data items in the target index table corresponding to the data query statement.

[0108] The server can perform data query processing using a subquery or a join, depending on the relationship between the number of items and a preset item count threshold. That is, when the number of items does not exceed the preset item count threshold, as shown in Figure 4, S422 can be executed after S420; or, when the number of items is greater than the preset item count threshold, as shown in Figure 5, S424 can be executed after S420.

[0109] The preset number of items threshold can be determined based on the business processing requirements of the target data table.
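The count in S420 and the choice between S422 and S424 can be sketched with `sqlite3`. The table name `target_index`, the sample rows, the threshold value and both function names are assumptions made for illustration. As in any SQLite sketch of this scheme, the index table key includes `id` so that keys stay unique.

```python
import sqlite3


def count_index_hits(conn, index_table, index_filter, params=()):
    """Count rows of the index table hit by the indexable part of the filter."""
    sql = f"SELECT COUNT(*) FROM {index_table} WHERE {index_filter}"
    return conn.execute(sql, params).fetchone()[0]


def choose_strategy(hit_count, threshold):
    """Subquery when the count does not exceed the threshold, join otherwise."""
    return "subquery" if hit_count <= threshold else "join"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target_index (cid INTEGER, id INTEGER, PRIMARY KEY (cid, id))")
conn.executemany("INSERT INTO target_index VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 12)])

hits = count_index_hits(conn, "target_index", "cid = ?", (1,))
strategy = choose_strategy(hits, threshold=2)  # hypothetical threshold
```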

[0110] In S422, if the number of items is not greater than a preset item number threshold, a subquery is used to query the target index table based on the data query statement and the primary key field of the target index table to obtain the first query result. Then, based on the data query statement, the first query result, and the primary key field of the target data table, the target data table is queried to obtain the data query result corresponding to the data query request.

[0111] In implementation and practical application, the processing method of S422 can be varied. The following is one optional implementation method, which can be found in steps A1 to A2 below:

[0112] Step A1: Based on the data query statement, the primary key field in the target index table, and the primary key field in the target data table, update the data query statement to obtain the first data query statement.

[0113] The first data query statement may include a first sub-data query statement for querying based on the primary key field of the target index table to obtain the first query result, and a second sub-query statement for querying the target data table based on the first query result and the primary key field of the target data table to obtain the data query result corresponding to the data query request.

[0114] Step A2: Execute the first data query statement to obtain the data query result corresponding to the data query request.

[0115] In implementation, taking SQL statements as an example, the server can rewrite SQL statements to perform data queries through subqueries. For instance, the server can generate a subquery SQL statement (i.e., the first sub-data query statement) based on the original SQL statement, expressing the following semantics: 1. Extract the part of the filtering condition expression in the original SQL statement that can be directly retrieved through the primary key field of the target index table, and use it as the filtering condition expression for the subquery SQL statement. In this way, based on the subquery SQL statement, the set of all primary key field values of the target data table that satisfy the filtering condition expression can be retrieved through the target index table; 2. Concatenate an IN operator with the primary key field name of the target data table as the lvalue and the subquery SQL statement as the rvalue into the original SQL filtering condition expression.

[0116] For example, suppose the original SQL statement is:

[0117] SELECT info

[0118] FROM target

[0119] WHERE cid=1.

[0120] The first data query statement obtained by updating the data query statement can be:

[0121] SELECT info

[0122] FROM target

[0123] WHERE id IN (SELECT id

[0124] FROM index

[0125] WHERE cid=1).

[0126] Where `target` is the target data table, `index` is the target index table, and the first subquery statement is:

[0127] SELECT id

[0128] FROM index

[0129] WHERE cid=1.

[0130] Because the `cid` field used as a filter in the original SQL statement is a non-primary key field of the target table, the server cannot use the index on the primary key field `id` for query acceleration and would have to perform a full table scan of the target table. The server can instead rewrite the query statement using the target index table, so that the filter on the non-primary key field in the original SQL statement becomes a filter on `cid`, the primary key field of the target index table, followed by a filter on `id`, the primary key field of the target data table. This makes full use of the primary key indexes of both tables, avoids a full table scan, and accelerates the entire query process.
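The subquery rewrite can be tried end to end with `sqlite3`. This is a sketch under stated assumptions rather than the platform's actual behavior: the index table is named `target_index` because `index` is a reserved word in SQLite, its key is (cid, id) so that SQLite keys stay unique, and `rewrite_with_subquery` is a hypothetical helper that builds the statement by string formatting.

```python
import sqlite3


def rewrite_with_subquery(columns, target, target_pk, index_table, index_filter):
    """Build the first data query statement from its parts."""
    subquery = f"SELECT {target_pk} FROM {index_table} WHERE {index_filter}"
    return f"SELECT {columns} FROM {target} WHERE {target_pk} IN ({subquery})"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, cid INTEGER, info TEXT)")
conn.execute(
    "CREATE TABLE target_index (cid INTEGER, id INTEGER, "
    "PRIMARY KEY (cid, id)) WITHOUT ROWID")
conn.executemany("INSERT INTO target VALUES (?, ?, ?)",
                 [(1, 1, "a"), (2, 2, "b"), (3, 1, "c")])
conn.execute("INSERT INTO target_index SELECT cid, id FROM target")

original_sql = "SELECT info FROM target WHERE cid = 1"
rewritten_sql = rewrite_with_subquery("info", "target", "id", "target_index", "cid = 1")

# Both statements should return the same rows; only the access path differs.
original_rows = sorted(conn.execute(original_sql).fetchall())
rewritten_rows = sorted(conn.execute(rewritten_sql).fetchall())
```

Prefixing either statement with `EXPLAIN QUERY PLAN` in SQLite is one way to inspect which table is scanned and which is searched by key.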

[0131] In S424, if the number of items is greater than the preset item count threshold, the target index table is queried using a join query, based on the data query statement and the primary key field of the target index table, to obtain a first query result. Then, based on the data query statement, the first query result, and the primary key field of the target data table, the target data table is queried to obtain the data query result corresponding to the data query request.

[0132] In implementation and practical applications, the processing method of S424 described above can vary. The following is one optional implementation method, which can be found in steps B1 to B2 below:

[0133] Step B1 involves updating the data query statement based on the data query statement, the primary key field in the target index table, and the primary key field in the target data table to obtain the second data query statement.

[0134] In implementation, the server can prefix all field names in the original SQL statement with the target table's name and add a JOIN condition between the target table and the target index table. The ON condition is: the primary key field name of the target table equals the corresponding field name in the target index table. The server can extract the portion of the original filter expression that can be directly retrieved through the target index table, prefix all field names in this portion with the target index table's name, and then append it to the original filter expression.

[0135] For example, suppose the original SQL statement is:

[0136] SELECT info

[0137] FROM target

[0138] WHERE cid=1.

[0139] The second data query statement obtained by updating the data query statement is:

[0140] SELECT info

[0141] FROM target

[0142] JOIN index

[0143] ON target.id = index.id

[0144] WHERE target.cid=1

[0145] AND index.cid = 1.

[0146] In addition, the server can also perform alignment processing on the target data table and the target index table by adding the partition field name of the target data table = the partition field name of the target index table in the ON condition.

[0147] Step B2: Execute the second data query statement, perform query processing on the target index table based on the primary key field of the target index table to obtain the first query result, and perform query processing on the data table obtained by associating the target index table and the target data table based on the first query result and the primary key field of the target data table to obtain the data query result corresponding to the data query request.

[0148] In implementation, the server submits the rewritten new SQL statement (i.e., the second data query statement) to the big data computing service platform for execution. This allows the primary key indexes of both the main table (i.e., the target data table) and the secondary index table (i.e., the target index table) to be used simultaneously, avoiding full table scans and accelerating queries.
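Steps B1 and B2 can be sketched in the same way with `sqlite3`. The assumptions are the same as for any SQLite illustration of this scheme: the index table is called `target_index` (since `index` is reserved in SQLite), its key is (cid, id), and `rewrite_with_join` is a hypothetical helper. The literal is spliced into the SQL text only for brevity; real code would bind it as a parameter.

```python
import sqlite3


def rewrite_with_join(columns, target, target_pk, index_table, field, operator, literal):
    """Build the second data query statement (join form) from its parts."""
    select_list = ", ".join(f"{target}.{column}" for column in columns)
    return (
        f"SELECT {select_list} FROM {target} "
        f"JOIN {index_table} ON {target}.{target_pk} = {index_table}.{target_pk} "
        f"WHERE {target}.{field} {operator} {literal} "
        f"AND {index_table}.{field} {operator} {literal}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, cid INTEGER, info TEXT)")
conn.execute(
    "CREATE TABLE target_index (cid INTEGER, id INTEGER, "
    "PRIMARY KEY (cid, id)) WITHOUT ROWID")
conn.executemany("INSERT INTO target VALUES (?, ?, ?)",
                 [(1, 1, "a"), (2, 2, "b"), (3, 1, "c")])
conn.execute("INSERT INTO target_index SELECT cid, id FROM target")

join_sql = rewrite_with_join(["info"], "target", "id", "target_index", "cid", "=", "1")

# The join form should return the same rows as the original statement.
original_rows = sorted(conn.execute("SELECT info FROM target WHERE cid = 1").fetchall())
join_rows = sorted(conn.execute(join_sql).fetchall())
```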

[0149] Because the query SQL is rewritten to obtain the updated data query statement, the implementation is more lightweight and has lower machine costs. Query speed can be significantly improved, comparable to the acceleration effect of query optimization through a distributed query acceleration engine.

[0150] This specification provides a data processing method that receives a data query request for a target data table, extracts fields from the data query statement carried in the data query request to obtain the target field corresponding to the data query request, and, if it is determined that a query cannot be performed based on the primary key field of the target data table based on the data query statement, obtains the non-primary key field of the target data table contained in the target field, obtains the target index table corresponding to the non-primary key field of the target data table from the secondary index table corresponding to the target data table, wherein the primary key field of the secondary index table is a non-primary key field of the target data table, performs query processing on the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and performs query processing on the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request. Since the primary key field of a data table is usually an auto-incrementing field, meaning that the primary key field of a data table usually does not contain business information, the server can create a secondary index table corresponding to the target data table through the non-primary key field of the target data table, and perform data queries in combination with the secondary index table. This fully utilizes the index acceleration feature of the primary key field of the data table, avoids a full table scan of the data table, and improves data query efficiency.

[0151] Example 3

[0152] The above describes the data processing method provided in the embodiments of this specification. Based on the same idea, the embodiments of this specification also provide a data processing device, as shown in Figure 6.

[0153] The data processing device includes: a request receiving module 601, a first acquisition module 602, a second acquisition module 603, and a result determination module 604, wherein:

[0154] The request receiving module 601 is used to receive a data query request for a target data table, and to perform field extraction processing on the data query statement carried in the data query request to obtain the target field corresponding to the data query request.

[0155] The first acquisition module 602 is used to acquire, when it is determined that a query cannot be performed based on the primary key field of the target data table, the non-primary key field of the target data table contained in the target field.

[0156] The second acquisition module 603 is used to acquire, from the secondary index table corresponding to the target data table, the target index table corresponding to the non-primary key field in the target data table, wherein the primary key field of the secondary index table is the non-primary key field in the target data table;

[0157] The result determination module 604 is used to perform query processing on the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and to perform query processing on the target data table based on the data query statement, the first query result and the primary key field of the target data table to obtain a data query result corresponding to the data query request.

[0158] In the embodiments of this specification, the result determination module 604 is used for:

[0159] Based on the data query statement and the primary key field in the secondary index table, determine the number of data items in the target index table corresponding to the data query statement;

[0160] If the number of items is not greater than a preset number of items threshold, the target index table is queried using a subquery based on the data query statement and the primary key field of the target index table to obtain the first query result. Then, the target data table is queried based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request.

[0161] If the number of items is greater than the preset number of items threshold, the target index table is queried using a join query based on the data query statement and the primary key field of the target index table to obtain the first query result. Then, the target data table is queried based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request.

[0162] In the embodiments of this specification, the result determination module 604 is used for:

[0163] Based on the data query statement, the primary key field in the target index table, and the primary key field in the target data table, the data query statement is updated to obtain a first data query statement. The first data query statement includes a first sub-data query statement for querying based on the primary key field in the target index table to obtain the first query result, and a second sub-query statement for querying the target data table based on the first query result and the primary key field in the target data table to obtain the data query result corresponding to the data query request.

[0164] Execute the first data query statement to obtain the data query result corresponding to the data query request.

[0165] In the embodiments of this specification, the result determination module 604 is used for:

[0166] Based on the data query statement, the primary key field in the target index table, and the primary key field in the target data table, the data query statement is updated to obtain a second data query statement.

[0167] The second data query statement is executed, and the target index table is queried based on the primary key field of the target index table to obtain the first query result. Based on the first query result and the primary key field of the target data table, the data table obtained by associating the target index table and the target data table is queried to obtain the data query result corresponding to the data query request.

[0168] In the embodiments described in this specification, the device further includes:

[0169] The third acquisition module is used to acquire non-primary key fields in the target data table;

[0170] The field filtering module is used to perform field filtering on the non-primary key fields in the target data table based on the business processing requirements corresponding to the target data table, so as to obtain the target non-primary key fields;

[0171] A construction module is used to determine the target non-primary key field as the primary key field of the secondary index table, and to determine the primary key field of the target data table as the non-primary key field of the secondary index table, thereby obtaining the secondary index table corresponding to the target data table.

[0172] In this embodiment of the specification, the request receiving module 601 is used for:

[0173] Obtain the filter condition expression from the data query statement;

[0174] Among the preset relational operators in the filtering condition expression, the preset relational operator with a field on one side and a non-field on the other side is determined as the target operator;

[0175] The field corresponding to the target operator is determined as the target field corresponding to the data query request.

[0176] In this embodiment of the specification, the first acquisition module 602 is used for:

[0177] Based on the leftmost prefix matching rule, the multiple non-primary key fields in the target data table are combined to obtain multiple field combinations, each field combination containing one or more non-primary key fields of the target data table;

[0178] The field combination and the primary key field of the secondary index table are matched to obtain a matching result. Based on the matching result, the target index table corresponding to the non-primary key field of the target data table is determined in the secondary index table corresponding to the target data table.

[0179] In the embodiments described in this specification, the device further includes:

[0180] The first determining module is configured to determine that a query cannot be performed based on the primary key field of the target data table when the target field does not contain the primary key field of the target data table; or

[0181] The second determining module is used to determine that a query cannot be performed based on the primary key field in the target data table if the filtering condition expression corresponding to the primary key field in the data query statement is a preset condition expression. The preset condition expression includes, but is not limited to, fuzzy search expressions and inequality search expressions.

[0182] This specification provides a data processing apparatus that receives a data query request for a target data table, extracts fields from the data query statement carried in the data query request to obtain the target field corresponding to the data query request, and, if it is determined based on the data query statement that a query cannot be performed based on the primary key field of the target data table, obtains the non-primary key field of the target data table contained in the target field, obtains the target index table corresponding to the non-primary key field of the target data table from the secondary index table corresponding to the target data table, wherein the primary key field of the secondary index table is a non-primary key field of the target data table, performs query processing on the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and performs query processing on the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request. Since the primary key field of a data table is usually an auto-incrementing field, meaning that the primary key field of a data table usually does not contain business information, the server can create a secondary index table corresponding to the target data table through the non-primary key field of the target data table, and perform data queries in combination with the secondary index table. This fully utilizes the index acceleration feature of the primary key field of the data table, avoids a full table scan of the data table, and improves data query efficiency.

[0183] Example 4

[0184] Following the same line of thought, embodiments of this specification also provide a data processing device, as shown in Figure 7.

[0185] Data processing devices can vary considerably depending on configuration or performance, and may include one or more processors 701 and memory 702. Memory 702 may store one or more application programs or data. Memory 702 may be temporary or persistent storage. The application programs stored in memory 702 may include one or more modules (not shown), each module including a series of computer-executable instructions for the data processing device. Furthermore, processor 701 may be configured to communicate with memory 702 and execute the series of computer-executable instructions stored in memory 702 on the data processing device. The data processing device may also include one or more power supplies 703, one or more wired or wireless network interfaces 704, one or more input / output interfaces 705, and one or more keyboards 706.

[0186] Specifically, in this embodiment, the data processing device includes a memory and one or more programs, wherein one or more programs are stored in the memory, and one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the data processing device, and is configured to be executed by one or more processors. The one or more programs include computer-executable instructions for performing the following:

[0187] Receive a data query request for a target data table, and perform field extraction processing on the data query statement carried in the data query request to obtain the target field corresponding to the data query request;

[0188] If, based on the data query statement, it is determined that a query cannot be performed using the primary key field of the target data table, the non-primary key fields of the target data table contained in the target field are obtained.

[0189] Obtain, from the secondary index table corresponding to the target data table, the target index table corresponding to the non-primary key field in the target data table, where the primary key field of the secondary index table is the non-primary key field in the target data table;

[0190] Based on the data query statement and the primary key field of the target index table, the target index table is queried to obtain a first query result. Then, based on the data query statement, the first query result, and the primary key field of the target data table, the target data table is queried to obtain a data query result corresponding to the data query request.

[0191] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the data processing device embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0192] This specification provides a data processing device that receives a data query request for a target data table, extracts fields from the data query statement carried in the data query request to obtain the target field corresponding to the data query request, and, if it is determined based on the data query statement that a query cannot be performed based on the primary key field of the target data table, obtains the non-primary key field of the target data table contained in the target field, obtains the target index table corresponding to the non-primary key field of the target data table from the secondary index table corresponding to the target data table, wherein the primary key field of the secondary index table is a non-primary key field of the target data table, performs query processing on the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and performs query processing on the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request. Since the primary key field of a data table is usually an auto-incrementing field, meaning that the primary key field of a data table usually does not contain business information, the server can create a secondary index table corresponding to the target data table through the non-primary key field of the target data table, and perform data queries in combination with the secondary index table. This fully utilizes the index acceleration feature of the primary key field of the data table, avoids a full table scan of the data table, and improves data query efficiency.

[0193] Example 5

[0194] This specification also provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the processes of the above data processing method embodiments and achieves the same technical effects; to avoid repetition, these are not described again here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

[0195] This specification provides a computer-readable storage medium. When the computer program stored on the medium is executed by a processor, the processor receives a data query request for a target data table and extracts fields from the data query statement carried in the request to obtain the target field corresponding to the request. If it is determined, based on the data query statement, that the query cannot be performed using the primary key field of the target data table, the processor obtains the non-primary key field of the target data table contained in the target field and, from the secondary index table corresponding to the target data table, obtains the target index table corresponding to that non-primary key field, wherein the primary key field of the secondary index table is a non-primary key field of the target data table. The processor then queries the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and queries the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request. Because the primary key field of a data table is usually an auto-incrementing field that carries no business information, the server can create a secondary index table for the target data table from the table's non-primary key fields and query data in combination with that secondary index table. Both lookups then go through a primary key field, which makes full use of primary key index acceleration, avoids a full scan of the data table, and improves data query efficiency.

[0196] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0197] In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (e.g., an improvement to the circuit structure of diodes, transistors, switches, etc.) or a software improvement (an improvement to a method flow). With the development of technology, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware physical module. For example, a Programmable Logic Device (PLD) (e.g., a Field Programmable Gate Array (FPGA)) is an integrated circuit whose logic function is determined by the user programming the device. Designers can program a digital system themselves to "integrate" it onto a PLD, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, instead of manually manufacturing integrated circuit chips, this programming is now mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should understand that a hardware circuit implementing a logical method flow can be readily obtained simply by logically programming the method flow in one of these hardware description languages and programming it into an integrated circuit.

[0198] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0199] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0200] For ease of description, the above apparatus is described in terms of separate functional units. When one or more embodiments of this specification are implemented, the functions of the units can of course be implemented in one or more pieces of software and/or hardware.

[0201] Those skilled in the art will understand that the embodiments of this specification can be provided as methods, systems, or computer program products. Therefore, one or more embodiments of this specification may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0202] The embodiments of this specification are described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this specification. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0203] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0204] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0205] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0206] Memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0207] Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0208] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes said element.

[0209] Those skilled in the art will understand that the embodiments of this specification can be provided as methods, systems, or computer program products. Therefore, one or more embodiments of this specification may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0210] One or more embodiments of this specification can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a particular task or implement a particular abstract data type. One or more embodiments of this specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0211] The embodiments in this specification are described in a progressive manner. Parts that are the same or similar across embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described only briefly; for the relevant details, refer to the description of the method embodiments.

[0212] The above description is merely an embodiment of this specification and is not intended to limit this specification. Various modifications and variations can be made to this specification by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this specification should be included within the scope of the claims of this specification.

Claims

1. A data processing method, comprising:
receiving a data query request for a target data table, and performing field extraction processing on the data query statement carried in the data query request to obtain the target field corresponding to the data query request;
if it is determined, based on the data query statement, that a query cannot be performed using the primary key field of the target data table, obtaining the non-primary key field of the target data table contained in the target field;
obtaining, from the secondary index table corresponding to the target data table, the target index table corresponding to the non-primary key field of the target data table, wherein the primary key field of the secondary index table is a non-primary key field of the target data table;
querying the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and querying the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request;
wherein the first query result is obtained by data query processing using a query method determined based on the relationship between a number of items and a preset number-of-items threshold, the query method includes a subquery and an associated query, and the number of items is the number of data items in the target index table corresponding to the data query statement, determined based on the data query statement and the primary key field of the secondary index table.

2. The method according to claim 1, wherein querying the target index table based on the data query statement and the primary key field of the target index table to obtain the first query result, and querying the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request, comprises:
determining, based on the data query statement and the primary key field of the secondary index table, the number of data items in the target index table corresponding to the data query statement;
when the number of items is not greater than the preset number-of-items threshold, querying the target index table by means of the subquery, based on the data query statement and the primary key field of the target index table, to obtain the first query result, and querying the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request;
when the number of items is greater than the preset number-of-items threshold, querying the target index table by means of the associated query, based on the data query statement and the primary key field of the target index table, to obtain the first query result, and querying the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request.
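
The threshold-based choice between the two query methods can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the associated query is rendered here as a SQL `JOIN`, which is an interpretation of the claim language, and the schema, the `THRESHOLD` value, and the function names are hypothetical.

```python
import sqlite3

THRESHOLD = 2  # hypothetical preset number-of-items threshold


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id TEXT, amount INTEGER);
        CREATE TABLE orders_idx_user_id (
            user_id TEXT, id INTEGER, PRIMARY KEY (user_id, id)
        );
        INSERT INTO orders VALUES (1,'u1',10),(2,'u2',25),(3,'u1',40),(4,'u1',5);
        INSERT INTO orders_idx_user_id SELECT user_id, id FROM orders;
    """)
    return conn


# Subquery form: the inner SELECT yields the first query result.
SUBQUERY_SQL = """
    SELECT id, amount FROM orders
    WHERE id IN (SELECT id FROM orders_idx_user_id WHERE user_id = ?)
    ORDER BY id
"""

# Associated form: the index table is joined to the data table on the
# data table's primary key.
JOIN_SQL = """
    SELECT o.id, o.amount
    FROM orders_idx_user_id AS i JOIN orders AS o ON o.id = i.id
    WHERE i.user_id = ?
    ORDER BY o.id
"""


def query(conn, user_id, threshold=THRESHOLD):
    # Count the matching items in the index table via its primary key.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders_idx_user_id WHERE user_id = ?", (user_id,)
    ).fetchone()
    # Not greater than the threshold: subquery; otherwise: associated query.
    method = "subquery" if count <= threshold else "associated"
    sql = SUBQUERY_SQL if method == "subquery" else JOIN_SQL
    return method, conn.execute(sql, (user_id,)).fetchall()


conn = build_db()
small = query(conn, "u2")  # one matching item
large = query(conn, "u1")  # three matching items
```

Both forms return the same rows; only the way the index table's result is fed into the data table lookup differs.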

3. The method according to claim 2, wherein querying the target index table by means of the subquery, based on the data query statement and the primary key field of the target index table, to obtain the first query result, and querying the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request, comprises:
updating the data query statement based on the data query statement, the primary key field of the target index table, and the primary key field of the target data table to obtain a first data query statement, wherein the first data query statement includes a first sub-query statement for querying based on the primary key field of the target index table to obtain the first query result, and a second sub-query statement for querying the target data table based on the first query result and the primary key field of the target data table to obtain the data query result corresponding to the data query request;
executing the first data query statement to obtain the data query result corresponding to the data query request.

4. The method according to claim 2, wherein querying the target index table by means of the associated query, based on the data query statement and the primary key field of the target index table, to obtain the first query result, and querying the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request, comprises:
updating the data query statement based on the data query statement, the primary key field of the target index table, and the primary key field of the target data table to obtain a second data query statement;
executing the second data query statement so as to query the target index table based on the primary key field of the target index table to obtain the first query result, and to query, based on the first query result and the primary key field of the target data table, the data table obtained by associating the target index table with the target data table, to obtain the data query result corresponding to the data query request.

5. The method according to claim 1, further comprising, before obtaining the target index table corresponding to the non-primary key field of the target data table from the secondary index table corresponding to the target data table:
obtaining the non-primary key fields of the target data table;
filtering the non-primary key fields of the target data table based on the business processing requirements corresponding to the target data table to obtain a target non-primary key field;
determining the target non-primary key field as the primary key field of the secondary index table, and determining the primary key field of the target data table as a non-primary key field of the secondary index table, to obtain the secondary index table corresponding to the target data table.
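
One way to picture the construction of the secondary index table is the sketch below, which again assumes SQLite. The helper names, the `needed_fields` argument standing in for the business processing requirements, and the `<table>_idx_<field>` naming scheme are all hypothetical. The data table's primary key is appended to the index key here purely to keep index rows unique, which is an assumption of the sketch, and identifiers are interpolated into SQL without quoting, so the names must be trusted.

```python
import sqlite3


def non_primary_key_fields(conn, table):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk == 0 marks a column that is not part of the primary key.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")
            if row[5] == 0]


def primary_key_field(conn, table):
    return next(row[1] for row in conn.execute(f"PRAGMA table_info({table})")
                if row[5] > 0)


def build_index_tables(conn, table, needed_fields):
    pk = primary_key_field(conn, table)
    # Filter the non-primary key fields by the (hypothetical) business needs.
    targets = [f for f in non_primary_key_fields(conn, table)
               if f in needed_fields]
    created = []
    for field in targets:
        index_table = f"{table}_idx_{field}"
        # The chosen field leads the index table's primary key; the data
        # table's primary key is stored alongside it.
        conn.execute(
            f"CREATE TABLE {index_table} "
            f"({field}, {pk}, PRIMARY KEY ({field}, {pk}))"
        )
        conn.execute(
            f"INSERT INTO {index_table} SELECT {field}, {pk} FROM {table}"
        )
        created.append(index_table)
    return created


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, user_id TEXT, device TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "u1", "d1", 10), (2, "u2", "d1", 25)],
)
created = build_index_tables(conn, "orders", needed_fields={"user_id", "device"})
```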

6. The method according to claim 5, wherein performing field extraction processing on the data query statement carried in the data query request to obtain the target field corresponding to the data query request comprises:
obtaining the filter condition expression from the data query statement;
determining, among the preset relational operators in the filter condition expression, each preset relational operator that has a field on one side and a non-field on the other side as a target operator;
determining the field corresponding to the target operator as the target field corresponding to the data query request.
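
The field extraction step can be illustrated with a deliberately simplified sketch. It assumes a flat filter expression made of bare identifiers, integer or decimal literals, and single-quoted strings; the operator set and the function names are hypothetical, and a production system would use a real SQL parser rather than regular expressions.

```python
import re

# An operand is a quoted string, a number, or a bare identifier (a field).
_OPERAND = r"(?:'[^']*'|\d+(?:\.\d+)?|[A-Za-z_][A-Za-z_0-9]*)"
# Hypothetical set of "preset relational operators".
_OPERATOR = r"<=|>=|!=|<>|=|<|>|\bLIKE\b"
_COMPARISON = re.compile(
    rf"({_OPERAND})\s*({_OPERATOR})\s*({_OPERAND})", re.IGNORECASE
)
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*$")


def is_field(token):
    # Quoted strings and numbers are non-fields; bare identifiers are fields.
    return _IDENTIFIER.match(token) is not None


def extract_target_fields(filter_expression):
    fields = []
    for left, _op, right in _COMPARISON.findall(filter_expression):
        # A target operator has a field on exactly one side.
        if is_field(left) != is_field(right):
            field = left if is_field(left) else right
            if field not in fields:
                fields.append(field)
    return fields


fields = extract_target_fields(
    "user_id = 'u1' AND 10 < amount AND created = updated"
)
```

In this example `created = updated` compares two fields, so its operator is not a target operator and neither field is extracted.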

7. The method according to claim 6, wherein the target field includes multiple non-primary key fields of the target data table, and obtaining the target index table corresponding to the non-primary key field of the target data table from the secondary index table corresponding to the target data table comprises:
combining the multiple non-primary key fields of the target data table based on the leftmost prefix matching rule to obtain multiple field combinations, each field combination containing one or more non-primary key fields of the target data table;
matching the field combinations against the primary key field of the secondary index table to obtain a matching result, and determining, based on the matching result, the target index table corresponding to the non-primary key field of the target data table in the secondary index table corresponding to the target data table.
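
A minimal sketch of leftmost-prefix matching follows. The catalogue `INDEX_TABLES`, its field names, and the preference for the longest matching combination are assumptions made for illustration; the claim itself only states that field combinations are matched against the primary key field of the secondary index table.

```python
from itertools import permutations

# Hypothetical catalogue: index table name -> ordered primary key fields.
INDEX_TABLES = {
    "orders_idx_user_id": ("user_id",),
    "orders_idx_user_device": ("user_id", "device"),
    "orders_idx_city_user": ("city", "user_id"),
}


def field_combinations(fields):
    # Ordered combinations of the query's fields, longest first.
    for size in range(len(fields), 0, -1):
        yield from permutations(fields, size)


def pick_index_table(query_fields, index_tables=INDEX_TABLES):
    for combo in field_combinations(query_fields):
        for name, key_fields in index_tables.items():
            # Leftmost prefix rule: the combination must equal the leading
            # fields of the index table's primary key, in order.
            if key_fields[:len(combo)] == combo:
                return name, combo
    return None


two_fields = pick_index_table(["device", "user_id"])
one_usable = pick_index_table(["user_id", "amount"])
no_match = pick_index_table(["device"])
```

A query on `device` alone finds no index table here because `device` is never the leftmost key field, which is the behaviour the leftmost prefix rule is meant to capture.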

8. The method according to claim 1, further comprising, before obtaining the non-primary key field of the target data table contained in the target field when it is determined, based on the data query statement, that a query cannot be performed using the primary key field of the target data table:
if the target field does not contain the primary key field of the target data table, determining that a query cannot be performed based on the primary key field of the target data table; or
if the filter condition expression corresponding to the primary key field of the target data table in the data query statement is a preset condition expression, determining that a query cannot be performed based on the primary key field of the target data table, wherein the preset condition expression includes, but is not limited to, a fuzzy query expression and a not-equal-to query expression.
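
The two conditions in this claim can be sketched as a single predicate. The representation of the filter conditions as `(field, operator, value)` tuples, the operator set, and the default primary key name `id` are hypothetical choices for the example; treating any one fuzzy or not-equal condition on the primary key as disqualifying is likewise an assumption.

```python
# Hypothetical preset condition expressions: fuzzy and not-equal-to queries.
UNUSABLE_OPERATORS = {"LIKE", "!=", "<>"}


def can_query_by_primary_key(conditions, primary_key="id"):
    """conditions: (field, operator, value) tuples taken from the filter."""
    pk_operators = [op.upper() for field, op, _ in conditions
                    if field == primary_key]
    # Case 1: the target fields do not contain the primary key field.
    if not pk_operators:
        return False
    # Case 2: the primary key appears only in a fuzzy or not-equal condition.
    return not any(op in UNUSABLE_OPERATORS for op in pk_operators)


by_key = can_query_by_primary_key([("id", "=", 7)])
no_key = can_query_by_primary_key([("user_id", "=", "u1")])
fuzzy = can_query_by_primary_key([("id", "like", "7%")])
not_equal = can_query_by_primary_key([("id", "!=", 7), ("user_id", "=", "u1")])
```

When the predicate returns `False`, the method falls back to the non-primary key fields and the secondary index table.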

9. A data processing apparatus, comprising:
a request receiving module, configured to receive a data query request for a target data table, and to perform field extraction processing on the data query statement carried in the data query request to obtain the target field corresponding to the data query request;
a first acquisition module, configured to obtain, when it is determined based on the data query statement that a query cannot be performed using the primary key field of the target data table, the non-primary key field of the target data table contained in the target field;
a second acquisition module, configured to obtain, from the secondary index table corresponding to the target data table, the target index table corresponding to the non-primary key field of the target data table, wherein the primary key field of the secondary index table is a non-primary key field of the target data table;
a result determination module, configured to query the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and to query the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request;
wherein the first query result is obtained by data query processing using a query method determined based on the relationship between a number of items and a preset number-of-items threshold, the query method includes a subquery and an associated query, and the number of items is the number of data items in the target index table corresponding to the data query statement, determined based on the data query statement and the primary key field of the secondary index table.

10. A data processing apparatus, comprising:
a processor; and
a memory configured to store computer-executable instructions that, when executed, cause the processor to:
receive a data query request for a target data table, and perform field extraction processing on the data query statement carried in the data query request to obtain the target field corresponding to the data query request;
if it is determined, based on the data query statement, that a query cannot be performed using the primary key field of the target data table, obtain the non-primary key field of the target data table contained in the target field;
obtain, from the secondary index table corresponding to the target data table, the target index table corresponding to the non-primary key field of the target data table, wherein the primary key field of the secondary index table is a non-primary key field of the target data table;
query the target index table based on the data query statement and the primary key field of the target index table to obtain a first query result, and query the target data table based on the data query statement, the first query result, and the primary key field of the target data table to obtain the data query result corresponding to the data query request;
wherein the first query result is obtained by data query processing using a query method determined based on the relationship between a number of items and a preset number-of-items threshold, the query method includes a subquery and an associated query, and the number of items is the number of data items in the target index table corresponding to the data query statement, determined based on the data query statement and the primary key field of the secondary index table.