A pre-filtering condition determination method, device and equipment

CN115757478B · Active · Publication Date: 2026-06-26 · BEIJING YOUZHUJU NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING YOUZHUJU NETWORK TECH CO LTD
Filing Date
2022-11-09
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In existing technologies, selecting pre-filtering conditions through fixed encoding rules or metadata information results in poor query efficiency and cannot effectively improve query speed.

Method used

Based on multiple query conditions provided by the user, various query condition groups are determined, including pre-filtered conditions and non-pre-filtered conditions. The total data read volume of each condition group is evaluated using statistical methods, and the query condition group whose total data read volume meets the preset conditions is selected as the target pre-filtered condition.

Benefits of technology

By reducing the total amount of data that needs to be read for a query, query speed and efficiency are significantly improved, and query response time is reduced.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a pre-filtering condition determination method, device and equipment. A plurality of query conditions provided by a user are acquired. A plurality of query condition groups are determined according to the plurality of query conditions, each query condition group comprising a pre-filtering condition and a non-pre-filtering condition. Within each query condition group, the pre-filtering condition is executed first and the non-pre-filtering condition is executed afterwards. Then, for each query condition group, the total data read amount required for executing the pre-filtering condition and the non-pre-filtering condition in the query condition group is acquired. Since the query speed is directly related to the total data read amount required for executing the query conditions, the total data read amount is taken as an evaluation index of the query condition group, and the query condition group whose total data read amount satisfies a preset condition is a query condition group with relatively optimal query speed. The pre-filtering condition in that query condition group is the target pre-filtering condition. In this way, the total data read amount required for the query can be reduced, and the query speed can be improved.

Description

Technical Field

[0001] This application relates to the field of data processing technology, specifically to a method, apparatus, and equipment for determining pre-filtering conditions.

Background Technology

[0002] With the rapid development of computers, some databases not only have data storage functions, but can also execute user query and analysis commands. For example, after receiving a user's query command, the system will perform operations such as command reading, query condition retrieval, and data filtering, and finally return the query results to the user.

[0003] Typically, when users perform data queries, columnar storage-based databases employ pre-filtering techniques to improve query speed. Specifically, pre-filtering conditions are determined from multiple query conditions provided by the user. These pre-filtering conditions are used to filter a portion of the data before the remaining query conditions are applied to the retained data, thereby improving query speed.

[0004] In pre-filtering techniques, the selection of pre-filtering conditions directly impacts the overall query efficiency, with some optimized pre-filtering conditions leading to higher efficiency. Currently, pre-filtering conditions can be determined from query criteria using experience-guided fixed encoding rules or metadata selection rules. However, pre-filtering conditions determined by these rules generally result in poor overall query efficiency.

Summary of the Invention

[0005] In view of this, this application provides a method, apparatus, and device for determining pre-filtering conditions, which can improve query efficiency.

[0006] To solve the above problems, the technical solution provided in this application is as follows:

[0007] In a first aspect, this application provides a method for determining pre-filtering conditions, the method comprising:

[0008] Retrieve multiple query conditions provided by the user;

[0009] Multiple query condition groups are determined based on the multiple query conditions; each query condition group includes pre-filtered conditions and non-pre-filtered conditions, and the execution order of the query condition groups is to execute the pre-filtered conditions and then execute the non-pre-filtered conditions.

[0010] Determine the total amount of data to be read required to execute the pre-filtered conditions and the non-pre-filtered conditions in the query condition group;

[0011] The pre-filtering conditions in the query condition group where the total data read volume meets the preset conditions are determined as the target pre-filtering conditions.

[0012] In a second aspect, this application provides a pre-filtering condition determination device, the device comprising:

[0013] The retrieval unit is used to retrieve multiple query conditions provided by the user.

[0014] The first determining unit is configured to determine multiple query condition groups based on the multiple query conditions; each query condition group includes pre-filtered conditions and non-pre-filtered conditions, and the execution order of the query condition groups is to execute the pre-filtered conditions and then execute the non-pre-filtered conditions;

[0015] The second determining unit is used to determine the total amount of data read required to execute the pre-filtering conditions and the non-pre-filtering conditions in the query condition group;

[0016] The third determining unit is used to determine the pre-filtering conditions in the query condition group where the total data read volume meets the preset conditions as the target pre-filtering conditions.

[0017] In a third aspect, this application provides an electronic device, comprising:

[0018] One or more processors;

[0019] A storage device on which one or more programs are stored;

[0020] When the one or more programs are executed by the one or more processors, the one or more processors implement any of the pre-filtering condition determination methods described above.

[0021] In a fourth aspect, this application provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements any of the aforementioned pre-filtering condition determination methods.

[0022] Therefore, this application has the following beneficial effects:

[0023] This application provides a method, apparatus, and device for determining pre-filtering conditions. First, multiple query conditions provided by the user are obtained. Based on these conditions, various query condition groups are determined, each group including pre-filtering conditions and non-pre-filtering conditions. Both pre-filtering and non-pre-filtering conditions are query conditions. The execution order of the query condition groups is: pre-filtering conditions are executed first, followed by non-pre-filtering conditions. Then, for each query condition group, the total amount of data read required to execute the pre-filtering and non-pre-filtering conditions is obtained. Since query speed is directly related to the total amount of data read to execute the query conditions, to improve query speed, the total amount of data read is used as an evaluation index for the query condition group. Query condition groups whose total data read meets preset conditions are considered to have superior query speed. The pre-filtering conditions in this query condition group are the target pre-filtering conditions. This reduces the total amount of data read required for the query and improves query speed.

Attached Figure Description

[0024] Figure 1 is a schematic diagram of an exemplary application scenario provided in an embodiment of this application;

[0025] Figure 2 is a flowchart of a method for determining pre-filtering conditions provided in an embodiment of this application;

[0026] Figure 3 is a schematic diagram of a pre-filtering condition determination method provided in an embodiment of this application;

[0027] Figure 4 is a schematic diagram of a pre-filtering condition determination device provided in an embodiment of this application;

[0028] Figure 5 is a schematic diagram of the basic structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0029] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, the embodiments of this application will be further described in detail below with reference to the accompanying drawings and specific implementation methods.

[0030] To facilitate understanding and explanation of the technical solutions provided in the embodiments of this application, the background technology of this application will be described first.

[0031] With the rapid development of computers, some databases not only have data storage functions, but can also execute user query and analysis commands. For example, after receiving a user's query command, the system will perform operations such as command reading, query condition retrieval, and data filtering, and finally return the query results to the user.

[0032] Currently, columnar storage technology can be used to store data. Columnar storage is a storage method that physically stores data in the form of columns. Based on columnar storage, queries only read the columns that are relevant, allowing access to one column of data without considering other columns. Typically, when users perform data queries, databases based on columnar storage use pre-filtering techniques to improve query speed.

[0033] Pre-filtering can be understood as a type of query rewriting technique. Query rewriting involves rewriting the user-provided query conditions instead of executing them directly, generating higher-performance query conditions. Queries based on these rewritten conditions yield more accurate results. Specifically, pre-filtering involves determining pre-filtering conditions from multiple user-provided query conditions, filtering a portion of the data using these pre-filtering conditions, and then executing the remaining conditions from the retained data, thereby improving query speed. In essence, performing pre-filtering can be seen as a form of query condition rewriting.

[0034] In pre-filtering techniques, the selection of pre-filtering conditions directly affects the query efficiency of the entire query process; some optimized pre-filtering conditions result in higher query efficiency. Currently, pre-filtering conditions can be determined from query criteria using experience-guided fixed encoding rules or metadata information selection rules.

[0035] The fixed encoding rules are implemented through built-in code, and the selection rules for pre-filtering conditions are fixed and determined based on experience. Furthermore, the process of determining the pre-filtering conditions is independent of the distribution of database data and actual conditions. Specifically, after receiving a user's query request, the columnar storage database obtains the query conditions provided by the user and selects conditions that conform to the fixed encoding rules as pre-filtering conditions.

[0036] Furthermore, metadata selection rules refer to the selection of pre-filtering conditions based on the metadata information of the data, such as the size of the data columns or the creation time. Specifically, after receiving a user's query request, the columnar storage database obtains the query conditions provided by the user and selects conditions that meet the metadata selection rules as pre-filtering conditions.

[0037] However, the pre-filtering conditions determined by fixed encoding rules or metadata information selection rules result in low query efficiency and poor query results throughout the query process.

[0038] Based on this, embodiments of this application provide a method, apparatus, and device for determining pre-filtering conditions. First, multiple query conditions provided by the user are obtained. Based on these conditions, various query condition groups are determined, each including pre-filtering conditions and non-pre-filtering conditions. Both pre-filtering and non-pre-filtering conditions are query conditions. The execution order of the query condition groups is: pre-filtering conditions are executed first, followed by non-pre-filtering conditions. Furthermore, for each query condition group, the total amount of data read required to execute the pre-filtering and non-pre-filtering conditions is obtained. Since query speed is directly related to the total amount of data read required to execute the query conditions, to improve query speed, the total amount of data read is used as an evaluation index for the query condition group. Query condition groups whose total data read meets preset conditions are identified as having superior query speed. The pre-filtering conditions in this query condition group are the target pre-filtering conditions. This reduces the total amount of data read required for the query and improves query speed.

[0039] To facilitate understanding of the pre-filtering condition determination method provided in the embodiments of this application, an example scenario is described below with reference to Figure 1. As an optional example, this pre-filtering condition determination method can be applied to columnar storage databases. Figure 1 is a schematic diagram of an exemplary application scenario provided in the embodiments of this application.

[0040] In practical applications, users enter multiple query conditions on the front-end page of the columnar storage database and send a query request to the columnar storage database. The columnar storage database responds to the query request and retrieves the multiple query conditions provided by the user.

[0041] Furthermore, the columnar storage database determines multiple query condition groups based on various query conditions, such as query condition group 1, query condition group 2, ..., query condition group n. Query condition group 1 includes pre-filtered condition 1 and non-pre-filtered condition 1; query condition group 2 includes pre-filtered condition 2 and non-pre-filtered condition 2; ..., query condition group n includes pre-filtered condition n and non-pre-filtered condition n. It can be understood that each query condition group includes both pre-filtered and non-pre-filtered conditions, and both pre-filtered and non-pre-filtered conditions are query conditions. The execution order of the query condition groups is: pre-filtered conditions are executed first, followed by non-pre-filtered conditions.

[0042] Furthermore, the columnar storage database determines the total amount of data read required to execute the pre-filtered and non-pre-filtered conditions in each query condition group, and selects the query condition group from the n query condition groups whose total data read meets preset conditions. This query condition group is the one with the best query speed, and the pre-filtered conditions in this query condition group are determined as the target pre-filtered conditions.

[0043] Those skilled in the art will understand that the schematic diagram shown in Figure 1 is merely one example in which embodiments of this application can be implemented. The scope of application of the embodiments of this application is not limited by any aspect of this framework.

[0044] To facilitate understanding of this application, the following description, in conjunction with the accompanying drawings, illustrates a method for determining pre-filtering conditions according to an embodiment of this application.

[0045] Figure 2 is a flowchart illustrating a pre-filtering condition determination method provided in an embodiment of this application. As an optional example, this pre-filtering condition determination method can be applied to a columnar storage database. As shown in Figure 2, the method may include S201-S204:

[0046] S201: Obtain multiple query conditions provided by the user.

[0047] Typically, users can enter multiple query criteria on the front-end page of a columnar storage database and send a query request to the database. The columnar storage database responds to the query request by retrieving the multiple query criteria provided by the user.

[0048] For example, a database stores data in columns A, B, and C using a columnar storage method. Column A stores the specific values corresponding to A, column B stores the specific values corresponding to B, and column C stores the specific values corresponding to C. Based on this, the user provides three query conditions: A=1, B=2, and C=3. After obtaining these three query conditions, the columnar storage database needs to determine the optimal pre-filtering conditions.

[0049] It should be noted that in this embodiment, one query condition corresponds to only one column of data. For example, to obtain A=1, it is only necessary to query from column A. The case where one query condition corresponds to multiple columns of data is not considered.

[0050] S202: Determine multiple query condition groups based on multiple query conditions; each query condition group includes pre-filtered conditions and non-pre-filtered conditions, and the execution order of the query condition groups is to execute the pre-filtered conditions first, and then execute the non-pre-filtered conditions.

[0051] After obtaining multiple query conditions, the columnar storage database determines multiple query condition groups based on them. The number of query condition groups is not limited; the goal is to cover as many scenarios as possible.

[0052] Each query condition group includes pre-filtered conditions and non-pre-filtered conditions. In practice, pre-filtered conditions are first determined from multiple query conditions, and the remaining query conditions are the non-pre-filtered conditions. The execution order of the query condition groups is: pre-filtered conditions are executed first, followed by non-pre-filtered conditions.

[0053] In this embodiment, each query condition group has exactly one pre-filtering condition. For example, a columnar storage database obtains three query conditions: A=1, B=2, and C=3. Using A=1 as the pre-filtering condition and B=2 and C=3 as the non-pre-filtering conditions gives the first query condition group. Using B=2 as the pre-filtering condition and A=1 and C=3 as the non-pre-filtering conditions gives the second query condition group. Using C=3 as the pre-filtering condition and A=1 and B=2 as the non-pre-filtering conditions gives the third query condition group.
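The grouping in S202 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `ConditionGroup` and `build_condition_groups` are hypothetical, and conditions are represented as plain strings for simplicity.

```python
from typing import List, NamedTuple, Tuple


class ConditionGroup(NamedTuple):
    """One candidate group: a single pre-filtering condition plus the rest."""
    pre_filter: str
    non_pre_filter: Tuple[str, ...]


def build_condition_groups(conditions: List[str]) -> List[ConditionGroup]:
    """Build one group per condition, using that condition as the pre-filter.

    The remaining conditions, in their original order, become the
    non-pre-filtering conditions of the group.
    """
    groups = []
    for i, cond in enumerate(conditions):
        rest = tuple(conditions[:i] + conditions[i + 1:])
        groups.append(ConditionGroup(pre_filter=cond, non_pre_filter=rest))
    return groups


# The three query conditions from the example above.
groups = build_condition_groups(["A=1", "B=2", "C=3"])
for g in groups:
    print(g.pre_filter, "->", g.non_pre_filter)
```

With n query conditions and one pre-filtering condition per group, this yields n candidate groups, which matches the three groups listed in the example.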

[0054] S203: Determine the total amount of data to be read for executing the pre-filtered and non-pre-filtered conditions in the query condition group.

[0055] Since query speed is directly related to the total amount of data read to execute the query conditions, the total amount of data read is used as the evaluation metric for a query condition group. After the multiple query condition groups are determined, the total amount of data read required to execute the pre-filtered and non-pre-filtered conditions is obtained for each query condition group.

[0056] For example, retrieve the total data reads for the first query condition group with A=1 as the pre-filter condition and B=2 and C=3 as non-pre-filter conditions. Retrieve the total data reads for the second query condition group with B=2 as the pre-filter condition and A=1 and C=3 as non-pre-filter conditions. Retrieve the total data reads for the third query condition group with C=3 as the pre-filter condition and A=1 and B=2 as non-pre-filter conditions.

[0057] In one possible implementation, this application provides a specific implementation method for determining the total amount of data read required for pre-filtering conditions and non-pre-filtering conditions in the query condition group in S203, as detailed in A1-A5 below.

[0058] S204: Determine the pre-filtering conditions in the query condition group where the total data read volume meets the preset conditions as the target pre-filtering conditions.

[0059] As an optional example, the preset condition is that the total amount of data read is the smallest.

[0060] Furthermore, when obtaining the total data read volume corresponding to each query condition group, the total data read volume required to execute multiple query conditions without using pre-filtering technology is also obtained. The ratio of the total data read volume corresponding to each query condition group to the total data read volume required to execute multiple query conditions simultaneously is calculated. Based on this, as another optional example, the preset condition is that the ratio of the total data read volume to the total data read volume required to execute multiple query conditions simultaneously is minimized.
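The two optional preset conditions in S204 can be sketched as a small selection routine. This is an illustrative sketch only: the function name `select_target_pre_filter`, the dictionary layout, and all numeric values are assumptions made for the example and do not come from the application.

```python
from typing import Dict, Optional, Tuple


def select_target_pre_filter(
    group_totals: Dict[str, float],
    baseline_total: Optional[float] = None,
) -> Tuple[str, float]:
    """Pick the pre-filtering condition whose group has the lowest score.

    group_totals maps each candidate pre-filtering condition to the total
    data read amount of its query condition group. If baseline_total (the
    read amount without pre-filtering) is given, groups are scored by the
    ratio total / baseline_total; otherwise by the total itself.
    """
    if not group_totals:
        raise ValueError("at least one query condition group is required")
    if baseline_total is None:
        scores = dict(group_totals)
    else:
        if baseline_total <= 0:
            raise ValueError("baseline_total must be positive")
        scores = {k: v / baseline_total for k, v in group_totals.items()}
    target = min(scores, key=scores.get)
    return target, scores[target]


# Hypothetical totals (in arbitrary units) for the three groups.
totals = {"A=1": 900, "B=2": 400, "C=3": 650}
by_total = select_target_pre_filter(totals)
by_ratio = select_target_pre_filter(totals, baseline_total=1000)
```

Because the baseline is the same for every group, both scoring modes select the same group; the ratio additionally expresses how much reading is saved relative to running the query without pre-filtering.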

[0061] It is understandable that a set of query conditions where the total amount of data read meets preset criteria is considered the optimal query condition set for query speed. The pre-filtering conditions within this set of query conditions are the target pre-filtering conditions. Therefore, using a set of query conditions where the total amount of data read meets preset criteria for data querying can reduce the total amount of data required for the query and improve query speed.

[0062] Based on the content of S201-S204 above, it can be seen that determining the pre-filtering conditions based on the statistics of the total data read volume directly uses the total data read volume as the evaluation index. The query condition group whose total data read volume meets the preset conditions is the query condition group with better query speed. Executing the target pre-filtering conditions in this query condition group can significantly improve query speed and reduce query response time.

[0063] In one possible implementation, this application embodiment provides a specific implementation method for determining the total amount of data read required for executing the pre-filtered conditions and non-pre-filtered conditions in the query condition group in step S203, including:

[0064] A1: Obtain the amount of first data to be read for executing pre-filtering conditions; the first data is stored in multiple data read/write units; the first data includes the data corresponding to the pre-filtering conditions.

[0065] The first data to be read for the pre-filtering condition is the column data corresponding to the pre-filtering condition. The first data is stored in multiple data read/write units, meaning the column data corresponding to the pre-filtering condition is stored in multiple data read/write units.

[0066] As an optional example, the amount of first data to be read for executing the pre-filtering condition is obtained from the metadata information of the column containing the first data (i.e., the column data corresponding to the pre-filtering condition). This metadata includes, for example, the space occupied by each data read/write unit after compression. The amount of first data is therefore the total amount of data read when reading the multiple data read/write units, computed from the number of data read/write units and the space each occupies after compression. The column's metadata information also includes information such as the space occupied by the column's data type, which will not be detailed here.

[0067] Take the second query condition group as an example, in which B=2 is the pre-filtering condition and A=1 and C=3 are the non-pre-filtering conditions. The first data (i.e., the column data corresponding to B=2) is the data in column B. The data in column B is stored in multiple data read/write units, and the amount of column B data that must be read to execute B=2 is obtained; this is the amount of first data. It can be understood that the data corresponding to the pre-filtering condition is the data satisfying B=2, and the data in column B includes that data.

[0068] To maximize the performance of sequential disk read/write operations, some columnar storage databases use a minimum unit for data read/write. For example, in the columnar storage database ClickHouse, this minimum unit is called a Block. ClickHouse performs both read and write operations on the disk in Block form. That is, in a query scenario, a columnar storage database cannot directly read a single record stored on the disk; instead, it can only read a Block and then retrieve the data related to the query conditions from the Block through traversal and searching.

[0069] As an optional example, the data read/write unit is the smallest unit for reading and writing data in the database. Based on the above, the data read/write unit can be a Block. Therefore, the first data (i.e., the column data corresponding to the pre-filtering condition) corresponds to multiple Blocks during storage. The amount of first data to be read to execute the pre-filtering condition is determined by the number of Blocks corresponding to the column containing the first data and the space occupied by each Block after compression.
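Step A1 can be sketched as a sum over per-Block compressed sizes taken from column metadata. The `ColumnMetadata` class, its fields, and the byte values below are hypothetical stand-ins for whatever metadata a real columnar database exposes; they are not an actual ClickHouse API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnMetadata:
    """Hypothetical column metadata: compressed size in bytes of each Block."""
    name: str
    compressed_block_sizes: List[int]


def first_data_amount(column: ColumnMetadata) -> int:
    """Amount of first data: all Blocks of the pre-filter column are read,
    so the amount is the sum of their compressed sizes."""
    return sum(column.compressed_block_sizes)


# Column B is the pre-filter column in the second query condition group.
column_b = ColumnMetadata(name="B", compressed_block_sizes=[4096, 3900, 4100, 2048])
amount_b = first_data_amount(column_b)
```

If every Block had the same compressed size, the sum would reduce to the number of Blocks multiplied by that size.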

[0070] It can be understood that the amount of first data to be read to execute the pre-filtering condition is only part of the total data read amount; the amount of data to be read to execute the non-pre-filtering conditions must also be obtained.

[0071] A2: Obtain the sparse probability corresponding to the pre-filtering condition; the sparse probability is used to represent the proportion of the data read/write units storing the data corresponding to the pre-filtering condition among the multiple data read/write units storing the first data.

[0072] It can be seen that pre-filtering technology (executing the pre-filtering conditions first and then the non-pre-filtering conditions) can reduce the amount of data that needs to be read during a query. In essence, it does so by reducing the number of data read/write units (for example, the number of Blocks) read during the query, thereby reducing the total data read amount.

[0073] Taking a block as the data read / write unit, if the data corresponding to a query condition is densely distributed at the block level of the corresponding column data, with most blocks storing the data corresponding to the query condition, then even if pre-filtering is performed using this query condition, not many blocks can be filtered out, making this query condition unsuitable as a pre-filtering condition. Conversely, if the data corresponding to a query condition is sparsely distributed at the block level of the corresponding column data, with only a few blocks storing the data corresponding to the query condition, then using this query condition as a pre-filtering condition can filter out most blocks, making this query condition more suitable as a pre-filtering condition. Therefore, the sparser the query condition is at the block level of the corresponding column data, the more suitable it is as a pre-filtering condition.

[0074] Based on the above, the total data read amount can be obtained from the perspective of the sparsity of the pre-filtering condition. Specifically, the sparse probability corresponding to the pre-filtering condition is obtained first. The sparse probability represents the proportion of data read/write units that store data corresponding to the pre-filtering condition among the multiple data read/write units storing the first data.
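To make the definition of the sparse probability concrete, the sketch below computes it exactly by scanning every Block of a small in-memory column. This is only an illustration of the definition: scanning all Blocks is an assumption made for clarity and is not the procedure the application uses to obtain the sparse probability. The function name and the sample data are hypothetical.

```python
from typing import Callable, List, Sequence


def sparse_probability(
    blocks: Sequence[Sequence[int]],
    predicate: Callable[[int], bool],
) -> float:
    """Fraction of Blocks holding at least one value that satisfies the
    pre-filtering condition, out of all Blocks storing the column."""
    if not blocks:
        raise ValueError("the column must be stored in at least one Block")
    hits = sum(1 for block in blocks if any(predicate(v) for v in block))
    return hits / len(blocks)


# Column B split into five hypothetical Blocks; B=2 occurs in two of them.
column_b_blocks: List[List[int]] = [
    [2, 5, 7],
    [1, 1, 3],
    [2, 2, 9],
    [4, 6, 8],
    [0, 3, 5],
]
p_b = sparse_probability(column_b_blocks, lambda v: v == 2)
```

A value near 0 means the condition touches few Blocks and filters well; a value near 1 means almost every Block must still be read after pre-filtering.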

[0075] In one possible implementation, this application provides a specific implementation method for obtaining the sparse probability corresponding to the pre-filtering conditions, as detailed in A201-A203 below.

[0076] A3: Retrieves the original data required to execute non-pre-filtered conditions.

[0077] Specifically, the original data required to execute the non-pre-filtered conditions is the column data corresponding to those conditions. For example, in the second query condition group, B=2 is the pre-filtered condition and A=1 and C=3 are the non-pre-filtered conditions, so the original data required to execute the non-pre-filtered conditions is the data in columns A and C.

[0078] It can be understood that the original data required to execute the non-pre-filtered conditions is the data that would have to be read to execute those conditions if pre-filtering technology were not used. For example, when the query conditions A=1 and C=3 are executed without pre-filtering, the data to be read is the data in columns A and C.

[0079] In this embodiment, the original data volume is the data volume of the original data. When the original data consists of column A and column C, the original data volume is the combined data volume of column A and column C.

[0080] A4: Based on the sparse probability corresponding to the pre-filtering condition and the original data required to execute the non-pre-filtering condition, obtain the amount of second data required to execute the non-pre-filtering condition.

[0081] In one possible implementation, embodiments of this application provide a specific implementation method for obtaining the amount of second data to be read for executing non-pre-filtering conditions based on the sparse probability corresponding to the pre-filtering conditions and the original data to be read for executing non-pre-filtering conditions, including:

[0082] A401: Get the amount of raw data required to execute the non-pre-filtered conditions.

[0083] A402: Calculate the product of the sparse probability corresponding to the pre-filtering condition and the amount of original data to be read to execute the non-pre-filtering condition, and use the product as the amount of second data to be read to execute the non-pre-filtering condition.

[0084] The amount of original data to be read when executing the non-pre-filtered conditions is the amount of data in the columns corresponding to those conditions. The product of the sparse probability corresponding to the pre-filtered condition and this amount of original data is taken as the amount of second data to be read when executing the non-pre-filtered conditions. For example, if the sparse probability corresponding to the pre-filtered condition is 0.6, and the amount of original data to be read when executing A=1 and C=3 is (the amount of data in column A + the amount of data in column C), then the amount of second data is 0.6 * (the amount of data in column A + the amount of data in column C).
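Steps A401 and A402 amount to one sum and one multiplication, sketched below. The sparse probability 0.6 is taken from the example above; the function name and the column sizes are hypothetical values chosen for illustration.

```python
from typing import Iterable


def second_data_amount(
    sparse_probability: float,
    original_column_sizes: Iterable[float],
) -> float:
    """Estimate the data read for the non-pre-filtering conditions.

    A401: the original amount is the total size of the columns used by
    the non-pre-filtering conditions.
    A402: the estimate is sparse_probability * original amount.
    """
    if not 0.0 <= sparse_probability <= 1.0:
        raise ValueError("sparse_probability must lie in [0, 1]")
    original_amount = sum(original_column_sizes)
    return sparse_probability * original_amount


# Hypothetical sizes (arbitrary units) of the columns behind A=1 and C=3.
column_sizes = {"A": 1000, "C": 500}
amount_second = second_data_amount(0.6, column_sizes.values())
```

The lower the sparse probability of the pre-filtering condition, the smaller the share of columns A and C that still has to be read.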

[0085] In another possible implementation, this application provides a specific implementation method for obtaining the amount of second data to be read for executing the non-pre-filtering condition based on the sparse probability corresponding to the pre-filtering condition and the original data to be read for executing the non-pre-filtering condition, including:

[0086] A411: Determine the first filtering condition and the second filtering condition in the non-pre-filtering conditions.

[0087] For example, the first filtering condition is A=1, and the second filtering condition is C=3.

[0088] A412: The product of the sparse probability corresponding to the pre-filtering condition and the amount of original data to be read to execute the first filtering condition is taken as the first sub-data amount to be read to execute the first filtering condition.

[0089] The amount of original data required to execute the first filtering condition is the amount of data in the column corresponding to the first filtering condition. For example, if the sparsity probability corresponding to the pre-filtering condition is 0.6, and the column data corresponding to the first filtering condition A=1 is column A, then the amount of the first sub-data is 0.6 * the amount of data in column A.

[0090] A413: Determine the sparse probability of combining the pre-filtering condition with the first filter condition, and use the product of the sparse probability of combining and the amount of original data to be read when executing the second filter condition as the amount of second sub-data to be read when executing the second filter condition.

[0091] The amount of original data required to execute the second filtering condition is the amount of column data corresponding to the second filtering condition. The combination sparsity probability corresponding to the pre-filtering condition combined with the first filtering condition represents the proportion of data read / write units storing the data corresponding to the pre-filtering condition and the data corresponding to the first filtering condition among the multiple data read / write units required to store the first data and the column data corresponding to the first filtering condition. The column data corresponding to the first filtering condition includes the data corresponding to the first filtering condition.

[0092] For example, the column data corresponding to the pre-filtering condition B=2 is column B. The column data corresponding to the first filtering condition A=1 is column A, and the data corresponding to the first filtering condition is A=1. The column data corresponding to the second filtering condition C=3 is column C. The combined sparse probability is the proportion of data read / write units storing B=2 and A=1 among the data read / write units storing column B and column A. If this proportion is 0.3, then the second sub-data amount is 0.3 * the data volume of column C.
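One way to compute a combined sparse probability is sketched below. It assumes a block-level reading of the definition: a data read / write unit counts as a hit when it contains at least one row with the pre-filter value and at least one row with the first-filter value. The block layout (a dict of column name to value list) and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Block = Dict[str, List[int]]  # column name -> values stored in this block


def combined_sparse_probability(
    blocks: List[Block],
    pre: Tuple[str, int],
    first: Tuple[str, int],
) -> float:
    """Fraction of blocks that store both the pre-filter value and the
    first-filter value (block-level co-occurrence, an assumption)."""
    if not blocks:
        raise ValueError("at least one block is required")
    pre_col, pre_val = pre
    first_col, first_val = first
    hits = sum(
        1
        for block in blocks
        if pre_val in block[pre_col] and first_val in block[first_col]
    )
    return hits / len(blocks)


# Four hypothetical blocks holding columns A and B.
blocks = [
    {"A": [1, 4], "B": [2, 2]},  # has B=2 and A=1
    {"A": [1, 1], "B": [5, 7]},  # no B=2
    {"A": [3, 1], "B": [2, 9]},  # has B=2 and A=1
    {"A": [4, 4], "B": [2, 5]},  # no A=1
]
p_combined = combined_sparse_probability(blocks, ("B", 2), ("A", 1))
```

A stricter row-level reading (both values in the same row) would need row-aligned statistics and is not shown here.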

[0093] A414: The amount of second data to be read to perform non-pre-filtering conditions is the sum of the first and second sub-data amounts.

[0094] That is, the sum of the first sub-data volume and the second sub-data volume is the amount of second data that needs to be read to perform the non-pre-filtering condition.
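Steps A411-A414 can be sketched as a short function. The function name, parameter names, and column sizes below are assumptions for illustration; the probabilities 0.6 and 0.3 reuse the values from the examples above.

```python
def second_data_amount_staged(
    p_prefilter: float,
    p_combined: float,
    first_column_size: int,
    second_column_size: int,
) -> float:
    """Estimate the amount of second data per A411-A414 (illustrative)."""
    # A412: first sub-data amount, scaled by the pre-filter sparse probability.
    first_sub = p_prefilter * first_column_size
    # A413: second sub-data amount, scaled by the combined sparse probability
    # of the pre-filter condition together with the first filter condition.
    second_sub = p_combined * second_column_size
    # A414: the amount of second data is the sum of both sub-amounts.
    return first_sub + second_sub


# Pre-filter B=2 (0.6), first filter A=1, second filter C=3 (combined 0.3).
# Column sizes are hypothetical byte counts.
staged = second_data_amount_staged(0.6, 0.3, first_column_size=1000,
                                   second_column_size=500)
```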

[0095] A5: The sum of the data volume of the first data and the data volume of the second data is determined as the total amount of data to be read when executing the pre-filtered and non-pre-filtered conditions in the query condition group.

[0096] After obtaining the data volume of the first data and the data volume of the second data, the sum of the data volume of the first data and the data volume of the second data is determined as the total data read volume required to execute the pre-filtered conditions and non-pre-filtered conditions in the query condition group.
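The two ways of estimating the total data read amount can be written compactly. The symbols are introduced here only for illustration and do not appear in the source: $D_1$ is the data volume of the first data (for example, column B), $D_A$ and $D_C$ are the data volumes of columns A and C, $p$ is the sparse probability of the pre-filtering condition, and $p_c$ is the combined sparse probability.

```latex
% A401-A402 followed by A5 (single sparse probability)
T_{\text{simple}} = D_1 + p \,(D_A + D_C)

% A411-A414 followed by A5 (sparse probability plus combined sparse probability)
T_{\text{staged}} = D_1 + p \, D_A + p_c \, D_C

% With the example values p = 0.6 and p_c = 0.3:
T_{\text{simple}} = D_1 + 0.6\,(D_A + D_C), \qquad
T_{\text{staged}} = D_1 + 0.6\, D_A + 0.3\, D_C
```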

[0097] Based on the content of A1-A5, both variants rely on statistical analysis of the data at the data read / write unit level (such as the data corresponding to the pre-filtering conditions). A401-A402 obtains the sparse probability corresponding to the pre-filtering conditions and uses it to obtain the amount of second data to be read for executing the non-pre-filtering conditions. A411-A414 obtains both the sparse probability corresponding to the pre-filtering conditions and the combined sparse probability corresponding to the pre-filtering conditions combined with the first filtering condition, and uses both to obtain the amount of second data. In either case, the total amount of data to be read for executing each query condition group is then calculated from the sparse probability (or from the sparse probability and the combined sparse probability). Thus, by obtaining the total data read amount through statistical methods and using it as an evaluation indicator to determine the query condition group that meets the preset conditions, query speed can be improved and query response time reduced during actual queries.

[0098] In one possible implementation, this application provides a specific implementation method for obtaining the sparse probability corresponding to the pre-filtering condition in A2. This specific implementation method describes in detail the process of obtaining the sparse probability corresponding to the pre-filtering condition based on statistical methods, specifically including:

[0099] A201: Obtain the statistical data corresponding to the first data; the statistical data includes the target value and the target quantity corresponding to the target value; the target quantity is the number of data read / write units that store the target value in the multiple data read / write units corresponding to the first data; the target value is each value in the first data.

[0100] To describe the sparsity of each value in the first data at the data read / write unit level (e.g., the block level), a method of counting individually for each data read / write unit is introduced. Each value in the first data is described by a target value. Specifically, if a target value appears in a data read / write unit, the count of the target value in that data read / write unit is set to 1, regardless of how many times the target value appears in that data read / write unit. Based on this, the target quantity corresponding to the target value in the first data is obtained. Thus, the statistical data corresponding to the first data is obtained.

[0101] For example, when the pre-filter condition is B=2, the first data is the data in column B. The value of B in column B can take many forms, such as B=2, B=5, B=7, etc. Therefore, the statistical data includes B=2 and the target quantity of B=2 (i.e., the number of data read / write units storing B=2), B=5 and the target quantity of B=5, and B=7 and the target quantity of B=7, and so on.
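The per-unit counting rule described in A201 can be sketched as follows. This is an illustration under the assumption that each data read / write unit is available as a plain list of column values; the function name and the sample blocks are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def block_presence_counts(blocks: Iterable[List[int]]) -> Counter:
    """Map each value to the number of blocks in which it appears.

    A value contributes exactly 1 per block, no matter how many times it
    occurs inside that block.
    """
    counts: Counter = Counter()
    for block in blocks:
        counts.update(set(block))  # de-duplicate within the block first
    return counts


# Hypothetical column B stored in four blocks.
column_b_blocks = [[2, 2, 5], [5, 7], [2, 7, 7], [5]]
stats_b = block_presence_counts(column_b_blocks)
```

Here the value 2 appears three times in total but in only two blocks, so its target quantity is 2.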

[0102] As an optional example, the data structure for statistical data can be a histogram data structure or other data structures with merging, updating, partial querying, serialization, and deserialization capabilities. Taking a histogram data structure as an example, a histogram includes horizontal and vertical axes. The horizontal axis represents the target value, and the vertical axis represents the number of data read / write units storing the target value. Therefore, statistical data based on a histogram data structure facilitates querying the percentage of a specified value appearing in that column of data.
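A histogram-like structure with partial querying, serialization, and deserialization might look like the sketch below. The class name `BlockHistogram`, the JSON encoding, and the restriction to integer values are assumptions made for the example; the source does not specify a storage format.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class BlockHistogram:
    """Value -> number of blocks containing it (illustrative sketch)."""

    total_blocks: int = 0
    counts: Dict[int, int] = field(default_factory=dict)

    def add_block(self, values: Iterable[int]) -> None:
        """Update the histogram with one data read / write unit."""
        self.total_blocks += 1
        for value in set(values):
            self.counts[value] = self.counts.get(value, 0) + 1

    def fraction(self, value: int) -> float:
        """Partial query: share of blocks that contain the given value."""
        if self.total_blocks == 0:
            return 0.0
        return self.counts.get(value, 0) / self.total_blocks

    def to_json(self) -> str:
        """Serialize; JSON object keys must be strings."""
        payload = {
            "total_blocks": self.total_blocks,
            "counts": {str(k): c for k, c in self.counts.items()},
        }
        return json.dumps(payload)

    @classmethod
    def from_json(cls, text: str) -> "BlockHistogram":
        """Deserialize, restoring integer keys (assumes integer values)."""
        raw = json.loads(text)
        counts = {int(k): c for k, c in raw["counts"].items()}
        return cls(total_blocks=raw["total_blocks"], counts=counts)


hist = BlockHistogram()
for block in [[2, 2, 5], [5, 7], [2, 7, 7], [5]]:
    hist.add_block(block)
restored = BlockHistogram.from_json(hist.to_json())
```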

[0103] It should be understood that the statistical data corresponding to the first data is generated in advance.

[0104] A202: Based on the statistical data corresponding to the first data, determine the number of data read / write units that store the data corresponding to the pre-filtering condition.

[0105] For example, when the pre-filtering condition is B=2, the number of data read / write units storing the data corresponding to the pre-filtering condition can be obtained based on the statistical data corresponding to the first data.

[0106] A203: Calculate the ratio of the number of data read / write units that store the data corresponding to the pre-filtering condition to the number of data read / write units that store the first data, and determine the ratio as the sparse probability corresponding to the pre-filtering condition.

[0107] After the number of data read / write units that store the data corresponding to the pre-filtering condition is determined, its ratio to the number of data read / write units that store the first data is calculated, and this ratio is determined as the sparse probability corresponding to the pre-filtering condition.
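Steps A201-A203 reduce to a lookup and a division. The sketch below assumes the statistical data is available as a mapping from value to block count; the function name and the sample numbers are hypothetical.

```python
from typing import Mapping


def sparse_probability(
    stats: Mapping[int, int],
    total_blocks: int,
    target_value: int,
) -> float:
    """Sparse probability of a pre-filter such as B = target_value.

    stats maps each value to the number of data read / write units that
    store it (A201); total_blocks is the number of units storing the column.
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    # A202: number of units that store the pre-filter value.
    matching_blocks = stats.get(target_value, 0)
    # A203: ratio of matching units to all units storing the first data.
    return matching_blocks / total_blocks


# Hypothetical statistics for column B spread over 10 blocks.
stats_b = {2: 6, 5: 3, 7: 4}
p = sparse_probability(stats_b, total_blocks=10, target_value=2)
```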

[0108] Based on the content of A201-A203, statistical methods are used to analyze the target values in the first data in the database in advance, obtaining the corresponding statistical data. Therefore, when a user submits a query request, once the pre-filtering condition in the query and the first data required to execute the pre-filtering condition are determined, the sparse probability corresponding to the pre-filtering condition can be obtained from the statistical data of the first data. The statistical results generated from this analysis can also support richer data analysis functions.

[0109] Since the statistical data corresponding to the first data is generated in advance, in one possible implementation, this application embodiment provides a specific implementation method for A201 to obtain the statistical data corresponding to the first data, including:

[0110] Generate statistical data corresponding to the database data and store the statistical data corresponding to the database data within a preset time period into the cache;

[0111] Retrieve the statistical data corresponding to the first data from the statistical data stored in the cache within a preset time period.

[0112] In practice, after generating the statistical data corresponding to the database data, the statistical data corresponding to the database data is written to the disk. This data can be accessed by other programs and code, and the distribution of the data can be quickly obtained.

[0113] Because the database data is stored in a columnar format, the statistical data corresponding to the database data consists of the statistical data corresponding to each column of data in the database. The first data is also column data. The database data includes the first data, and the statistical data corresponding to the database data within a preset time period includes the statistical data corresponding to the first data.

[0114] It is understandable that pre-generating statistical data corresponding to the database data and storing this data in a specific data structure (such as a histogram data structure) allows users to access the statistical data at any time when querying data.

[0115] Specifically, statistical data corresponding to database data within a preset time period can be stored in a cache. The statistical data corresponding to the first data can then be retrieved from the cached statistical data within the preset time period. Retrieving the statistical data corresponding to the first data through caching improves reading efficiency. The preset time period can be selected according to actual circumstances, and this embodiment does not limit its selection.

[0116] In another possible implementation, instead of storing the statistical data corresponding to the database in a cache, the statistical data for the first data is read directly from the disk when needed. However, this method of retrieving statistical data for the first data is relatively inefficient.
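The cache-first lookup with a disk fallback can be sketched as below. This is only one possible arrangement: the class name `StatsCache`, the `load_from_disk` callable, keying by part name, and the use of write timestamps to decide what falls inside the preset time period are all assumptions for illustration.

```python
from typing import Callable, Dict

Stats = Dict[int, int]  # value -> number of blocks containing it


class StatsCache:
    """Keeps statistics of recently written parts in memory (sketch)."""

    def __init__(self, load_from_disk: Callable[[str], Stats]) -> None:
        self._load_from_disk = load_from_disk
        self._cached: Dict[str, Stats] = {}
        self.disk_reads = 0  # counter used to make the cost visible

    def _read(self, part: str) -> Stats:
        self.disk_reads += 1
        return self._load_from_disk(part)

    def refresh(self, write_times: Dict[str, float], now: float,
                window_seconds: float) -> None:
        """Cache statistics for parts written within the preset period."""
        recent = [p for p, t in write_times.items()
                  if now - t <= window_seconds]
        self._cached = {p: self._read(p) for p in recent}

    def get(self, part: str) -> Stats:
        """Serve from the cache, falling back to a slower disk read."""
        if part in self._cached:
            return self._cached[part]
        return self._read(part)


# Hypothetical on-disk statistics and write timestamps (seconds).
disk = {"part1": {2: 1}, "part2": {2: 3}}
cache = StatsCache(lambda part: disk[part])
cache.refresh({"part1": 0.0, "part2": 90.0}, now=100.0, window_seconds=30.0)
reads_after_refresh = cache.disk_reads
recent_stats = cache.get("part2")   # cache hit, no disk read
reads_after_hit = cache.disk_reads
old_stats = cache.get("part1")      # outside the window, read from disk
reads_after_miss = cache.disk_reads
```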

[0117] In another possible implementation, this application provides a specific method for generating statistical data corresponding to database data, including:

[0118] When data is written to the database, statistical data corresponding to the data written to the database is generated;

[0119] Alternatively, after the data is written to the database, the statistical data generation task can be placed in a queue, and the statistical data generation task in the queue can be executed at a preset time to reread the database data and generate the statistical data corresponding to the database data.

[0120] It should be noted that when generating statistical data for the database, either synchronous or asynchronous generation methods can be used.

[0121] In synchronous generation, statistical data corresponding to the data written to the database is generated simultaneously. Synchronous generation means that the data writing to the database and the generation of statistical data occur at the same time. During the data writing process, the written data is directly used to perform statistical analysis and generate statistical data for each column. This eliminates the need for additional data retrieval costs.

[0122] In asynchronous generation, after data is written to the database, statistical data generation tasks are placed in a queue. At a preset time, these tasks are executed, rereading the database data and generating the corresponding statistical data. Asynchronous generation means that the data writing to the database and the generation of statistical data are not synchronized. During the data writing process, data statistics and generation are not performed immediately. Instead, statistical data generation tasks are first placed in a queue, and then scheduled uniformly at the preset time when the statistical data is needed. The required data is reread, statistically analyzed, and then serialized to generate the statistical data. This minimizes the latency impact on data writing to the database.
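The difference between the two generation modes can be illustrated with a small in-memory model. The class `Table`, its single-column layout, and the explicit `run_pending` call standing in for the scheduler at the preset time are hypothetical simplifications.

```python
from collections import Counter, deque
from typing import Deque, List


class Table:
    """Single-column table with block-level statistics (sketch)."""

    def __init__(self) -> None:
        self.blocks: List[List[int]] = []   # stored data blocks
        self.stats: Counter = Counter()     # value -> blocks containing it
        self.pending: Deque[int] = deque()  # queued statistics tasks

    def write(self, block: List[int], synchronous: bool) -> None:
        self.blocks.append(list(block))
        index = len(self.blocks) - 1
        if synchronous:
            # Synchronous: reuse the data already in hand, so no extra read
            # is needed, at the cost of added write latency.
            self.stats.update(set(block))
        else:
            # Asynchronous: only enqueue a task, keeping the write fast.
            self.pending.append(index)

    def run_pending(self) -> None:
        """Run queued tasks; each one rereads its stored block."""
        while self.pending:
            index = self.pending.popleft()
            self.stats.update(set(self.blocks[index]))


table = Table()
table.write([2, 2, 5], synchronous=True)
table.write([5, 7], synchronous=False)
stats_before = dict(table.stats)  # async block not yet counted
table.run_pending()
stats_after = dict(table.stats)
```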

[0123] In practical applications, either the asynchronous or the synchronous generation mode can be selected when writing data, depending on the actual situation, so as to balance the cost of rereading data against the added write latency.

[0124] See Figure 3, which is a schematic diagram of a pre-filtering condition determination method provided in an embodiment of this application. Figure 3 describes the entire detailed implementation process for determining pre-filtering conditions from query conditions in a practical application.

[0125] As shown in Figure 3, when data is written to the database, the method for generating the corresponding statistical data can be determined based on the actual situation, such as asynchronous or synchronous generation. Asynchronous generation requires creating a queue and scheduling the queued tasks when statistical data is needed, thereby rereading the database data and generating the corresponding statistical data. In synchronous generation, statistical data is directly generated based on the data written to the database.

[0126] Statistical data corresponding to the database is generated and stored on disk. In the ClickHouse database, although the smallest unit for reading and writing is a data block (Block), the smallest unit existing on disk is a Part, and each Part may consist of one or more Blocks. As shown in Figure 3, the storage units for storing database data and the corresponding statistical data are Part1-Partn.

[0127] After generating statistical data corresponding to the database and storing the statistical data on the disk, the statistical data corresponding to the database data within a preset time period is periodically stored in the cache.

[0128] After a user submits a query request on the database front-end page, the database obtains multiple query conditions provided by the user and determines various query condition groups based on these conditions. Each query condition group includes pre-filtered conditions and non-pre-filtered conditions. At this point, based on the metadata information of the column containing the first data in Part (i.e., the column data corresponding to the pre-filtered conditions), the amount of first data to be read to execute the pre-filtered conditions is determined. Furthermore, statistical data corresponding to the first data is obtained from the statistical data stored in the cache within a preset time period. Based on the statistical data corresponding to the first data, the number of data read / write units for storing the data corresponding to the pre-filtered conditions is determined. The ratio of the number of data read / write units for storing the data corresponding to the pre-filtered conditions to the number of data read / write units for storing the first data is calculated, and this ratio is determined as the sparsity probability corresponding to the pre-filtered conditions.

[0129] Furthermore, the product of the sparse probability corresponding to the pre-filtering condition and the amount of original data required to execute the non-pre-filtering condition is calculated, and this product is used as the amount of second data required to execute the non-pre-filtering condition. The amount of original data required to execute the non-pre-filtering condition is likewise obtained from the metadata information of the column containing the original data. Further, the sum of the amounts of the first and second data is determined as the total amount of data required to execute the pre-filtering and non-pre-filtering conditions in the query condition group.

[0130] Obtain the total amount of data read required to execute the pre-filtered and non-pre-filtered conditions in each query condition group. Identify the pre-filtered conditions in query condition groups whose total data read meets preset criteria as target pre-filtered conditions.
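The selection step can be sketched as follows. Two assumptions are made that the source leaves open: each query condition group uses exactly one condition as its pre-filter, and the "preset condition" is taken to mean the smallest estimated total. The function name and all sizes and probabilities are hypothetical.

```python
from typing import Dict, Tuple


def choose_prefilter(
    column_sizes: Dict[str, int],
    sparse_probabilities: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    """Return the pre-filter column with the smallest estimated total read.

    For each candidate: total = size(pre-filter column)
                              + p(pre-filter) * sum(size of other columns).
    """
    totals: Dict[str, float] = {}
    for pre in column_sizes:
        rest = sum(size for col, size in column_sizes.items() if col != pre)
        totals[pre] = column_sizes[pre] + sparse_probabilities[pre] * rest
    best = min(totals, key=lambda col: totals[col])
    return best, totals


# Query conditions A=1, B=2, C=3 with hypothetical sizes and probabilities.
sizes = {"A": 1000, "B": 800, "C": 500}
probabilities = {"A": 0.9, "B": 0.6, "C": 0.2}
best, totals = choose_prefilter(sizes, probabilities)
```

With these numbers the estimated totals are 2170 for A, 1700 for B, and 860 for C, so C=3 would be picked as the target pre-filtering condition.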

[0131] Finally, the target pre-filtering condition is executed for pre-filtering, and then the remaining query conditions in the condition group other than the target pre-filtering condition are queried to complete the query process.

[0132] In addition, statistical data needs to be merged when storing it. During the data merging process, asynchronous or synchronous generation writing modes can be selected to generate statistical results, depending on the actual situation. In practice, at a settable time interval, n parts (n can be set according to the actual situation) are selected from the most recent period, the statistical data in the n parts are deserialized, and then merged. The new statistical data after merging will replace the corresponding statistical data originally in the cache. If the statistical data structure is a histogram data structure, since the histogram data structure supports merging, the successfully generated statistical data can be directly used for merging.
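The merge step can be sketched as deserializing per-part statistics and adding them together. The JSON layout, the function name, and the `cache` dictionary are assumptions for illustration. The sketch also assumes that parts hold disjoint sets of blocks, so block counts are additive.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple


def merge_part_statistics(
    serialized_parts: Iterable[str],
) -> Tuple[int, Counter]:
    """Deserialize and merge the statistics of several parts."""
    merged_blocks = 0
    merged_counts: Counter = Counter()
    for text in serialized_parts:
        raw = json.loads(text)
        merged_blocks += raw["total_blocks"]
        # Counter.update adds counts instead of overwriting them.
        merged_counts.update({int(k): c for k, c in raw["counts"].items()})
    return merged_blocks, merged_counts


# Serialized statistics for column B from two hypothetical recent parts.
recent_parts = [
    '{"total_blocks": 4, "counts": {"2": 2, "5": 3}}',
    '{"total_blocks": 6, "counts": {"2": 4, "7": 1}}',
]
total_blocks, counts = merge_part_statistics(recent_parts)

# The merged result replaces the previous cache entry for this column.
cache: Dict[str, Tuple[int, Counter]] = {"B": (4, Counter({2: 2, 5: 3}))}
cache["B"] = (total_blocks, counts)
```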

[0133] The pre-filtering condition determination method provided in this application involves multiple processes such as writing, statistics, merging, caching, and querying. It performs statistical analysis on database data and stores the statistical data along with the database data. Before using the statistical data, it stores it in a cache, and reads the statistical data from the cache when needed. Based on the statistical data, it obtains the sparsity probability of the pre-filtering conditions, and then uses this sparsity probability to determine the total amount of data read required to execute the query condition group. Finally, based on the total amount of data read, it determines the query condition group that meets preset conditions, and the pre-filtering conditions in this query condition group are determined as the target pre-filtering conditions. In this way, by using statistical methods and with the total amount of data read as an evaluation indicator, it is possible to obtain better pre-filtering conditions, which results in faster query speeds.

[0134] Based on the pre-filtering condition determination method provided in the above-described method embodiments, this application also provides a pre-filtering condition determination device, which will be described below with reference to the accompanying drawings.

[0135] See Figure 4, which is a schematic diagram of the structure of a pre-filtering condition determination device provided in an embodiment of this application. As shown in Figure 4, the pre-filtering condition determination device includes:

[0136] The acquisition unit 401 is used to acquire multiple query conditions provided by the user.

[0137] The first determining unit 402 is configured to determine multiple query condition groups based on multiple query conditions; each query condition group includes pre-filtered conditions and non-pre-filtered conditions, and the execution order of the query condition groups is to execute the pre-filtered conditions first, and then execute the non-pre-filtered conditions.

[0138] The second determining unit 403 is used to determine the total amount of data read required to execute the pre-filtering conditions and the non-pre-filtering conditions in the query condition group;

[0139] The third determining unit 404 is used to determine the pre-filtering conditions in the query condition group where the total data read volume meets the preset conditions as the target pre-filtering conditions.

[0140] In one possible implementation, the second determining unit 403 includes:

[0141] The first acquisition subunit is used to acquire the amount of first data to be read for executing the pre-filtering conditions; the first data is stored in multiple data read / write units; the first data includes data corresponding to the pre-filtering conditions;

[0142] The second acquisition subunit is used to acquire the sparse probability corresponding to the pre-filtering condition; the sparse probability is used to represent the proportion of the data read / write unit storing the data corresponding to the pre-filtering condition among the multiple data read / write units storing the first data.

[0143] The third acquisition subunit is used to acquire the original data required to execute the non-pre-filtering conditions;

[0144] The fourth acquisition subunit is used to acquire the amount of second data to be read for executing the non-pre-filtering condition based on the sparse probability corresponding to the pre-filtering condition and the original data to be read for executing the non-pre-filtering condition.

[0145] The first determining subunit is used to determine the sum of the data volume of the first data and the data volume of the second data as the total data reading volume required to execute the pre-filtering conditions and non-pre-filtering conditions in the query condition group.

[0146] In one possible implementation, the fourth acquisition subunit includes:

[0147] The fifth acquisition subunit is used to acquire the amount of original data to be read for executing the non-pre-filtering conditions;

[0148] The first calculation subunit is used to calculate the product of the sparse probability corresponding to the pre-filtering condition and the original data volume, and use the product as the data volume of the second data to be read to execute the non-pre-filtering condition.

[0149] In one possible implementation, the second acquisition subunit includes:

[0150] The sixth acquisition subunit is used to acquire statistical data corresponding to the first data; the statistical data includes a target value and a target quantity corresponding to the target value; the target quantity is the number of data read / write units that store the target value in the plurality of data read / write units corresponding to the first data; the target value is each value in the first data;

[0151] The second determining subunit is used to determine the number of data read / write units storing the data corresponding to the pre-filtering conditions based on the statistical data corresponding to the first data.

[0152] The second calculation subunit is used to calculate the ratio of the number of data read / write units storing the data corresponding to the pre-filtering condition to the number of data read / write units storing the first data, and to determine the ratio as the sparsity probability corresponding to the pre-filtering condition.

[0153] In one possible implementation, the sixth acquisition subunit includes:

[0154] A generation subunit is used to generate statistical data corresponding to database data and store the statistical data corresponding to the database data within a preset time period into a cache.

[0155] The seventh acquisition subunit is used to acquire the statistical data corresponding to the first data from the statistical data stored in the cache within the preset time period.

[0156] In one possible implementation, the generating subunit is specifically used for:

[0157] When data is written to the database, statistical data corresponding to the database data written to the database is generated;

[0158] Alternatively, after the data is written to the database, the statistical data generation task is placed in a queue, and the statistical data generation task in the queue is executed at a preset time to reread the database data and generate the statistical data corresponding to the database data.

[0159] In one possible implementation, the data structure of the statistical data is a histogram data structure.

[0160] Based on the pre-filtering condition determination method provided in the above method embodiments, this application also provides an electronic device, including: one or more processors; a storage device storing one or more programs thereon, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the pre-filtering condition determination method described in any of the above embodiments.

[0161] Referring to Figure 5, a structural schematic of an electronic device 1300 suitable for implementing embodiments of this application is shown. The terminal devices in these embodiments may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs (televisions), desktop computers, etc. The electronic device shown in Figure 5 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0162] As shown in Figure 5, electronic device 1300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 1301, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 1302 or a program loaded from storage device 1308 into random access memory (RAM) 1303. RAM 1303 also stores various programs and data required for the operation of electronic device 1300. Processing device 1301, ROM 1302, and RAM 1303 are interconnected via bus 1304. Input / output (I / O) interface 1305 is also connected to bus 1304.

[0163] Typically, the following devices can be connected to I / O interface 1305: input devices 1306 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 1307 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 1308 including, for example, magnetic tapes, hard disks, etc.; and communication devices 1309. Communication device 1309 allows electronic device 1300 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 5 shows an electronic device 1300 with various devices, it should be understood that it is not required to implement or possess all of the devices shown. More or fewer devices may alternatively be implemented or possessed.

[0164] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication device 1309, or installed from storage device 1308, or installed from ROM 1302. When the computer program is executed by processing device 1301, it performs the functions defined in the methods of embodiments of this application.

[0165] The electronic device provided in this application embodiment and the pre-filtering condition determination method provided in the above embodiment belong to the same inventive concept. Technical details not described in detail in this embodiment can be found in the above embodiment, and this embodiment has the same beneficial effects as the above embodiment.

[0166] Based on the pre-filtering condition determination method provided in the above-described method embodiments, this application provides a computer-readable medium storing a computer program thereon, wherein the program, when executed by a processor, implements the pre-filtering condition determination method as described in any of the above embodiments.

[0167] It should be noted that the computer-readable medium described above in this application can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example,—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0168] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol) and can interconnect with digital data communication (e.g., communication networks) of any form or medium. Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0169] The aforementioned computer-readable medium may be included in the aforementioned electronic device; or it may exist independently and not assembled into the electronic device.

[0170] The aforementioned computer-readable medium carries one or more programs, which, when executed by the electronic device, cause the electronic device to perform the aforementioned pre-filtering condition determination method.

[0171] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof. These programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0172] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0173] The units described in the embodiments of this application can be implemented in software or in hardware. The name of the unit / module does not necessarily limit the unit itself; for example, a voice data acquisition module can also be described as a "data acquisition module".

[0174] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0175] In the context of this application, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

[0176] According to one or more embodiments of this application, [Example 1] provides a method for determining pre-filtering conditions, the method comprising:

[0177] Obtain multiple query conditions provided by the user;

[0178] Determine multiple query condition groups based on the multiple query conditions; each query condition group includes a pre-filtering condition and a non-pre-filtering condition, and within a query condition group the pre-filtering condition is executed first and the non-pre-filtering condition is executed afterwards;

[0179] For each query condition group, determine the total data read volume required to execute the pre-filtering condition and the non-pre-filtering condition in that group;

[0180] Determine, as the target pre-filtering condition, the pre-filtering condition in the query condition group whose total data read volume meets the preset condition.
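
By way of non-limiting illustration, the selection in Example 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: it assumes each query condition group promotes exactly one condition to pre-filtering condition, that the preset condition is "smallest estimated total data read volume", and that the estimate is supplied by the caller. The names `QueryConditionGroup`, `build_groups`, `choose_target_prefilter` and `toy_cost` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class QueryConditionGroup:
    """One candidate split: the pre-filtering part runs first, the rest after."""
    prefilter: Tuple[str, ...]
    non_prefilter: Tuple[str, ...]


def build_groups(conditions: Sequence[str]) -> List[QueryConditionGroup]:
    # Assumption: each group promotes exactly one condition to pre-filter.
    groups = []
    for i, cond in enumerate(conditions):
        rest = tuple(conditions[:i]) + tuple(conditions[i + 1:])
        groups.append(QueryConditionGroup((cond,), rest))
    return groups


def choose_target_prefilter(
    conditions: Sequence[str],
    estimate_total_read: Callable[[QueryConditionGroup], float],
) -> Tuple[str, ...]:
    # Assumption: the preset condition is "smallest estimated total read".
    best = min(build_groups(conditions), key=estimate_total_read)
    return best.prefilter


# Hypothetical estimates (bytes), keyed by the condition used as pre-filter.
toy_cost = {"a = 1": 900.0, "b > 5": 300.0, "c = 'x'": 650.0}
target = choose_target_prefilter(
    list(toy_cost), lambda group: toy_cost[group.prefilter[0]]
)
```

With these made-up estimates the group that pre-filters on `b > 5` has the lowest total, so that condition is returned as the target pre-filtering condition.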

[0181] According to one or more embodiments of this application, [Example 2] provides a method for determining pre-filtering conditions, wherein determining the total data read volume required to execute the pre-filtering condition and the non-pre-filtering condition in the query condition group includes:

[0182] Obtain the data volume of the first data that must be read to execute the pre-filtering condition; the first data is stored in multiple data read/write units and includes the data corresponding to the pre-filtering condition;

[0183] Obtain the sparsity probability corresponding to the pre-filtering condition; the sparsity probability represents the proportion, among the multiple data read/write units storing the first data, of the data read/write units that store the data corresponding to the pre-filtering condition;

[0184] Obtain the original data that must be read to execute the non-pre-filtering condition;

[0185] Based on the sparsity probability corresponding to the pre-filtering condition and the original data that must be read to execute the non-pre-filtering condition, obtain the data volume of the second data that must be read to execute the non-pre-filtering condition;

[0186] Determine the sum of the data volume of the first data and the data volume of the second data as the total data read volume required to execute the pre-filtering condition and the non-pre-filtering condition in the query condition group.
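
As a non-limiting illustration of the estimate in Example 2, the sketch below compares two candidate query condition groups. It assumes the second data volume is the sparsity probability multiplied by the original data volume (one embodiment of this application); the class name `GroupEstimate`, its field names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupEstimate:
    name: str
    first_volume: float     # volume read to execute the pre-filtering condition
    sparsity: float         # share of read/write units holding matching data
    original_volume: float  # volume the non-pre-filtering condition reads unfiltered

    def second_volume(self) -> float:
        # Assumption: second volume = sparsity probability * original volume.
        return self.sparsity * self.original_volume

    def total_read(self) -> float:
        return self.first_volume + self.second_volume()


# Hypothetical figures in megabytes for two groups built from the same query.
by_city = GroupEstimate("city first", first_volume=40.0, sparsity=0.125,
                        original_volume=800.0)
by_age = GroupEstimate("age first", first_volume=20.0, sparsity=0.75,
                       original_volume=820.0)
cheapest = min((by_city, by_age), key=GroupEstimate.total_read)
```

In this made-up case the "age first" group reads less for its pre-filtering condition but leaves most read/write units in play, so its total is higher than that of the "city first" group.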

[0187] According to one or more embodiments of this application, [Example 3] provides a method for determining pre-filtering conditions, wherein obtaining the data volume of the second data that must be read to execute the non-pre-filtering condition, based on the sparsity probability corresponding to the pre-filtering condition and the original data that must be read to execute the non-pre-filtering condition, includes:

[0188] Obtain the data volume of the original data that must be read to execute the non-pre-filtering condition;

[0189] Calculate the product of the sparsity probability corresponding to the pre-filtering condition and the original data volume, and use the product as the data volume of the second data that must be read to execute the non-pre-filtering condition.
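
Written as a formula, the product in Example 3 combines with the sum in Example 2 as shown below. The symbols are introduced here for illustration only and are not notation used by this application: $V_1$ is the data volume of the first data, $p$ the sparsity probability, $V_{\text{orig}}$ the original data volume, and $V_2$ the data volume of the second data.

```latex
V_2 = p \cdot V_{\text{orig}}, \qquad
V_{\text{total}} = V_1 + V_2 = V_1 + p \cdot V_{\text{orig}}, \qquad 0 \le p \le 1 .
```

For instance, with the hypothetical values $V_1 = 100$ MB, $p = 0.25$ and $V_{\text{orig}} = 400$ MB, the second data volume is $V_2 = 100$ MB and the total data read volume is $V_{\text{total}} = 200$ MB.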

[0190] According to one or more embodiments of this application, [Example 4] provides a method for determining pre-filtering conditions, wherein obtaining the sparsity probability corresponding to the pre-filtering condition includes:

[0191] Obtain statistical data corresponding to the first data; the statistical data includes target values and a target quantity corresponding to each target value; the target values are the values occurring in the first data, and the target quantity of a target value is the number of data read/write units, among the multiple data read/write units corresponding to the first data, that store that target value;

[0192] Based on the statistical data corresponding to the first data, determine the number of data read/write units that store the data corresponding to the pre-filtering condition;

[0193] Calculate the ratio of the number of data read/write units storing the data corresponding to the pre-filtering condition to the number of data read/write units storing the first data, and determine the ratio as the sparsity probability corresponding to the pre-filtering condition.
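
The ratio in Example 4 can be illustrated with the following non-limiting sketch. The function name `sparsity_probability`, its parameters, and the sample statistics are hypothetical. The handling of a condition that matches several values (summing the per-value unit counts and capping the sum at the total number of units) is an assumption made here and is not specified by this application.

```python
from typing import Dict, Hashable, Iterable


def sparsity_probability(
    units_per_value: Dict[Hashable, int],
    matching_values: Iterable[Hashable],
    total_units: int,
) -> float:
    """Share of read/write units that hold data matching the pre-filter."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    # Assumption: for several matching values the per-value unit counts are
    # summed and capped at total_units. This is an upper bound, because one
    # unit may hold more than one of the matching values.
    matched = sum(units_per_value.get(value, 0) for value in matching_values)
    return min(matched, total_units) / total_units


# Hypothetical statistics for a column stored in 8 read/write units:
# target value -> number of units that store that value.
stats = {"cn": 6, "us": 2, "jp": 1}
p_us = sparsity_probability(stats, ["us"], total_units=8)
p_any = sparsity_probability(stats, ["cn", "us", "jp"], total_units=8)
```

For the equality condition on `"us"`, two of the eight units hold matching data, so only a quarter of the units would need to be read for the non-pre-filtering condition.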

[0194] According to one or more embodiments of this application, [Example 5] provides a method for determining pre-filtering conditions, wherein obtaining the statistical data corresponding to the first data includes:

[0195] Generate statistical data corresponding to the database data, and store the statistical data corresponding to the database data within a preset time period into a cache;

[0196] Obtain the statistical data corresponding to the first data from the statistical data within the preset time period stored in the cache.
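
One possible reading of the caching step in Example 5 is sketched below, purely as an illustration. It assumes that "within a preset time period" means that only statistics for data whose timestamp lies inside a recent window are kept in the cache; the class `StatsCache`, its methods, and the explicit `now` argument are hypothetical.

```python
from typing import Dict, Hashable, Optional, Tuple

Stats = Dict[Hashable, int]  # target value -> number of read/write units


class StatsCache:
    """Keeps statistics only for data whose timestamp lies in a recent window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._entries: Dict[Tuple[str, str], Tuple[float, Stats]] = {}

    def put(self, table: str, column: str, data_time: float,
            stats: Stats, now: float) -> None:
        # Data older than the preset period is not cached at all.
        if now - data_time <= self.window_seconds:
            self._entries[(table, column)] = (data_time, stats)

    def get(self, table: str, column: str, now: float) -> Optional[Stats]:
        entry = self._entries.get((table, column))
        if entry is None:
            return None
        data_time, stats = entry
        if now - data_time > self.window_seconds:
            del self._entries[(table, column)]  # fell out of the preset period
            return None
        return stats
```

Passing `now` explicitly keeps the sketch deterministic; a real system would more likely read a clock. A cache miss would fall back to whatever slower source holds the full statistics, which this sketch does not model.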

[0197] According to one or more embodiments of this application, [Example 6] provides a method for determining pre-filtering conditions, wherein generating the statistical data corresponding to the database data includes:

[0198] When data is written to the database, generate the statistical data corresponding to the database data being written;

[0199] Alternatively, after the data is written to the database, place a statistical data generation task in a queue, and execute the statistical data generation task in the queue at a preset time, so as to reread the database data and generate the statistical data corresponding to the database data.
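
The two generation options in Example 6 can be contrasted with the following non-limiting sketch. The class `Table`, the flag `sync_stats`, and the helper `build_stats` are hypothetical, and the "preset time" is left to an external scheduler that would call `run_pending`.

```python
from collections import Counter, deque
from typing import Deque, Hashable, List

Unit = List[Hashable]  # the values stored in one read/write unit


def build_stats(units: List[Unit]) -> Counter:
    """Count, for each value, how many units contain it at least once."""
    stats: Counter = Counter()
    for unit in units:
        stats.update(set(unit))
    return stats


class Table:
    def __init__(self, sync_stats: bool) -> None:
        self.sync_stats = sync_stats
        self.units: List[Unit] = []
        self.stats: Counter = Counter()
        self.pending: Deque[str] = deque()  # queued generation tasks

    def write(self, unit: Unit) -> None:
        self.units.append(unit)
        if self.sync_stats:
            self.stats.update(set(unit))    # option 1: generate while writing
        else:
            self.pending.append("rebuild")  # option 2: defer to a queued task

    def run_pending(self) -> None:
        """Meant to be called by a scheduler at the preset time."""
        if self.pending:
            self.pending.clear()
            self.stats = build_stats(self.units)  # reread the stored data
```

Generating on write keeps the statistics current at the cost of extra work per write; the queued variant keeps writes cheap but leaves the statistics stale until the task runs.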

[0200] According to one or more embodiments of this application, [Example 7] provides a method for determining pre-filtering conditions, wherein the data structure of the statistical data is a histogram data structure.

[0201] According to one or more embodiments of this application, [Example 8] provides a pre-filtering condition determination device, the device comprising:

[0202] The retrieval unit is used to obtain multiple query conditions provided by the user;

[0203] The first determining unit is used to determine multiple query condition groups based on the multiple query conditions; each query condition group includes a pre-filtering condition and a non-pre-filtering condition, and within a query condition group the pre-filtering condition is executed first and the non-pre-filtering condition is executed afterwards;

[0204] The second determining unit is used to determine the total data read volume required to execute the pre-filtering condition and the non-pre-filtering condition in the query condition group;

[0205] The third determining unit is used to determine, as the target pre-filtering condition, the pre-filtering condition in the query condition group whose total data read volume meets the preset condition.

[0206] According to one or more embodiments of this application, [Example 9] provides a pre-filtering condition determination device, wherein the second determining unit includes:

[0207] The first acquisition subunit is used to obtain the data volume of the first data that must be read to execute the pre-filtering condition; the first data is stored in multiple data read/write units and includes the data corresponding to the pre-filtering condition;

[0208] The second acquisition subunit is used to obtain the sparsity probability corresponding to the pre-filtering condition; the sparsity probability represents the proportion, among the multiple data read/write units storing the first data, of the data read/write units that store the data corresponding to the pre-filtering condition;

[0209] The third acquisition subunit is used to obtain the original data that must be read to execute the non-pre-filtering condition;

[0210] The fourth acquisition subunit is used to obtain the data volume of the second data that must be read to execute the non-pre-filtering condition, based on the sparsity probability corresponding to the pre-filtering condition and the original data that must be read to execute the non-pre-filtering condition;

[0211] The first determining subunit is used to determine the sum of the data volume of the first data and the data volume of the second data as the total data read volume required to execute the pre-filtering condition and the non-pre-filtering condition in the query condition group.

[0212] According to one or more embodiments of this application, [Example 10] provides a pre-filtering condition determination device, wherein the fourth acquisition subunit includes:

[0213] The fifth acquisition subunit is used to obtain the data volume of the original data that must be read to execute the non-pre-filtering condition;

[0214] The first calculation subunit is used to calculate the product of the sparsity probability corresponding to the pre-filtering condition and the original data volume, and to use the product as the data volume of the second data that must be read to execute the non-pre-filtering condition.

[0215] According to one or more embodiments of this application, [Example 11] provides a pre-filtering condition determination device, wherein the second acquisition subunit includes:

[0216] The sixth acquisition subunit is used to acquire statistical data corresponding to the first data; the statistical data includes a target value and a target quantity corresponding to the target value; the target quantity is the number of data read / write units that store the target value in the plurality of data read / write units corresponding to the first data; the target value is each value in the first data;

[0217] The second determining subunit is used to determine the number of data read / write units storing the data corresponding to the pre-filtering conditions based on the statistical data corresponding to the first data.

[0218] The second calculation subunit is used to calculate the ratio of the number of data read / write units storing the data corresponding to the pre-filtering condition to the number of data read / write units storing the first data, and to determine the ratio as the sparsity probability corresponding to the pre-filtering condition.

[0219] According to one or more embodiments of this application, [Example 12] provides a pre-filtering condition determination device, wherein the sixth acquisition subunit includes:

[0220] A generation subunit is used to generate statistical data corresponding to database data and store the statistical data corresponding to the database data within a preset time period into a cache.

[0221] The seventh acquisition subunit is used to acquire the statistical data corresponding to the first data from the statistical data stored in the cache within the preset time period.

[0222] According to one or more embodiments of this application, [Example 13] provides a pre-filtering condition determination device, wherein the generation subunit is specifically used for:

[0223] When data is written to the database, statistical data corresponding to the database data written to the database is generated;

[0224] Alternatively, after the data is written to the database, the statistical data generation task is placed in a queue, and the statistical data generation task in the queue is executed at a preset time to reread the database data and generate the statistical data corresponding to the database data.

[0225] According to one or more embodiments of this application, [Example 14] provides a pre-filtering condition determination device, wherein the data structure of the statistical data is a histogram data structure.

[0226] According to one or more embodiments of this application, [Example 15] provides an electronic device, including:

[0227] One or more processors;

[0228] A storage device on which one or more programs are stored;

[0229] When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the pre-filtering condition determination method as described above.

[0230] According to one or more embodiments of this application, [Example 16] provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the pre-filtering condition determination method as described above.

[0231] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For parts that are the same or similar, the embodiments may be read with reference to one another. Because the systems and apparatus disclosed in the embodiments correspond to the methods disclosed in the embodiments, they are described relatively briefly; for the relevant details, refer to the description of the methods.

[0232] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0233] It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0234] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0235] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for determining pre-filtering conditions, characterized in that the method comprises: obtaining multiple query conditions provided by a user; determining multiple query condition groups based on the multiple query conditions, wherein each query condition group includes a pre-filtering condition and a non-pre-filtering condition, and within a query condition group the pre-filtering condition is executed first and the non-pre-filtering condition is executed afterwards; obtaining the data volume of first data that must be read to execute the pre-filtering condition, wherein the first data is stored in multiple data read/write units and includes data corresponding to the pre-filtering condition; obtaining the sparsity probability corresponding to the pre-filtering condition, wherein the sparsity probability represents the proportion, among the multiple data read/write units storing the first data, of the data read/write units that store the data corresponding to the pre-filtering condition; obtaining the original data that must be read to execute the non-pre-filtering condition; obtaining, based on the sparsity probability corresponding to the pre-filtering condition and the original data that must be read to execute the non-pre-filtering condition, the data volume of second data that must be read to execute the non-pre-filtering condition; determining the sum of the data volume of the first data and the data volume of the second data as the total data read volume required to execute the pre-filtering condition and the non-pre-filtering condition in the query condition group; and determining, as the target pre-filtering condition, the pre-filtering condition in the query condition group whose total data read volume meets a preset condition.

2. The method according to claim 1, characterized in that obtaining the data volume of the second data that must be read to execute the non-pre-filtering condition, based on the sparsity probability corresponding to the pre-filtering condition and the original data that must be read to execute the non-pre-filtering condition, comprises: obtaining the data volume of the original data that must be read to execute the non-pre-filtering condition; and calculating the product of the sparsity probability corresponding to the pre-filtering condition and the original data volume, and using the product as the data volume of the second data that must be read to execute the non-pre-filtering condition.

3. The method according to claim 1, characterized in that obtaining the sparsity probability corresponding to the pre-filtering condition comprises: obtaining statistical data corresponding to the first data, wherein the statistical data includes target values and a target quantity corresponding to each target value, the target values are the values occurring in the first data, and the target quantity of a target value is the number of data read/write units, among the multiple data read/write units corresponding to the first data, that store that target value; determining, based on the statistical data corresponding to the first data, the number of data read/write units that store the data corresponding to the pre-filtering condition; and calculating the ratio of the number of data read/write units storing the data corresponding to the pre-filtering condition to the number of data read/write units storing the first data, and determining the ratio as the sparsity probability corresponding to the pre-filtering condition.

4. The method according to claim 3, characterized in that obtaining the statistical data corresponding to the first data comprises: generating statistical data corresponding to the database data, and storing the statistical data corresponding to the database data within a preset time period into a cache; and obtaining the statistical data corresponding to the first data from the statistical data within the preset time period stored in the cache.

5. The method according to claim 4, characterized in that generating the statistical data corresponding to the database data comprises: when data is written to the database, generating the statistical data corresponding to the database data being written; or, after the data is written to the database, placing a statistical data generation task in a queue, and executing the statistical data generation task in the queue at a preset time, so as to reread the database data and generate the statistical data corresponding to the database data.

6. The method according to claim 4, characterized in that the statistical data is structured as a histogram.

7. A pre-filtering condition determination device, characterized in that the device comprises: a retrieval unit, used to obtain multiple query conditions provided by a user; a first determining unit, used to determine multiple query condition groups based on the multiple query conditions, wherein each query condition group includes a pre-filtering condition and a non-pre-filtering condition, and within a query condition group the pre-filtering condition is executed first and the non-pre-filtering condition is executed afterwards; a second determining unit, used to: obtain the data volume of first data that must be read to execute the pre-filtering condition, wherein the first data is stored in multiple data read/write units and includes data corresponding to the pre-filtering condition; obtain the sparsity probability corresponding to the pre-filtering condition, wherein the sparsity probability represents the proportion, among the multiple data read/write units storing the first data, of the data read/write units that store the data corresponding to the pre-filtering condition; obtain the original data that must be read to execute the non-pre-filtering condition; obtain, based on the sparsity probability corresponding to the pre-filtering condition and the original data that must be read to execute the non-pre-filtering condition, the data volume of second data that must be read to execute the non-pre-filtering condition; and determine the sum of the data volume of the first data and the data volume of the second data as the total data read volume required to execute the pre-filtering condition and the non-pre-filtering condition in the query condition group; and a third determining unit, used to determine, as the target pre-filtering condition, the pre-filtering condition in the query condition group whose total data read volume meets a preset condition.

8. An electronic device, characterized in that it comprises: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the pre-filtering condition determination method as described in any one of claims 1-6.

9. A computer-readable storage medium, characterized in that it stores a computer program, wherein the computer program, when executed by a processor, implements the pre-filtering condition determination method as described in any one of claims 1-6.