Data table query method and device, storage medium and electronic equipment
By splitting the data table into multiple sub-tables and adjusting the query request order, the problem of low security in data table queries was solved, protection against frequency statistics attacks was achieved, and the security of data queries was improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2023-06-30
- Publication Date
- 2026-06-16
Figure CN116842039B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computers, and more specifically, to a method and apparatus for querying a data table, a storage medium, and an electronic device.
Background Technology
[0002] To enhance data security, data in data tables is often encrypted, for example, using deterministic encryption algorithms.
[0003] In related technologies, it is often necessary to jointly query data from multiple data tables. For example, a hash-join can be used to jointly query data from multiple data tables. However, this method may lead to the leakage of the frequency of data occurrences in the data tables. If the attributes of the data in the data tables are known, frequency statistics attacks can be used to obtain the frequency of encrypted data occurrences without decryption, thereby inferring the key used for encryption and leading to the leakage of plaintext data. This results in low security for data table queries.
[0004] There is currently no effective solution to the above-mentioned problem of low security in data table queries.
Summary of the Invention
[0005] This application provides a method and apparatus for querying data tables, a storage medium, and an electronic device to at least solve the technical problem of low security in data table queries.
[0006] According to one aspect of the embodiments of this application, a method for querying a data table is provided, including:
[0007] A data table query request is obtained, wherein the data table query request is used to request a query in a second data table on the server for the attribute value of a second attribute that matches the attribute value of a first attribute, and the first attribute is an attribute included in a first data table on the server; the data table query request is converted into a first set of data sub-table query requests, wherein the first set of data sub-table query requests is used to request a query in a second set of data sub-tables for the attribute value of the second attribute that matches the attribute value of the first attribute in a first set of data sub-tables, the first set of data sub-tables being data sub-tables obtained by splitting the first data table according to the attribute value of the first attribute, and the second set of data sub-tables being data sub-tables obtained by splitting the second data table according to the attribute value of the second attribute; a second set of data sub-table query requests is generated, wherein the second set of data sub-table query requests is different from the first set of data sub-table query requests; the first set of data sub-table query requests and the second set of data sub-table query requests are sent to the server, and a first set of query results corresponding to the first set of data sub-table query requests and a second set of query results corresponding to the second set of data sub-table query requests, both sent by the server, are obtained.
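The conversion and generation steps above can be pictured with a minimal Python sketch. It assumes, purely for illustration, that every sub-table of one table is paired with every sub-table of the other and that decoy requests are drawn from a third table's sub-tables; the sub-table identifiers (T1_a, T2_a, T3_a) and both function names are hypothetical and do not come from the application.

```python
from itertools import product

def convert_request(first_subs, second_subs):
    """Expand one table-level join into sub-table-level join requests (assumed: full pairing)."""
    return [(a, b) for a, b in product(first_subs, second_subs)]

def generate_decoys(third_subs, second_subs, real_requests):
    """Build a second group of requests that differs from the first group."""
    real = set(real_requests)
    return [(c, b) for c, b in product(third_subs, second_subs) if (c, b) not in real]

# Hypothetical sub-table identifiers held on the client.
first_group = convert_request(["T1_a", "T1_b"], ["T2_a", "T2_b"])
second_group = generate_decoys(["T3_a"], ["T2_a", "T2_b"], first_group)
```

Both groups would then be sent to the server together, so that an observer cannot easily tell which requests belong to the original query.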
[0008] According to another aspect of the embodiments of this application, a data table query apparatus is also provided, comprising: an acquisition unit, configured to acquire a data table query request, wherein the data table query request is used to request querying, in a second data table on a server, an attribute value of a second attribute that matches an attribute value of a first attribute, the first attribute being an attribute included in a first data table on the server; a conversion unit, configured to convert the data table query request into a first set of data sub-table query requests, wherein the first set of data sub-table query requests is used to request querying, in a second set of data sub-tables, an attribute value of the second attribute that matches an attribute value of the first attribute in a first set of data sub-tables, the first set of data sub-tables being data sub-tables obtained by splitting the first data table according to the attribute value of the first attribute, and the second set of data sub-tables being data sub-tables obtained by splitting the second data table according to the attribute value of the second attribute; a generation unit, configured to generate a second set of data sub-table query requests, wherein the second set of data sub-table query requests is different from the first set of data sub-table query requests; and a sending unit, configured to send the first set of data sub-table query requests and the second set of data sub-table query requests to the server, and to obtain the first set of query results corresponding to the first set of data sub-table query requests and the second set of query results corresponding to the second set of data sub-table query requests sent by the server.
[0009] Optionally, the generation unit includes: a first generation module, configured to generate a second set of data sub-table query request, wherein the second set of data sub-table query request is configured to request querying in the second set of data sub-tables an attribute value of the second attribute that matches the attribute value of the third attribute in the third set of data sub-tables, wherein the third set of data sub-tables is a data sub-table obtained by splitting the third data table on the server according to the attribute value of the third attribute; or a second generation module, configured to generate a second set of data sub-table query request, wherein the second set of data sub-table query request is configured to request querying in the fourth set of data sub-tables an attribute value of the fourth attribute that matches the attribute value of the first attribute in the first set of data sub-tables, wherein the fourth set of data sub-tables is a data sub-table obtained by splitting the fourth data table on the server according to the attribute value of the fourth attribute; or a third generation module, configured to generate a second set of data sub-table query request, wherein the second set of data sub-table query request is configured to request querying in the fourth set of data sub-tables an attribute value of the fourth attribute that matches the attribute value of the third attribute in the third set of data sub-tables.
[0010] Optionally, the apparatus further includes: a first search unit, configured to search on the client for the identifier of a data table that satisfies a first matching condition with the first data table, to obtain the identifier of the third data table, and to obtain on the client the identifier of the third group of data sub-tables that have a mapping relationship with the third data table, wherein the first matching condition includes a third value range that is at least partially the same as the first value range, the first value range being the value range of the attribute value of the first attribute in the first data table, and the third value range being the value range of the attribute value of the third attribute in the third data table; or a second search unit, configured to search on the client for the identifier of a data sub-table that satisfies a second matching condition with a data sub-table in the first group of data sub-tables, to obtain the identifier of the third group of data sub-tables, wherein the second matching condition includes a third value sub-range that is at least partially the same as the first value sub-range, the first value sub-range being the value range of the attribute value of the first attribute in the data sub-table in the first group of data sub-tables, and the third value sub-range being the value range of the attribute value of the third attribute in the data sub-table in the third group of data sub-tables.
[0011] Optionally, the first matching condition further includes: the first number and the third number are different, wherein the first number is the number of times a first value appears in the first attribute of the first data table, the third number is the number of times the first value appears in the third attribute of the third data table, and the first value is a value that is common to the third value range and the first value range; or, the first ratio and the third ratio are different, wherein the first ratio is obtained by dividing the first number by the first total number, the first total number is the sum of the numbers of times each value in the first value range appears in the first attribute of the first data table, the third ratio is obtained by dividing the third number by the third total number, and the third total number is the sum of the numbers of times each value in the third value range appears in the third attribute of the third data table. Alternatively, the second matching condition further includes: the first sub-number and the third sub-number are different, wherein the first sub-number is the number of times a third value appears in the first attribute of one data sub-table in the first group of data sub-tables, the third sub-number is the number of times the third value appears in the third attribute of another data sub-table in the third group of data sub-tables, and the third value is a value that is common to the third value sub-range and the first value sub-range; or, the first sub-ratio and the third sub-ratio are different, wherein the first sub-ratio is obtained by dividing the first sub-number by the first sub-table total number, the first sub-table total number is the sum of the numbers of times each value in the first value sub-range appears in the first attribute of the one data sub-table, the third sub-ratio is obtained by dividing the third sub-number by the third sub-table total number, and the third sub-table total number is the sum of the numbers of times each value in the third value range appears in the third attribute of the other data sub-table.
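As a rough illustration of the first matching condition described in [0010] and [0011], the following Python sketch checks that two attribute columns share at least one value and that some shared value occurs a different number of times (or in a different proportion) in each column. It assumes the client can inspect the plaintext value distributions; the function name and the reading that a single differing shared value suffices are assumptions, not statements from the application.

```python
from collections import Counter

def satisfies_first_matching_condition(first_values, third_values, use_ratio=False):
    """Return True if the columns overlap and some shared value differs in count (or ratio)."""
    first_counts, third_counts = Counter(first_values), Counter(third_values)
    shared = set(first_counts) & set(third_counts)  # overlap of the two value ranges
    total_first, total_third = len(first_values), len(third_values)
    for value in shared:
        if use_ratio:
            # Compare count/total on both sides by cross-multiplying (avoids float error).
            if first_counts[value] * total_third != third_counts[value] * total_first:
                return True
        elif first_counts[value] != third_counts[value]:
            return True
    return False

scores = [80, 90, 80, 70, 60, 65]             # scores from the stu example
by_count = satisfies_first_matching_condition(scores, [80, 80, 80, 90])
no_overlap = satisfies_first_matching_condition(scores, [1, 2])
same_ratio = satisfies_first_matching_condition([80, 80, 90], [80, 80, 90] * 2, use_ratio=True)
```

Here `by_count` is true because the value 80 appears twice in one column and three times in the other, while `no_overlap` and `same_ratio` are false.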
[0012] Optionally, the apparatus further includes: a third search unit, configured to search on the client for the identifier of a data table that satisfies a third matching condition with the second data table, obtain the identifier of the fourth data table, and obtain on the client the identifier of the fourth group of data sub-tables that has a mapping relationship with the fourth data table, wherein the third matching condition includes a fourth value range that is at least partially the same as a second value range, the second value range being the value range of the attribute value of the second attribute in the second data table, and the fourth value range being the value range of the attribute value of the fourth attribute in the fourth data table; or a fourth search unit, configured to search on the client for the identifier of a data sub-table that satisfies a fourth matching condition with a data sub-table in the second group of data sub-tables, obtain the identifier of the fourth group of data sub-tables, wherein the fourth matching condition includes a fourth value sub-range that is at least partially the same as a second value sub-range, the second value sub-range including the value range of the attribute value of the second attribute in the data sub-table in the second group of data sub-tables, and the fourth value sub-range including the value range of the attribute value of the fourth attribute in the data sub-table in the fourth group of data sub-tables.
[0013] Optionally, the third matching condition further includes: the second number and the fourth number are different, wherein the second number is the number of times a second value appears in the second attribute of the second data table, the fourth number is the number of times the second value appears in the fourth attribute of the fourth data table, and the second value is a value that is common to the fourth value range and the second value range; or, the second ratio and the fourth ratio are different, wherein the second ratio is obtained by dividing the second number by the second total number, the second total number is the sum of the numbers of times each value in the second value range appears in the second attribute of the second data table, the fourth ratio is obtained by dividing the fourth number by the fourth total number, and the fourth total number is the sum of the numbers of times each value in the fourth value range appears in the fourth attribute of the fourth data table. Alternatively, the fourth matching condition further includes: the second sub-number and the fourth sub-number are different, wherein the second sub-number is the number of times a fourth value appears in the second attribute of one data sub-table in the second group of data sub-tables, the fourth sub-number is the number of times the fourth value appears in the fourth attribute of another data sub-table in the fourth group of data sub-tables, and the fourth value is a value that is common to the fourth value sub-range and the second value sub-range; or, the second sub-ratio and the fourth sub-ratio are different, wherein the second sub-ratio is obtained by dividing the second sub-number by the second sub-table total number, the second sub-table total number is the sum of the numbers of times each value in the second value sub-range appears in the second attribute of the one data sub-table, the fourth sub-ratio is obtained by dividing the fourth sub-number by the fourth sub-table total number, and the fourth sub-table total number is the sum of the numbers of times each value in the fourth value range appears in the fourth attribute of the other data sub-table.
[0014] Optionally, when the second group of data sub-table query requests is used to request querying, in the second group of data sub-tables, the attribute value of the second attribute that matches the attribute value of the third attribute in the third group of data sub-tables, the first group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables; and the second group of query results includes encrypted data of the attribute value of the third attribute in the third group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the third attribute in the third group of data sub-tables. Alternatively, when the second group of data sub-table query requests is used to request querying, in the fourth group of data sub-tables, the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first group of data sub-tables, the first group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables; and the second group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the fourth attribute, found in the fourth group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables. Alternatively, when the second group of data sub-table query requests is used to request querying, in the fourth group of data sub-tables, the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third group of data sub-tables, the first group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables; and the second group of query results includes encrypted data of the attribute value of the third attribute in the third group of data sub-tables, and encrypted data of the attribute value of the fourth attribute, found in the fourth group of data sub-tables, that matches the attribute value of the third attribute in the third group of data sub-tables.
[0015] Optionally, the apparatus further includes: an updating unit, configured to, when the first data table is a data table obtained by performing a first smoothing process on the attribute values of the first attribute in a first real data table and the first real data table is updated to a second real data table, update the first data table to a fifth data table based on a first difference between the first real data table and the second real data table, wherein the first smoothing process is used to make each attribute value of the first attribute in the first real data table appear the same number of times in the first data table, and a second difference between the first data table and the fifth data table is the same as the first difference; a determining unit, configured to determine, based on the second real data table and the fifth data table, whether to perform a second smoothing process on the second real data table; a smoothing processing unit, configured to, if it is determined that the second smoothing process is to be performed, perform the second smoothing process on the second real data table to obtain a sixth data table, and replace the first data table on the server with the sixth data table, wherein the second smoothing process is used to make each attribute value of the first attribute in the second real data table appear the same number of times in the sixth data table; and a replacement unit, configured to replace the first data table on the server with the fifth data table if it is determined that the second smoothing process is not to be performed on the second real data table.
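The goal of the smoothing process, that every attribute value appears the same number of times, can be sketched in Python as follows. The sketch assumes smoothing is done by padding with flagged dummy rows up to the highest observed count; the application states only the goal, so the padding strategy, the `smooth` function, and the `is_dummy` flag are assumptions made for illustration.

```python
from collections import Counter

def smooth(values):
    """Pad a column so every distinct value occurs equally often.

    Returns (value, is_dummy) pairs; dummy rows are an assumed mechanism.
    """
    counts = Counter(values)
    target = max(counts.values())          # every value is padded up to this count
    rows = [(v, False) for v in values]    # real rows
    for value, count in counts.items():
        rows.extend((value, True) for _ in range(target - count))
    return rows

smoothed = smooth([80, 90, 80, 70, 60, 65])   # scores from the stu example
smoothed_counts = Counter(v for v, _ in smoothed)
```

After padding, each of the five distinct scores occurs twice, so counting ciphertext occurrences in the smoothed column no longer reveals which score was most common.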
[0016] Optionally, the determining unit includes: a first determining module, configured to determine the distribution difference between each attribute value of the first attribute in the second real data table and each attribute value of the first attribute in the fifth data table; a second determining module, configured to determine to perform the second smoothing process on the second real data table if the distribution difference is less than or equal to a preset threshold; and a third determining module, configured to determine not to perform the second smoothing process on the second real data table if the distribution difference is greater than the preset threshold.
[0017] Optionally, the sending unit repeats the following steps until both the first group of data sub-table query requests and the second group of data sub-table query requests have been sent to the server: randomly selecting one or more data sub-table query requests from the data sub-table query requests in the first group of data sub-table query requests and the second group of data sub-table query requests that have not yet been sent to the server, and sending the randomly selected one or more data sub-table query requests to the server.
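The random sending loop described above can be sketched in Python as follows. The batch-size cap `max_batch`, the `send` callback, and the request labels are hypothetical; the application only specifies that one or more unsent requests are chosen at random until all have been sent.

```python
import random

def send_in_random_batches(first_group, second_group, send, rng=None, max_batch=2):
    """Send all requests from both groups in randomly chosen batches."""
    if rng is None:
        rng = random.Random()
    pending = list(first_group) + list(second_group)   # requests not yet sent
    while pending:
        batch_size = rng.randint(1, min(max_batch, len(pending)))
        batch = rng.sample(pending, batch_size)         # random pick without replacement
        for request in batch:
            pending.remove(request)
        send(batch)

sent = []
send_in_random_batches(["q1", "q2", "q3"], ["d1", "d2"], sent.append, rng=random.Random(7))
```

Because real and decoy requests are drawn from one shared pool, the order observed by the server does not reveal which requests belong to the first group.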
[0018] Optionally, the first set of data sub-tables are data sub-tables obtained by splitting the first data table according to the value range of the attribute values of the first attribute, wherein the value range of the attribute values of the first attribute is different in each data sub-table of the first set of data sub-tables; the second set of data sub-tables are data sub-tables obtained by splitting the second data table according to the value range of the attribute values of the second attribute, wherein the value range of the attribute values of the second attribute is different in each data sub-table of the second set of data sub-tables.
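A minimal Python sketch of splitting a table by value range is given below. It assumes half-open integer ranges and a table held as a list of dictionaries; the range boundaries and the `split_by_ranges` name are illustrative choices, not taken from the application.

```python
def split_by_ranges(rows, attr, ranges):
    """Split rows into sub-tables keyed by half-open (low, high) ranges of rows[attr]."""
    sub_tables = {r: [] for r in ranges}
    for row in rows:
        for low, high in ranges:
            if low <= row[attr] < high:
                sub_tables[(low, high)].append(row)
                break   # ranges are assumed not to overlap
    return sub_tables

# Scores follow the stu example; the two ranges are hypothetical.
stu = [{"id": i + 1, "score": s} for i, s in enumerate([80, 90, 80, 70, 60, 65])]
subs = split_by_ranges(stu, "score", [(60, 80), (80, 101)])
```

Each resulting sub-table covers a different value range, which is the property the paragraph above requires.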
[0019] Optionally, the data table query request is used to request a query in the second data table for the attribute value of the second attribute that has a first matching relationship with the attribute value of the first attribute. The first matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value within a first target value range. The first target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second data table. The first group of data sub-table query requests is used to request a query in the second group of data sub-tables for the attribute value of the second attribute that has a second matching relationship with the attribute value of the first attribute in the first group of data sub-tables. The second matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value within a second target value range. The second target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second group of data sub-tables.
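The two forms of matching relationship (hash equality with a single value, or hash equality with some value inside a range) can be sketched in Python as below. The sketch assumes integer attribute values, inclusive ranges written as `(low, high)` tuples, and an unkeyed SHA-256 hash; the application does not specify the hash function, so these are illustrative assumptions.

```python
import hashlib

def h(value) -> str:
    """Hash a value (assumed: SHA-256 over its string form)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def matches(first_value, second_value) -> bool:
    """second_value is a single value or an inclusive (low, high) integer range."""
    if isinstance(second_value, tuple):
        low, high = second_value
        # Range form: match if the hash equals the hash of any value in the range.
        return any(h(first_value) == h(v) for v in range(low, high + 1))
    return h(first_value) == h(second_value)

exact = matches(80, 80)
in_range = matches(80, (80, 89))
out_of_range = matches(80, (90, 100))
```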
[0020] According to another aspect of the embodiments of this application, a computer-readable storage medium is also provided, wherein a computer program is stored in the computer-readable storage medium, and the computer program is configured to execute the query method of the above-mentioned data table at runtime.
[0021] According to another aspect of the embodiments of this application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the query method for the data table described above.
[0022] According to another aspect of the embodiments of this application, an electronic device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above-described data table query method through the computer program.
[0023] In this embodiment, the data table query request obtained from the client is converted into a first set of data sub-table query requests, and a second set of data sub-table query requests is generated. On the one hand, randomly sending the first and second sets of data sub-table query requests to the server can interfere with the sending order of the query requests in the first set of data sub-table query requests. On the other hand, the query results of the second set of data sub-table query requests can interfere with the frequency statistics performed on the query results of the first set of data sub-table query requests. The returned second set of query results is unrelated to the first set of query results. In this way, the statistical analysis based on the frequency of the occurrence of encrypted data in the query results of the first set of data sub-table query requests is confused, thereby achieving the technical effect of improving the security of data table queries and solving the technical problem of low security of data table queries.
Attached Figure Description
[0024] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0025] Figure 1 is a schematic diagram of an optional first data table according to an embodiment of this application;
[0026] Figure 2 is a schematic diagram of an optional second data table according to an embodiment of this application;
[0027] Figure 3 is a schematic diagram of an optional third data table according to an embodiment of this application;
[0028] Figure 4 is a schematic diagram of an optional fourth data table according to an embodiment of this application;
[0029] Figure 5 is a schematic diagram of an optional hash ciphertext table according to an embodiment of this application;
[0030] Figure 6 is a schematic diagram of an optional sub-table mapping table according to an embodiment of this application;
[0031] Figure 7 is a schematic diagram of optional distribution information according to an embodiment of this application;
[0032] Figure 8 is a schematic diagram illustrating an application scenario of an optional data table query method according to an embodiment of this application;
[0033] Figure 9 is a flowchart illustrating an optional data table query method according to an embodiment of this application;
[0034] Figure 10 is a first schematic diagram of an optional second set of data sub-table query requests according to an embodiment of this application;
[0035] Figure 11 is a second schematic diagram of an optional second set of data sub-table query requests according to an embodiment of this application;
[0036] Figure 12 is a schematic diagram of an optional first matching condition according to an embodiment of this application;
[0037] Figure 13 is a schematic diagram of an optional second matching condition according to an embodiment of this application;
[0038] Figure 14 is a schematic diagram of an optional third matching condition according to an embodiment of this application;
[0039] Figure 15 is a schematic diagram of an optional fourth matching condition according to an embodiment of this application;
[0040] Figure 16 is a schematic diagram of an optional first set of query results and second set of query results according to an embodiment of this application;
[0041] Figure 17 is a first schematic diagram of an optional second smoothing process according to an embodiment of this application;
[0042] Figure 18 is a second schematic diagram of an optional second smoothing process according to an embodiment of this application;
[0043] Figure 19 is a schematic diagram illustrating an optional method for determining the chi-square value according to an embodiment of this application;
[0044] Figure 20 is a schematic diagram of optional random sending of data sub-table query requests according to an embodiment of this application;
[0045] Figure 21 is a schematic diagram illustrating an optional smoothing process for split data sub-tables according to an embodiment of this application;
[0046] Figure 22 is a schematic diagram illustrating an optional hash value according to an embodiment of this application;
[0047] Figure 23 is a schematic diagram of an optional data table query method according to an embodiment of this application;
[0048] Figure 24 is a schematic diagram illustrating the query latency of an optional data table according to an embodiment of this application;
[0049] Figure 25 is a schematic diagram of the structure of an optional data table query device according to an embodiment of this application;
[0050] Figure 26 is a schematic diagram of the structure of an optional electronic device according to an embodiment of this application;
[0051] Figure 27 is a block diagram of the computer system architecture of an optional electronic device according to an embodiment of this application.
Detailed Implementation
[0052] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.
[0053] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0054] First, some nouns or terms that appear in the description of the embodiments of this application shall be interpreted as follows:
[0055] Hash-join: Hash-join is a relational database query optimization technique that uses hash tables to accelerate table join operations in a database, typically faster than traditional nested loop joins and sort-merge joins. Specifically, hash-join is used to perform join operations on two or more tables. When performing a join operation, it's usually necessary to compare one or more columns from the two tables to find matching rows. Hash-join uses a hash function to distribute the columns of the tables into different hash buckets, and during the join operation, it only needs to compare the data in the hash buckets, thus improving query efficiency. Hash-join is generally suitable for join operations on large tables, and performance can be further improved through parallel processing.
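The build-and-probe idea behind hash-join can be sketched in Python as below. This is a generic in-memory illustration on plaintext rows, not the encrypted join of the application; the `grade` table and its column names are hypothetical, while the scores follow the stu example.

```python
from collections import defaultdict

def hash_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of dicts where left_rows[left_key] == right_rows[right_key]."""
    # Build phase: bucket the left rows by their join-key value.
    buckets = defaultdict(list)
    for row in left_rows:
        buckets[row[left_key]].append(row)
    # Probe phase: each right row is compared only with rows in its own bucket.
    joined = []
    for row in right_rows:
        for match in buckets.get(row[right_key], []):
            joined.append({**match, **row})
    return joined

stu = [{"id": 1, "score": 80}, {"id": 2, "score": 90}, {"id": 3, "score": 80}]
grade = [{"mark": 80, "level": "B"}, {"mark": 90, "level": "A"}]
result = hash_join(stu, grade, "score", "mark")
```

Note that the size of each bucket equals the number of rows sharing a join-key value, which is exactly the frequency information the application seeks to hide.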
[0056] Deterministic encryption: Deterministic encryption is a type of encryption that can convert plaintext data into ciphertext data while maintaining consistency between the plaintext and ciphertext data. This means that for a given set of plaintext data, a deterministic encryption algorithm will always generate the same ciphertext data.
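The defining property, that equal plaintexts always yield equal ciphertexts, can be demonstrated with a short Python sketch. It uses a keyed hash (HMAC-SHA256) as a stand-in for a deterministic cipher; unlike real encryption the output cannot be decrypted, and the key and the `det_token` name are assumptions for illustration only.

```python
import hashlib
import hmac

def det_token(key: bytes, plaintext: str) -> str:
    """Deterministic keyed token: same key and plaintext always give the same output."""
    return hmac.new(key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()

key = b"demo-key"                                  # hypothetical key
scores = ["80", "90", "80", "70", "60", "65"]      # scores from the stu example
tokens = [det_token(key, s) for s in scores]
```

The two rows with score 80 receive identical tokens, so an observer who sees only `tokens` can still count how often each hidden value occurs.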
[0057] Statistical analysis attacks: These are attack methods targeting encryption systems. Attackers deduce encryption keys or plaintext data by performing statistical analysis on encrypted data. Statistical analysis attacks can exploit features such as patterns, repetitions, and correlations in encrypted data to break encryption systems.
[0058] Security Model: The security model assumes a passive, persistent adversary against the encrypted database. The adversary can observe all encrypted accesses but does not actively issue accesses of its own. This model better reflects the potential for insider data breaches in real-world industrial environments. Furthermore, the adversary is assumed to know which attribute columns in the data tables will be involved in table join operations.
[0059] Attack Model: Frequency Analysis Attack is a cryptographic attack method used to crack encrypted text. This attack method is based on the assumption that certain letters or letter combinations appear more frequently than others in the encrypted text. By performing frequency analysis on the encrypted text, the attacker can deduce the most frequent letters or letter combinations, thereby guessing the key used in the encryption algorithm.
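The interaction between deterministic encryption and frequency analysis described above can be sketched as follows. This is an illustrative sketch under stated assumptions: a keyed HMAC-SHA256 tag is used as a stand-in for a deterministic encryption algorithm (it cannot be decrypted, but it shares the relevant property that equal plaintexts produce equal outputs), and the key `KEY` and the function name `det_token` are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key"  # hypothetical key, for illustration only

def det_token(value):
    # Stand-in for deterministic encryption: equal plaintexts always
    # map to equal outputs under the same key.
    return hmac.new(KEY, str(value).encode(), hashlib.sha256).hexdigest()

scores = [80, 90, 80, 70, 60, 65]  # score values used in the stu example
column = [det_token(s) for s in scores]

# The adversary sees only `column`, yet can count repeated tokens
# without decrypting anything.
observed = Counter(column)
most_common_token, count = observed.most_common(1)[0]
```

An adversary who knows that 80 is the most frequent score can then associate `most_common_token` with the plaintext 80, which is the leakage that the splitting and smoothing described in the embodiments are intended to obscure.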
[0060] CPA (Chosen-Plaintext Attack) security: CPA security is a security property of encryption algorithms, meaning that even when an attacker can choose plaintext data and obtain the corresponding ciphertext data, the attacker cannot distinguish that ciphertext data from randomly generated ciphertext data. This means that an attacker cannot deduce relevant information about the plaintext data from the ciphertext data, thus ensuring the security of the encrypted data. CPA security is one of the fundamental security properties of modern encryption algorithms and is commonly used to evaluate the security and strength of encryption algorithms.
[0061] AES (Advanced Encryption Standard): The AES encryption algorithm uses a symmetric key to encrypt and decrypt data. It employs a block cipher method to encrypt data in blocks and is widely considered a secure and reliable encryption technology.
[0062] According to one aspect of the embodiments of this application, a method for querying a data table is provided. In order to better understand the application scenarios of the data table querying method in the embodiments of this application, the application scenarios of the data table querying method in the embodiments of this application will be explained and described below in conjunction with optional embodiments.
[0063] As shown in Figure 1, the data table stu may, but is not limited to, record the serial numbers 1 to 6, the names name_1 to name_6, and the score corresponding to each name (which may, but are not limited to, be 80, 90, 80, 70, 60, and 65 respectively). The serial numbers, names, and scores in the data table stu may, but are not limited to, be encrypted to obtain the data table E(stu), which is equivalent to the first data table.
[0064] Optionally, in this embodiment, data security can be improved by using, but not limited to, a symmetric encryption algorithm, such as the AES encryption algorithm or the DES (Data Encryption Standard) encryption algorithm, or an asymmetric encryption algorithm, such as the RSA (Rivest-Shamir-Adleman) encryption algorithm or the ECC (Elliptic Curve Cryptography) encryption algorithm.
[0065] The data table E(stu) may, but is not limited to, record the encrypted serial numbers E(1) to E(6), the encrypted names E(name_1) to E(name_6), and the corresponding encrypted scores (which may, but are not limited to, be E(80), E(90), E(80), E(70), E(60), and E(65) respectively). The data table E(stu) may, but is not limited to, be split into data sub-tables E(stu_1) and E(stu_2) according to the range of scores recorded in the data table E(stu), where the score range of data sub-table E(stu_1) is 75 to 100 and the score range of data sub-table E(stu_2) is 60 to 75. Data sub-table E(stu_1) records the encrypted serial numbers E(1) to E(3) and the encrypted names E(name_1) to E(name_3), whose encrypted scores may, but are not limited to, be E(80), E(90), and E(80) respectively. Data sub-table E(stu_2) records the encrypted serial numbers E(4) to E(6) and the encrypted names E(name_4) to E(name_6), whose encrypted scores may, but are not limited to, be E(70), E(60), and E(65) respectively.
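The range-based split described above can be sketched as follows. The sketch operates on plaintext values and omits encryption; the function name `split_by_range` and the row layout are assumptions. How a boundary value such as 75 is assigned is not stated in this embodiment, so the sketch assumes inclusive bounds with the first matching range taking precedence.

```python
def split_by_range(rows, key, ranges):
    """Place each row in the first sub-table whose range contains row[key]."""
    sub_tables = {name: [] for name in ranges}
    for row in rows:
        for name, (low, high) in ranges.items():
            # Assumption: both bounds are inclusive; first match wins.
            if low <= row[key] <= high:
                sub_tables[name].append(row)
                break
    return sub_tables

stu = [
    {"no": 1, "name": "name_1", "score": 80},
    {"no": 2, "name": "name_2", "score": 90},
    {"no": 3, "name": "name_3", "score": 80},
    {"no": 4, "name": "name_4", "score": 70},
    {"no": 5, "name": "name_5", "score": 60},
    {"no": 6, "name": "name_6", "score": 65},
]
ranges = {"stu_1": (75, 100), "stu_2": (60, 75)}
sub_tables = split_by_range(stu, "score", ranges)
```

With these ranges, serial numbers 1 to 3 fall into stu_1 and serial numbers 4 to 6 fall into stu_2, matching the example of this embodiment.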
[0066] The score values in the data sub-table E(stu_1) appear with different frequencies (80 appears twice and 90 appears once). To obfuscate these frequencies, the data sub-table E(stu_1) can be smoothed (equivalent to the first smoothing process) to obtain the data sub-table E(stu_1'). E(stu_1') can record, but is not limited to, the encrypted serial numbers E(1) to E(4), the encrypted names E(name_1) to E(name_3) and E(name_2'), the corresponding encrypted scores (which can be, but are not limited to, E(80), E(90), E(80), and E(90) respectively), and the identifiers of these encrypted scores (which can be, but are not limited to, E(1), E(1), E(1), and E(0) respectively). Optionally, in this embodiment, the identifier E(1) indicates a real score recorded in the data sub-table E(stu_1), and the identifier E(0) indicates a virtual score introduced to smooth the frequency of data occurrence in E(stu_1).
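One possible way to realize such smoothing is sketched below. It is an assumption rather than the procedure of this application: the sketch pads each value with virtual rows until every value appears as often as the most frequent one, and marks real rows with flag 1 and virtual rows with flag 0. The function name `smooth` and the field names are hypothetical, and encryption is omitted.

```python
from collections import Counter

def smooth(rows, key):
    """Pad with virtual rows until every value of `key` is equally frequent."""
    counts = Counter(row[key] for row in rows)
    target = max(counts.values(), default=0)
    smoothed = [{**row, "flag": 1} for row in rows]  # flag 1: real row
    for value, seen in counts.items():
        template = next(row for row in rows if row[key] == value)
        for _ in range(target - seen):
            smoothed.append({
                **template,
                "no": len(smoothed) + 1,
                "name": template["name"] + "'",  # e.g. name_2'
                "flag": 0,                        # flag 0: virtual row
            })
    return smoothed

stu_1 = [
    {"no": 1, "name": "name_1", "score": 80},
    {"no": 2, "name": "name_2", "score": 90},
    {"no": 3, "name": "name_3", "score": 80},
]
stu_1_smoothed = smooth(stu_1, "score")
```

For the three rows of stu_1, the sketch adds one virtual row with name name_2' and score 90, so that 80 and 90 each appear twice, as in the example of E(stu_1').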
[0067] Since the score values in the data sub-table E(stu_2) all appear with the same frequency, there is no need to smooth the data sub-table E(stu_2) (equivalent to the first smoothing process). Each data record in the data sub-table E(stu_2) may, but is not limited to, be marked with the identifier E(1) to indicate that it is real data from the data table E(stu).
[0068] As shown in Figure 2, the data table child may, but is not limited to, record the following: serial number 1 corresponds to a score range of 90 to 100, whose grade is Excellent; serial number 2 corresponds to a score range of 80 to 90, whose grade is Good; serial number 3 corresponds to a score range of 60 to 80, whose grade is Pass; and serial number 4 corresponds to a score range of 0 to 60, whose grade is Fail.
[0069] The serial number, range, and grade in the data table child can be encrypted using a symmetric encryption algorithm, an asymmetric encryption algorithm, or the like to obtain the data table E(child), which is equivalent to the second data table. The data table E(child) records: the encrypted serial number E(1), whose encrypted score range is E(90 to 100) and whose encrypted grade is E(Excellent); the encrypted serial number E(2), whose encrypted score range is E(80 to 90) and whose encrypted grade is E(Good); the encrypted serial number E(3), whose encrypted score range is E(60 to 80) and whose encrypted grade is E(Pass); and the encrypted serial number E(4), whose encrypted score range is E(0 to 60) and whose encrypted grade is E(Fail).
[0070] The data table E(child) can be split into data sub-tables E(child_1) and E(child_2) according to the grades recorded in the data table E(child). The grades in data sub-table E(child_1) are Excellent and Good, and the grades in data sub-table E(child_2) are Pass and Fail. Data sub-table E(child_1) records: the encrypted serial number E(1), whose encrypted score range is E(90 to 100) and whose encrypted grade is E(Excellent); and the encrypted serial number E(2), whose encrypted score range is E(80 to 90) and whose encrypted grade is E(Good). Data sub-table E(child_2) records: the encrypted serial number E(3), whose encrypted score range is E(60 to 80) and whose encrypted grade is E(Pass); and the encrypted serial number E(4), whose encrypted score range is E(0 to 60) and whose encrypted grade is E(Fail).
[0071] Since every grade appears with the same frequency within data sub-table E(child_1) and within data sub-table E(child_2), there is no need to smooth data sub-tables E(child_1) and E(child_2) (equivalent to the first smoothing process). Each data record in data sub-tables E(child_1) and E(child_2) may, but is not limited to, be assigned the identifier E(1), which indicates that the data is real data from the data table E(child).
[0072] As shown in Figure 3, the data table weight records the following: serial number 1 corresponds to the Id (identity document) Id_1, and Id_1 corresponds to a weight of 80; serial number 2 corresponds to Id_2, and Id_2 corresponds to a weight of 80; serial number 3 corresponds to Id_3, and Id_3 corresponds to a weight of 66; serial number 4 corresponds to Id_4, and Id_4 corresponds to a weight of 80; serial number 5 corresponds to Id_5, and Id_5 corresponds to a weight of 60; serial number 6 corresponds to Id_6, and Id_6 corresponds to a weight of 65.
[0073] The serial number, Id, and weight in the data table weight may, but are not limited to, be encrypted using a symmetric encryption algorithm, an asymmetric encryption algorithm, or the like to obtain the data table E(weight), which is equivalent to the third data table. The data table E(weight) records: the encrypted serial number E(1), whose encrypted Id is E(Id_1) and whose encrypted weight is E(80); the encrypted serial number E(2), whose encrypted Id is E(Id_2) and whose encrypted weight is E(80); the encrypted serial number E(3), whose encrypted Id is E(Id_3) and whose encrypted weight is E(66); the encrypted serial number E(4), whose encrypted Id is E(Id_4) and whose encrypted weight is E(80); the encrypted serial number E(5), whose encrypted Id is E(Id_5) and whose encrypted weight is E(60); and the encrypted serial number E(6), whose encrypted Id is E(Id_6) and whose encrypted weight is E(65).
[0074] The data table E(weight) may, but is not limited to, be split into data sub-tables E(weight_1) and E(weight_2) according to the weight value range. The weight value range of data sub-table E(weight_1) is 80 to 100, and the weight value range of data sub-table E(weight_2) is 60 to 80.
[0075] Data sub-table E(weight_1) records: the encrypted serial number E(1), whose encrypted Id is E(Id_1) and whose encrypted weight is E(80); the encrypted serial number E(2), whose encrypted Id is E(Id_2) and whose encrypted weight is E(80); and the encrypted serial number E(4), whose encrypted Id is E(Id_4) and whose encrypted weight is E(80). Data sub-table E(weight_2) records: the encrypted serial number E(3), whose encrypted Id is E(Id_3) and whose encrypted weight is E(66); the encrypted serial number E(5), whose encrypted Id is E(Id_5) and whose encrypted weight is E(60); and the encrypted serial number E(6), whose encrypted Id is E(Id_6) and whose encrypted weight is E(65).
[0076] The weight values within data sub-table E(weight_1) appear with the same frequency, as do the weight values within data sub-table E(weight_2). Therefore, there is no need to smooth data sub-tables E(weight_1) and E(weight_2) (equivalent to the first smoothing process). Instead, the identifier E(1) is assigned to each data record in data sub-tables E(weight_1) and E(weight_2) to indicate that the data is real data from the data table E(weight).
[0077] As shown in Figure 4, the data table level may, but is not limited to, record the following: serial number 1 corresponds to a weight range of 80 to 90, which is classified as overweight; serial number 2 corresponds to a weight range of 70 to 80, which is classified as normal; serial number 3 corresponds to a weight range of 60 to 70, which is classified as good; serial number 4 corresponds to a weight range of 50 to 60, which is classified as underweight; and serial number 5 corresponds to a weight range of 40 to 50, which is classified as very underweight.
[0078] The serial number, range, and level in the data table level may, but are not limited to, be encrypted using a symmetric encryption algorithm, an asymmetric encryption algorithm, or the like to obtain the data table E(level), which is equivalent to the fourth data table. The data table E(level) records: the encrypted serial number E(1), whose encrypted weight range is E(80 to 90) and whose encrypted level is E(overweight); the encrypted serial number E(2), whose encrypted weight range is E(70 to 80) and whose encrypted level is E(normal); the encrypted serial number E(3), whose encrypted weight range is E(60 to 70) and whose encrypted level is E(good); the encrypted serial number E(4), whose encrypted weight range is E(50 to 60) and whose encrypted level is E(underweight); and the encrypted serial number E(5), whose encrypted weight range is E(40 to 50) and whose encrypted level is E(very underweight).
[0079] The data table E(level) can be split into data sub-tables E(level_1) and E(level_2) according to the levels recorded in the data table E(level). The levels in data sub-table E(level_1) are overweight, normal, and good, and the levels in data sub-table E(level_2) are underweight and very underweight. Data sub-table E(level_1) records: the encrypted serial number E(1), whose encrypted weight range is E(80 to 90) and whose encrypted level is E(overweight); the encrypted serial number E(2), whose encrypted weight range is E(70 to 80) and whose encrypted level is E(normal); and the encrypted serial number E(3), whose encrypted weight range is E(60 to 70) and whose encrypted level is E(good). Data sub-table E(level_2) records: the encrypted serial number E(4), whose encrypted weight range is E(50 to 60) and whose encrypted level is E(underweight); and the encrypted serial number E(5), whose encrypted weight range is E(40 to 50) and whose encrypted level is E(very underweight).
[0080] Since every level appears with the same frequency within data sub-table E(level_1) and within data sub-table E(level_2), there is no need to smooth data sub-tables E(level_1) and E(level_2) (equivalent to the first smoothing process). Each data record in data sub-tables E(level_1) and E(level_2) may, but is not limited to, be assigned the identifier E(1), which indicates that the data is real data from the data table E(level).
[0081] Server 104 may, but is not limited to, store: data table E(stu), data sub-tables E(stu_1') and E(stu_2) corresponding to data table E(stu), and the hash ciphertext tables corresponding to data sub-tables E(stu_1') and E(stu_2) respectively; data table E(child), data sub-tables E(child_1) and E(child_2) corresponding to data table E(child), and the hash ciphertext tables corresponding to data sub-tables E(child_1) and E(child_2) respectively; data table E(weight), data sub-tables E(weight_1) and E(weight_2) corresponding to data table E(weight), and the hash ciphertext tables corresponding to data sub-tables E(weight_1) and E(weight_2) respectively; and data table E(level), data sub-tables E(level_1) and E(level_2) corresponding to data table E(level), and the hash ciphertext tables corresponding to data sub-tables E(level_1) and E(level_2) respectively.
[0082] The hash ciphertext tables stored on the server in this embodiment are explained below using, but not limited to, the hash ciphertext table of data sub-table E(stu_1') as an example. It should be noted that the hash ciphertext tables corresponding to data sub-table E(stu_2), data sub-table E(child_1), data sub-table E(child_2), etc., have the same form as the hash ciphertext table of data sub-table E(stu_1').
[0083] As shown in Figure 5, the hash ciphertext table corresponding to data sub-table E(stu_1') stores the hash data corresponding to the encrypted score E(80) of encrypted sequence number E(1), the hash data corresponding to the encrypted score E(90) of encrypted sequence number E(2), the hash data corresponding to the encrypted score E(80) of encrypted sequence number E(3), and the hash data corresponding to the encrypted score E(90) of encrypted sequence number E(4). Optionally, the hash data in the hash ciphertext table can be stored, but is not limited to being stored, in the form of an inverted index table whose entries are quadruples <value, table, id, f_id>. Here, `value` is the encrypted hash value used for hash-join operations on multiple hash ciphertext tables; `table` is the name of the encrypted data table to which the data belongs (e.g., data sub-table E(stu_1')); `id` is the position of the data in the encrypted data table (e.g., the row where the data is located, the index of the attribute column (e.g., the column recording scores), etc.); and `f_id` is the position of the next data with the same value in the encrypted data table. `id` and `table` are used to locate the encrypted data table and data row to which a hash-join result belongs. Specifically, `id` and `table` can be used, but are not limited to being used, to locate the position of data in the corresponding encrypted data table, thereby retrieving the corresponding encrypted data from the encrypted data table. Furthermore, using an inverted index table allows attribute values to be joined directly across data tables through the hash values recorded in the inverted index table, improving the efficiency of joining data across multiple data tables. Moreover, by using the `id` and `table` recorded in the inverted index table, the position of data to be modified or deleted in the encrypted data table can be accurately located, improving the efficiency of data modification.
[0084] In detail, the hash data corresponding to the encrypted score E(80) corresponding to the encrypted sequence number E(1) stores the hash value H(1) corresponding to the encrypted score E(80), the table to which the encrypted score E(80) corresponding to the encrypted sequence number E(1) belongs is E(stu_1'), the id of the encrypted score E(80) corresponding to the encrypted sequence number E(1) is 1, indicating that the encrypted score E(80) corresponding to the encrypted sequence number E(1) is the first data in the score column of the data sub-table E(stu_1'), and the f_id of the encrypted score E(80) corresponding to the encrypted sequence number E(1) is 3, which is used to indicate that the next data with a score of 80 in the data sub-table E(stu_1') is the third data in the column recording the score.
[0085] The hash data corresponding to the encrypted score E(90) of the encrypted sequence number E(2) stores the hash value H(2) corresponding to the encrypted score E(90). The table to which the encrypted score E(90) of the encrypted sequence number E(2) belongs is E(stu_1'). The id of the encrypted score E(90) of the encrypted sequence number E(2) is 2, which means that the encrypted score E(90) of the encrypted sequence number E(2) is the second data in the score column of the data sub-table E(stu_1'). The f_id of the encrypted score E(90) of the encrypted sequence number E(2) is 4, which means that the next data with a score of 90 in the data sub-table E(stu_1') is the fourth data in the column recording the score.
[0086] The hash data corresponding to the encrypted score E(80) corresponding to the encrypted serial number E(3) stores the hash value H(3) corresponding to the encrypted score E(80). The table to which the encrypted score E(80) corresponding to the encrypted serial number E(3) belongs is E(stu_1'). The id of the encrypted score E(80) corresponding to the encrypted serial number E(3) is 3, which means that the encrypted score E(80) corresponding to the encrypted serial number E(3) is the third data in the score column in the data sub-table E(stu_1'). The f_id of the encrypted score E(80) corresponding to the encrypted serial number E(3) is 1, which is used to indicate that the next data with a score of 80 in the data sub-table E(stu_1') is the first data in the column recording the score.
[0087] The hash data corresponding to the encrypted score E(90) corresponding to the encrypted serial number E(4) stores the hash value H(4) corresponding to the encrypted score E(90). The table to which the encrypted score E(90) corresponding to the encrypted serial number E(4) belongs is E(stu_1'). The id of the encrypted score E(90) corresponding to the encrypted serial number E(4) is 4, indicating that the encrypted score E(90) corresponding to the encrypted serial number E(4) is the fourth data in the score column of the data sub-table E(stu_1'). The f_id of the encrypted score E(90) corresponding to the encrypted serial number E(4) is 2, which is used to indicate that the next data with a score of 90 in the data sub-table E(stu_1') is the second data in the column recording the score.
[0088] It should be noted that if there is no next data with a score of 90 in the data sub-table E(stu_1'), the f_id of the encrypted score E(90) corresponding to the encrypted sequence number E(4) can be, but is not limited to being, recorded as -1.
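The construction of the <value, table, id, f_id> entries described above can be sketched as follows. This is one reading of the example, not a definitive implementation: rows with the same value are linked in a cycle (the f_id of the last such row points back to the first), and f_id is -1 when a value appears only once. A SHA-256 digest of the plaintext value stands in for the encrypted hash value, so equal scores share the same `value`; the function name `build_index` is hypothetical.

```python
import hashlib

def build_index(table, values):
    """Return (value, table, id, f_id) entries for one attribute column."""
    # Collect the 1-based positions at which each attribute value occurs.
    positions = {}
    for pos, v in enumerate(values, start=1):
        positions.setdefault(v, []).append(pos)
    entries = []
    for pos, v in enumerate(values, start=1):
        same = positions[v]
        if len(same) == 1:
            f_id = -1  # no other row holds this value
        else:
            # Next position with the same value, wrapping to the first.
            f_id = same[(same.index(pos) + 1) % len(same)]
        # Stand-in for the encrypted hash value (assumption).
        digest = hashlib.sha256(str(v).encode()).hexdigest()
        entries.append((digest, table, pos, f_id))
    return entries

index = build_index("E(stu_1')", [80, 90, 80, 90])
```

For the score column of E(stu_1'), the resulting (id, f_id) pairs are (1, 3), (2, 4), (3, 1), and (4, 2), which agrees with the values given in the example above.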
[0089] Client 102 may, but is not limited to, store a sub-table mapping table and distribution information. As shown in Figure 6, the sub-table mapping table in client 102 stores the unencrypted original table name, the original attribute, the encrypted sub-table name, and the sub-table attribute. Specifically, the sub-table mapping table stores the table name stu of the data table stu, the attribute name score in the data table stu, the table name E(stu_1) and attribute E(score_1) of one data sub-table corresponding to the data table stu, and the table name E(stu_2) and attribute E(score_2) of the other data sub-table corresponding to the data table stu.
[0090] The sub-table mapping table stores the table name child of the data table child, the attribute name grade in the data table child, the table name E(child_1) and attribute E(grade_1) of one data sub-table corresponding to the data table child, and the table name E(child_2) and attribute E(grade_2) of the other data sub-table corresponding to the data table child. The sub-table mapping table also stores the table name weight of the data table weight, the attribute name weight in the data table weight, the table name E(weight_1) and attribute E(weight_1) of one data sub-table corresponding to the data table weight, and the table name E(weight_2) and attribute E(weight_2) of the other data sub-table corresponding to the data table weight. The sub-table mapping table further stores the table name level of the data table level, the attribute name grade in the data table level, the table name E(level_1) and attribute E(grade_1) of one data sub-table corresponding to the data table level, and the table name E(level_2) and attribute E(grade_2) of the other data sub-table corresponding to the data table level.
[0091] Client 102 may also, but is not limited to, store distribution information. As shown in Figure 7, the distribution information may include, but is not limited to, the name of the data sub-table, the attribute of the data sub-table, the value range, and the number of times each value appears. Specifically, the distribution information may include, but is not limited to, the table name E(stu_1) of the data sub-table E(stu_1), the attribute E(score_1) of the data sub-table E(stu_1), the value range of attribute E(score_1), which is 75 to 100, and the number of times each score appears in the data sub-table E(stu_1): 90 appears once and 80 appears twice. Similarly, the distribution information may include, but is not limited to, the table name E(stu_2) of the data sub-table E(stu_2), the attribute E(score_2) of the data sub-table E(stu_2), the value range of attribute E(score_2), which is 60 to 75, and the number of times each score appears in the data sub-table E(stu_2): 60, 65, and 70 each appear once.
[0092] The distribution information may include, but is not limited to, the table name E(child_1) of the data sub-table E(child_1), the attribute E(grade_1) of the data sub-table E(child_1), the values of attribute E(grade_1), which are Excellent and Good, and the number of times each grade appears in the data sub-table E(child_1): Excellent and Good each appear once. Similarly, the distribution information may include, but is not limited to, the table name E(child_2) of the data sub-table E(child_2), the attribute E(grade_2) of the data sub-table E(child_2), the values of attribute E(grade_2), which are Pass and Fail, and the number of times each grade appears in the data sub-table E(child_2): Pass and Fail each appear once.
[0093] The distribution information may include, but is not limited to, the table name E(weight_1) of the data sub-table E(weight_1), the attribute E(weight_1) of the data sub-table E(weight_1), the attribute value of attribute E(weight_1) ranging from 80 to 100, and the number of times the weight of each value in the data sub-table E(weight_1) appears. Specifically, 80 appears 3 times. The distribution information may also include, but is not limited to, the table name E(weight_2) of the data sub-table E(weight_2), the attribute E(weight_2) of the data sub-table E(weight_2), the attribute value of attribute E(weight_2) ranging from 60 to 80, and the number of times the weight of each value in the data sub-table E(weight_2) appears. Specifically, 60 appears once, 65 appears once, and 66 appears once.
[0094] The distribution information may include, but is not limited to, the table name E(level_1) of data sub-table E(level_1), the attribute E(level_1) of data sub-table E(level_1), the values of attribute E(level_1), which are overweight, normal, and good, and the number of times each level appears in data sub-table E(level_1): overweight, normal, and good each appear once. The distribution information may also include, but is not limited to, the table name E(level_2) of data sub-table E(level_2), the attribute E(level_2) of data sub-table E(level_2), the values of attribute E(level_2), which are underweight and very underweight, and the number of times each level appears in data sub-table E(level_2): underweight and very underweight each appear once.
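The distribution information held by the client can be pictured as one record per data sub-table, as in the following sketch. The record layout and the names `distribution_info` and `needs_smoothing` are assumptions made for illustration; the sketch also shows how the occurrence counts indicate whether a sub-table has unequal value frequencies and would therefore be smoothed.

```python
from collections import Counter

def distribution_info(sub_table, attribute, value_range, values):
    """Build one distribution record for a data sub-table (hypothetical layout)."""
    return {
        "sub_table": sub_table,
        "attribute": attribute,
        "range": value_range,
        "counts": dict(Counter(values)),  # occurrences of each value
    }

def needs_smoothing(info):
    # Smoothing is only needed when the values are not equally frequent.
    return len(set(info["counts"].values())) > 1

info_stu_1 = distribution_info("E(stu_1)", "E(score_1)", (75, 100), [80, 90, 80])
info_stu_2 = distribution_info("E(stu_2)", "E(score_2)", (60, 75), [70, 60, 65])
```

In this sketch, `needs_smoothing(info_stu_1)` is true because 80 appears twice and 90 once, while `needs_smoothing(info_stu_2)` is false because each score appears once.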
[0095] Optionally, as an alternative implementation, the above-described data table query method can be applied to, but is not limited to, the environment shown in Figure 8. The data table query method in this embodiment can be implemented through, but is not limited to, the following steps:
[0096] Step S801: Obtain a data table query request on the client 102. The data table query request is used to request the query of the attribute value of the second attribute (e.g., grade) that matches the attribute value of the first attribute (e.g., score) in the second data table (e.g., data table E(child)) on the server 104. The first attribute is an attribute included in the first data table (e.g., data table E(stu)) on the server 104.
[0097] Step S802: Convert the data table query request into a first set of data sub-table query requests. The first set of data sub-table query requests is used to request, in the second set of data sub-tables (which may include, but are not limited to, data sub-tables E(child_1) and E(child_2)), the attribute value of the second attribute (e.g., grade) that matches the attribute value of the first attribute (e.g., score) in the first set of data sub-tables. The first set of data sub-tables is obtained by splitting the first data table (e.g., data table E(stu)) according to the attribute value of the first attribute (e.g., score). The second set of data sub-tables is obtained by splitting the second data table (e.g., data table E(child)) according to the attribute value of the second attribute (e.g., grade).
[0098] Optionally, the first set of data sub-table query requests may include, but is not limited to, data sub-table query request 1 and data sub-table query request 2. Data sub-table query request 1 may, but is not limited to, be used to request a query in data sub-table E(child_1) and/or data sub-table E(child_2) for the attribute value of the second attribute (e.g., grade) that matches the attribute value of the first attribute (e.g., score) of data sub-table E(stu_1). Data sub-table query request 2 may, but is not limited to, be used to request a query in data sub-table E(child_1) and/or data sub-table E(child_2) for the attribute value of the second attribute (e.g., grade) that matches the attribute value of the first attribute (e.g., score) of data sub-table E(stu_2).
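The conversion of a data table query request into the first set of data sub-table query requests can be sketched with a sub-table mapping table as follows. The dictionary layout of `MAPPING`, the request fields, and the function name `to_sub_requests` are hypothetical; the sketch emits one sub-table request per sub-table of the first data table, each searching all sub-tables of the second data table.

```python
# Hypothetical in-memory form of the client-side sub-table mapping table.
MAPPING = {
    "stu": {
        "attribute": "score",
        "sub_tables": [("E(stu_1)", "E(score_1)"), ("E(stu_2)", "E(score_2)")],
    },
    "child": {
        "attribute": "grade",
        "sub_tables": [("E(child_1)", "E(grade_1)"), ("E(child_2)", "E(grade_2)")],
    },
}

def to_sub_requests(query, mapping):
    """Rewrite a plaintext table-level query as encrypted sub-table requests."""
    first = mapping[query["first_table"]]
    second = mapping[query["second_table"]]
    # Reject attributes that the mapping table does not know about.
    if first["attribute"] != query["first_attribute"]:
        raise KeyError(query["first_attribute"])
    if second["attribute"] != query["second_attribute"]:
        raise KeyError(query["second_attribute"])
    return [
        {"match": sub, "search_in": list(second["sub_tables"])}
        for sub in first["sub_tables"]
    ]

query = {
    "first_table": "stu", "first_attribute": "score",
    "second_table": "child", "second_attribute": "grade",
}
sub_requests = to_sub_requests(query, MAPPING)
```

The two resulting requests correspond to data sub-table query request 1 (for E(stu_1)) and data sub-table query request 2 (for E(stu_2)) in the example above.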
[0099] Step S803: Generate a second set of data sub-table query requests, wherein the second set of data sub-table query requests is different from the first set of data sub-table query requests.
[0100] Optionally, the second set of data sub-table query requests may include, but is not limited to, data sub-table query request 3 and data sub-table query request 4. Data sub-table query request 3 may be used to request the attribute value (e.g., grade) in data sub-table E(level_1) that matches the attribute value of the first attribute (e.g., score) in data sub-table E(stu_1). Data sub-table query request 4 may be used to request the attribute value (e.g., grade) in data sub-table E(child_2) that matches the attribute value of the second attribute (e.g., grade) in data sub-table E(weight_1), and so on.
[0101] Step S804: Send the first set of data sub-table query requests and the second set of data sub-table query requests to server 104, and obtain, from the server, the first set of query results corresponding to the first set of data sub-table query requests and the second set of query results corresponding to the second set of data sub-table query requests.
[0102] Through the above steps, the data table query request obtained from the client is converted into a first set of data sub-table query requests, and a second set of data sub-table query requests is generated. On the one hand, randomly sending the first and second sets of data sub-table query requests to the server can interfere with the sending order of the query requests in the first set of data sub-table query requests. On the other hand, the query results of the second set of data sub-table query requests can interfere with the frequency statistics performed on the query results of the first set of data sub-table query requests. The returned second set of query results is unrelated to the first set of query results. In this way, the statistical analysis based on the frequency of the occurrence of encrypted data in the query results of the first set of data sub-table query requests is confused, thereby achieving the technical effect of improving the security of data table queries and solving the technical problem of low security in data table queries.
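The mixing of the two sets of requests can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `mix_requests` and `keep_real` are hypothetical, the requests are represented by placeholder strings, and the client-side step of discarding the results of the second set is an assumption about how the unrelated results would be handled.

```python
import random

def mix_requests(real, decoys, rng):
    """Tag each request as real or decoy, then send them in random order."""
    tagged = [(req, True) for req in real] + [(req, False) for req in decoys]
    rng.shuffle(tagged)  # the server cannot infer the original order
    return tagged

def keep_real(tagged, results):
    # Assumption: the client remembers the tags and keeps only the
    # results that answer requests from the first set.
    return [res for (_, is_real), res in zip(tagged, results) if is_real]

first_set = ["request 1", "request 2"]    # derived from the user query
second_set = ["request 3", "request 4"]   # generated interference requests
tagged = mix_requests(first_set, second_set, random.Random())
# Placeholder for the server round trip: one result per request, in order.
results = ["result of " + req for req, _ in tagged]
kept = keep_real(tagged, results)
```

Because the server sees four requests in random order, the frequency statistics it can gather over the returned results mix the answers to the first set with unrelated answers to the second set.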
[0103] Optionally, in this embodiment, the terminal device can be a terminal device configured with a target client, which may include, but is not limited to, at least one of the following: mobile phone (such as Android phone, iOS phone, etc.), laptop computer, tablet computer, PDA, MID (Mobile Internet Devices), PAD, desktop computer, smart TV, etc. The target client may be a video client, instant messaging client, browser client, educational client, etc. The network may include, but is not limited to, wired network and wireless network, wherein the wired network includes: local area network, metropolitan area network and wide area network, and the wireless network includes: Bluetooth, WIFI and other networks that enable wireless communication. The server may be a single server, a server cluster composed of multiple servers, or a cloud server. The above is only an example, and no limitation is made in this embodiment.
[0104] Optionally, the above-mentioned data table query method can be executed by either server 104 or client 102 alone, or by both of them together. As an optional implementation, taking the case where the data table query method in this embodiment is executed by client 102 as an example, Figure 9 is a flowchart of an optional data table query method according to an embodiment of this application. As shown in Figure 9, the query method for the data table may include the following steps:
[0105] Step S902: Obtain a data table query request, wherein the data table query request is used to request a query in a second data table on the server for the attribute value of a second attribute that matches the attribute value of a first attribute, and the first attribute is an attribute included in the first data table on the server.
[0106] When a client obtains an unencrypted data table, the data in the table (such as attributes and table names) is often encrypted to improve data security. The encryption methods have been explained in detail in the application scenario section of the embodiments of this application and are not repeated here. The encrypted data may be, but is not limited to being, stored on a server. When data in the encrypted data table stored on the server needs to be accessed, it may be accessed through, but not limited to, the unencrypted attribute names and table name of the data table. Specifically, a data table query request may be obtained on the client. Optionally, the data table query request carries the unencrypted table name of the second data table, the unencrypted attribute name of the second attribute, the unencrypted table name of the first data table, and the unencrypted attribute name of the first attribute.
[0107] For example, a data table query request might include the unencrypted table name "stu" of table E(stu) (equivalent to the first data table), the unencrypted attribute score (equivalent to the first attribute) of table E(stu), the unencrypted table name "child" of table E(child) (equivalent to the second data table), and the unencrypted attribute grade (equivalent to the second attribute) of table E(child). Because the request carries only unencrypted names, the encrypted table names and attribute names are not exposed, which improves the security of the data in the table.
[0108] Optionally, in this embodiment, the attribute value of the second attribute that matches the attribute value of the first attribute can be obtained in, but not limited to, the following way: obtaining a first set of hash values of the attribute values of the first attribute; obtaining a second set of hash values of the attribute values of the second attribute; for each first hash value in the first set of hash values, searching the second set of hash values one by one for a hash value that is the same as the first hash value; and, if a second hash value that is the same as the first hash value is found in the second set of hash values, determining that the attribute value of the first attribute corresponding to the first hash value matches the attribute value of the second attribute corresponding to the second hash value.
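The hash-based matching described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `h` and `match_by_hash` are hypothetical, SHA-256 is an assumed stand-in because the text does not name a hash function, and a dictionary lookup replaces the one-by-one search purely for brevity.

```python
import hashlib
from collections import defaultdict


def h(value):
    """Assumed stand-in hash; the source does not specify a hash function."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def match_by_hash(first_values, second_values):
    """Return (first_value, second_value) pairs whose hash values are equal."""
    # Second set of hash values, indexed by hash for lookup.
    second_index = defaultdict(list)
    for v in second_values:
        second_index[h(v)].append(v)

    matches = []
    # For each hash in the first set, look for the same hash in the second set.
    for u in first_values:
        for v in second_index.get(h(u), []):
            matches.append((u, v))
    return matches


# Illustrative values only; in the embodiment these would be encrypted data.
pairs = match_by_hash([80, 95, 60], [60, 80, 80])
```

Here every occurrence of a matching value in the second attribute yields its own pair, so duplicate values are preserved.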
[0109] Step S904: Convert the data table query request into a first set of data sub-table query requests. The first set of data sub-table query requests is used to request a query in the second set of data sub-tables for the attribute value of the second attribute that matches the attribute value of the first attribute in the first set of data sub-tables. The first set of data sub-tables are the data sub-tables obtained by splitting the first data table according to the attribute value of the first attribute, and the second set of data sub-tables are the data sub-tables obtained by splitting the second data table according to the attribute value of the second attribute.
[0110] When the client receives a data table query request, it may, but is not limited to, convert the data table query request into a first set of data sub-table query requests. Optionally, but not limited to, the client may obtain, from the sub-table mapping table stored in the client, the first set of data sub-tables and the attribute of each data sub-table in the first set of data sub-tables that correspond to the first data table and the first attribute carried in the data table query request; obtain the second set of data sub-tables and the attribute of each data sub-table in the second set of data sub-tables that correspond to the second data table and the second attribute carried in the data table query request; and use the first set of data sub-tables and the attribute of each of its data sub-tables, together with the second set of data sub-tables and the attribute of each of its data sub-tables, to generate the first set of data sub-table query requests.
[0111] For example, a data table query request might include the unencrypted table name `stu` of data table E(stu) (equivalent to the first data table), the attribute score (equivalent to the first attribute) in data table E(stu), the unencrypted table name `child` of data table E(child) (equivalent to the second data table), and the attribute grade (equivalent to the second attribute) in data table E(child). In such a case, it is possible, but not limited to, to retrieve from the sub-table mapping table stored on the client the encrypted table names of the data sub-tables corresponding to the unencrypted table name `stu` of data table E(stu) (e.g., E(stu_1) and E(stu_2), where E(stu_1) and E(stu_2) are equivalent to the first set of data sub-tables), where the encrypted attributes of the data sub-tables corresponding to score in data table E(stu) may be, but are not limited to, E(score_1) and E(score_2) respectively; and to retrieve the encrypted table names of the data sub-tables corresponding to the unencrypted table name `child` of data table E(child) (e.g., E(child_1) and E(child_2), where E(child_1) and E(child_2) are equivalent to the second set of data sub-tables), where the encrypted attributes of the data sub-tables corresponding to grade in data table E(child) may be, but are not limited to, E(level_1) and E(level_2) respectively.
[0112] Optionally, the first set of data sub-table query requests carries the encrypted table names of the first set of data sub-tables and the encrypted attribute of each data sub-table in the first set of data sub-tables. That is, E(stu_1) and E(stu_2), E(score_1) and E(score_2), E(child_1) and E(child_2), and E(level_1) and E(level_2) are all encrypted.
[0113] By using the above method, data query requests carrying unencrypted table names and attributes are transformed into a set of data sub-table query requests carrying encrypted data sub-table names and attributes. This avoids the potential leakage of encrypted table names and attributes during data table access, improving the security of data access in the data table. Furthermore, by using table partitioning, queries can be performed in parallel on multiple data sub-tables, reducing the time required for queries and improving query efficiency.
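The conversion in step S904 can be sketched as a lookup in the client-side sub-table mapping table. The sketch below is illustrative only: `SUB_TABLE_MAP`, `to_sub_table_requests`, and the dictionary keys are hypothetical names, the strings such as "E(stu_1)" are placeholders for real ciphertext names, and pairing every first-set sub-table with every second-set sub-table is an assumption consistent with the (a1,b1), (a1,b2), (a1,b3) example rather than something the text mandates.

```python
from itertools import product

# Hypothetical client-side sub-table mapping table:
# (plaintext table name, plaintext attribute name)
#   -> list of (encrypted sub-table name, encrypted attribute name)
SUB_TABLE_MAP = {
    ("stu", "score"): [("E(stu_1)", "E(score_1)"), ("E(stu_2)", "E(score_2)")],
    ("child", "grade"): [("E(child_1)", "E(level_1)"), ("E(child_2)", "E(level_2)")],
}


def to_sub_table_requests(first_table, first_attr, second_table, second_attr):
    """Turn one plaintext join request into encrypted sub-table join requests."""
    firsts = SUB_TABLE_MAP[(first_table, first_attr)]
    seconds = SUB_TABLE_MAP[(second_table, second_attr)]
    # Assumption: one request per (first sub-table, second sub-table) pair.
    return [
        {"left_table": lt, "left_attr": la, "right_table": rt, "right_attr": ra}
        for (lt, la), (rt, ra) in product(firsts, seconds)
    ]


first_group = to_sub_table_requests("stu", "score", "child", "grade")
```

With two sub-tables on each side, this produces four sub-table query requests, none of which carries a plaintext table or attribute name.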
[0114] Step S906: Generate a second set of data sub-table query requests, wherein the second set of data sub-table query requests is different from the first set of data sub-table query requests.
[0115] Optionally, when the data table query request is converted into the first set of data sub-table query requests, a second set of data sub-table query requests may also be generated. The second set of data sub-table query requests is different from the first set of data sub-table query requests. It can be understood that the second set of data sub-table query requests is used to interfere with statistics gathered on the first set of data sub-table query requests (such as query order statistics, query count statistics, and query frequency statistics).
[0116] In detail, a passive persistent adversary can observe all encrypted accesses on the server but does not actively issue accesses of its own. By generating a second set of data sub-table query requests that differs from the first set, the statistics that a passive persistent adversary gathers on the query frequency of the data sub-tables in the first set of data sub-table query requests can be obfuscated. For example, suppose a passive persistent adversary attacks by analyzing the data access frequency in data sub-table joins. In that case, the data frequencies of deterministic ciphertext data a, b, and c in data table A are the same, N(a) = N(b) = N(c), and the frequencies of the sub-table joins between tables A and A′ during the join process are also the same, N(a,a′) = N(b,b′) = N(c,c′), where c′ is ciphertext data whose attribute value range is close to that of deterministic ciphertext data c. In this way, the passive persistent adversary is prevented from deducing which encrypted data sub-tables belong to the same data table, and therefore from deducing the most frequent letters or letter combinations in the encrypted data sub-tables, guessing the key used in the encryption algorithm, and causing data leakage. This improves the security of accessing the data table.
[0117] Step S908: Send the first set of data sub-table query requests and the second set of data sub-table query requests to the server, and obtain, from the server, the first set of query results corresponding to the first set of data sub-table query requests and the second set of query results corresponding to the second set of data sub-table query requests.
[0118] Optionally, in this embodiment, when the data table query request is converted into a first set of data sub-table query requests and a second set of data sub-table query requests is generated, the first set of data sub-table query requests and the second set of data sub-table query requests can be sent to the server, and the first set of query results corresponding to the first set of data sub-table query requests and the second set of query results corresponding to the second set of data sub-table query requests sent by the server can be obtained. Optionally, the second set of query results can be used to interfere with the frequency statistics performed on the first set of query results.
[0119] For example, suppose a passive persistent adversary already knows the first attribute (e.g., score) and the second attribute (e.g., grade) used for matching, even though the data sub-tables and their attributes accessed on the server are encrypted, and suppose the distribution of scores and grades follows a normal distribution (meaning that scores of 60-90 appear frequently, and grades of "good" and "pass" appear frequently). Under a deterministic encryption algorithm, the passive persistent adversary might identify the frequently occurring identical data in the server's query results as "good" and "pass", and then analyze the keys used for "good" and "pass", leading to the leakage of plaintext data. With the data table query method in this embodiment, however, the server returns both a first set of query results and a second set of query results. The distribution of scores and grades observed in the returned results may therefore no longer follow a normal distribution, which interferes with the passive persistent adversary's frequency statistics on the first set of query results and improves the security of the query results returned by the server.
[0120] By using the above method, the data table query request obtained from the client is converted into a first set of data sub-table query requests, and a second set of data sub-table query requests is generated. On the one hand, randomly sending the first and second sets of data sub-table query requests to the server can interfere with the sending order of the query requests in the first set of data sub-table query requests. On the other hand, the query results of the second set of data sub-table query requests can interfere with the frequency statistics performed on the query results of the first set of data sub-table query requests. The returned second set of query results is unrelated to the first set of query results. In this way, the statistical analysis based on the frequency of the occurrence of encrypted data in the query results of the first set of data sub-table query requests is confused, thereby achieving the technical effect of improving the security of data table queries and solving the technical problem of low security in data table queries.
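The random sending of the two sets of requests in step S908 can be sketched as below. This is an assumption-laden illustration: `send_shuffled` and the `send` callback are hypothetical names, the transport is abstracted away, and the real/decoy tag is assumed to stay on the client so that the server sees only an interleaved stream of requests.

```python
import random


def send_shuffled(real_requests, decoy_requests, send, rng=None):
    """Send both sets in one random order and split the results afterwards."""
    rng = rng or random.Random()
    # Tag each request locally; the tag itself is never passed to `send`.
    tagged = [(req, True) for req in real_requests]
    tagged += [(req, False) for req in decoy_requests]
    rng.shuffle(tagged)

    first_results, second_results = [], []
    for req, is_real in tagged:
        result = send(req)
        (first_results if is_real else second_results).append(result)
    return first_results, second_results


# Stand-in for the server round trip, used only to exercise the sketch.
sent_order = []


def fake_send(req):
    sent_order.append(req)
    return f"result:{req}"


first_results, second_results = send_shuffled(
    ["q1", "q2"], ["d1", "d2", "d3"], fake_send, random.Random(0)
)
```

The client keeps only `first_results` for the original query; `second_results` exists solely to disturb what an observer on the server can count.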
[0121] As an optional approach, generating the second set of data sub-table query requests includes:
[0122] S11, generating the second set of data sub-table query requests, wherein the second set of data sub-table query requests is used to request a query in the second set of data sub-tables for the attribute value of the second attribute that matches the attribute value of the third attribute in the third set of data sub-tables, wherein the third set of data sub-tables are the data sub-tables obtained by splitting the third data table on the server according to the attribute value of the third attribute; or
[0123] S12, generating the second set of data sub-table query requests, wherein the second set of data sub-table query requests is used to request a query in the fourth set of data sub-tables for the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first set of data sub-tables, wherein the fourth set of data sub-tables are the data sub-tables obtained by splitting the fourth data table on the server according to the attribute value of the fourth attribute; or
[0124] S13, generating the second set of data sub-table query requests, wherein the second set of data sub-table query requests is used to request a query in the fourth set of data sub-tables for the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third set of data sub-tables.
[0125] To obfuscate the access patterns of the data sub-tables, a second set of data sub-table query requests may be generated. Optionally, in this embodiment, as shown in Figure 10, the second set of data sub-table query requests is used to request a query in the second set of data sub-tables (which may include, but are not limited to, data sub-tables E(child_1) and E(child_2)) for the attribute value of the second attribute (e.g., grade) that matches the attribute value of the third attribute (e.g., weight) in the third set of data sub-tables (which may include, but are not limited to, data sub-tables E(weight_1) and E(weight_2)). The third set of data sub-tables are the data sub-tables obtained by splitting the third data table (e.g., data table E(weight)) on the server according to the attribute value of the third attribute (e.g., weight).
[0126] Optionally, in this embodiment, as shown in Figure 11, the second set of data sub-table query requests is used to request a query in the fourth set of data sub-tables (which may include, but are not limited to, data sub-tables E(level_1) and E(level_2)) for the attribute value of the fourth attribute (e.g., grade) that matches the attribute value of the first attribute (e.g., score) in the first set of data sub-tables (which may include, but are not limited to, data sub-tables E(stu_1) and E(stu_2)). The fourth set of data sub-tables are the data sub-tables obtained by splitting the fourth data table (e.g., data table E(level)) on the server according to the attribute value of the fourth attribute (e.g., grade).
[0127] Optionally, in this embodiment, the second set of data sub-table query request can also be used to request the query in the fourth set of data sub-tables (which may include, but are not limited to, data sub-tables E(level_1) and E(level_2)) for the attribute value of the fourth attribute (e.g., level) that matches the attribute value of the third attribute (e.g., weight) in the third set of data sub-tables (which may include, but are not limited to, data sub-tables E(weight_1) and E(weight_2)).
[0128] Through the above steps, the second set of data sub-table query requests can be used to interfere with statistical analysis of the query order and frequency of the data sub-tables queried by the first set of data sub-table query requests. For example, the first set of data sub-table query requests includes (a1,b1), (a1,b2), and (a1,b3). (a1,b1) represents querying data sub-table b1 for the attribute values that match the attribute values in data sub-table a1; (a1,b2) represents querying data sub-table b2 for the attribute values that match the attribute values in data sub-table a1; and (a1,b3) represents querying data sub-table b3 for the attribute values that match the attribute values in data sub-table a1.
[0129] To ensure the randomness of sub-table a1, a sub-table c1 with attribute value ranges close to those of the actual sub-table a1 needs to be randomly selected. This ensures that when performing sub-table join queries (a1, b1), (a1, b2), and (a1, b3), either sub-table a1 or c1 is selected with a 1 / 2 probability each time. Similarly, to ensure the randomness of sub-table b1, a sub-table d1 with attribute value ranges close to those of the actual sub-table b1 needs to be randomly selected. This ensures that when performing sub-table join queries (a1, b1), either sub-table b1 or d1 is selected with a 1 / 2 probability each time. The randomness of sub-tables b2 and b3 can be ensured in the same way, which will not be elaborated here. This method obscures the access frequency of sub-table a1, preventing adversaries from inferring which sub-tables belong to the same table based on the frequency information. This technology interferes with the frequency statistics of passive persistent adversaries' query requests to the first set of data sub-tables, thereby improving the security of sending query requests to the server.
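The 1/2-probability selection between a real sub-table and its decoy can be sketched as follows. Only the selection step is modelled; how the real join result is recovered when a decoy is chosen is not specified in this passage and is left out. The names `pick_with_decoy` and `randomized_joins` are hypothetical, and the decoys d2 and d3 for b2 and b3 are assumed by analogy with c1 and d1.

```python
import random


def pick_with_decoy(real, decoy, rng):
    """Return the real sub-table or its decoy, each with probability 1/2."""
    return real if rng.random() < 0.5 else decoy


def randomized_joins(joins, decoys, rng=None):
    """Apply the coin flip independently to each side of each sub-table join."""
    rng = rng or random.Random()
    return [
        (pick_with_decoy(left, decoys[left], rng),
         pick_with_decoy(right, decoys[right], rng))
        for left, right in joins
    ]


joins = [("a1", "b1"), ("a1", "b2"), ("a1", "b3")]
# Decoys are assumed to have attribute value ranges close to the real ones.
decoys = {"a1": "c1", "b1": "d1", "b2": "d2", "b3": "d3"}
picked = randomized_joins(joins, decoys, random.Random(1))
```

Because each flip is independent, a1 no longer appears in every observed join, which is what blurs its access frequency.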
[0130] It should be noted that the number of query requests that can be generated for the second set of data sub-table query requests may be very large. Considering the actual processing performance of the server and the client, if the number of generated query requests in the second set is less than or equal to a threshold, all of them may be used without restriction. If the number exceeds the threshold, a number of data sub-table query requests less than or equal to the threshold is randomly selected from all the generated query requests in the second set. In this way, the processing resources of both the client and the server are used more efficiently.
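The threshold rule can be sketched with a random sample. The function name `cap_decoys` is hypothetical, and selecting exactly `threshold` requests is one choice among those the text allows (any number less than or equal to the threshold).

```python
import random


def cap_decoys(decoy_requests, threshold, rng=None):
    """Keep all decoy requests if within the threshold, else sample randomly."""
    rng = rng or random.Random()
    if len(decoy_requests) <= threshold:
        return list(decoy_requests)
    # Sample without replacement so no decoy request is sent twice.
    return rng.sample(decoy_requests, threshold)


capped = cap_decoys(list(range(10)), 3, random.Random(0))
unchanged = cap_decoys(["d1", "d2"], 3)
```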
[0131] As an optional approach, the above method also includes:
[0132] S21, on the client, search for the identifier of the data table that satisfies the first matching condition with the first data table to obtain the identifier of the third data table, and on the client, obtain the identifier of the third group of data sub-tables that have a mapping relationship with the third data table, wherein the first matching condition includes a third value range that is at least partially the same as the first value range, the first value range is the value range of the attribute value of the first attribute in the first data table, and the third value range is the value range of the attribute value of the third attribute in the third data table; or
[0133] S22, on the client, search for the identifier of the data sub-table that satisfies the second matching condition with the data sub-table in the first group of data sub-tables, and obtain the identifier of the third group of data sub-tables. The second matching condition includes that the third value sub-range is at least partially the same as the first value sub-range. The first value sub-range is the value range of the attribute value of the first attribute in the data sub-table in the first group of data sub-tables, and the third value sub-range is the value range of the attribute value of the third attribute in the data sub-table in the third group of data sub-tables.
[0134] To increase the similarity between the data in the third data table and the data in the first data table, the client may, but is not limited to, search the distribution information stored in the client for the identifier (which may include, but is not limited to, the table name, code, etc. of the data table) of the data table that satisfies the first matching condition with the first data table, thereby obtaining the identifier of the third data table, and obtain on the client the identifiers (which may include, but are not limited to, the table names, codes, etc. of the data sub-tables) of the third set of data sub-tables that have a mapping relationship with the third data table.
[0135] For example: on the client side, find the identifier of the data table that meets the first matching condition with the first data table (e.g., data table E(stu)), obtain the identifier of the third data table (e.g., data table E(weight)), and on the client side, obtain the identifiers of the third set of data sub-tables (e.g., data sub-table E(weight_1) and data sub-table E(weight_2)) that have a mapping relationship with the third data table (e.g., data table E(weight)).
[0136] The range of values for the third attribute (e.g., weight) in the third data table (e.g., data table E(weight)) is at least partially the same as the range of values for the first attribute (e.g., score) in the first data table (e.g., data table E(stu)). This means that at least one value in the third attribute (e.g., weight) of the third data table (e.g., data table E(weight)) is identical to a value in the first attribute (e.g., score) of the first data table (e.g., data table E(stu)), and the frequency of this identical attribute value in the third data table is different from the frequency of its occurrence in the first data table. This increases the similarity between the data in the third data table and the data in the first data table. Given that the adversary knows the two attributes used for connection, they cannot distinguish between the first and third data tables, thus preventing the decryption of plaintext data within encrypted data and reducing the probability of data leakage.
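The range-overlap part of the first matching condition can be sketched as a search over the distribution information held by the client. Everything concrete here is assumed for illustration: the layout of `DISTRIBUTION_INFO`, the numeric ranges, the extra table E(height), and the function names are all hypothetical, and value ranges are modelled as closed numeric intervals.

```python
def ranges_overlap(a, b):
    """True if closed ranges a=(lo, hi) and b=(lo, hi) share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical distribution information stored on the client:
# table identifier -> (attribute value range, identifiers of its sub-tables)
DISTRIBUTION_INFO = {
    "E(stu)": ((0, 100), ["E(stu_1)", "E(stu_2)"]),
    "E(weight)": ((40, 120), ["E(weight_1)", "E(weight_2)"]),
    "E(height)": ((140, 200), ["E(height_1)"]),
}


def find_decoy_tables(target_id, info):
    """Return tables whose value range at least partially overlaps the target's."""
    target_range, _ = info[target_id]
    return {
        table_id: sub_ids
        for table_id, (value_range, sub_ids) in info.items()
        if table_id != target_id and ranges_overlap(target_range, value_range)
    }


decoy_tables = find_decoy_tables("E(stu)", DISTRIBUTION_INFO)
```

In this made-up data only E(weight) overlaps E(stu), so its sub-tables become the third set of data sub-tables.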
[0137] As an optional approach, the above method also includes:
[0138] S31, the first matching condition further includes: the first number and the third number are different, wherein the first number is the number of times the first value appears in the first attribute of the first data table, the third number is the number of times the first value appears in the third attribute of the third data table, and the first value is a value in the third value range that is the same as the first value range; or, the first ratio and the third ratio are different, wherein the first ratio is the ratio obtained by dividing the first number by the first total number, the first total number is the sum of the number of times each value in the first value range appears in the first attribute of the first data table, and the third ratio is the ratio obtained by dividing the third number by the third total number, and the third total number is the sum of the number of times each value in the third value range appears in the third attribute of the third data table; or
[0139] S32, the second matching condition further includes: the first sub-count and the third sub-count are different, wherein the first sub-count is the number of times the third value appears in the first attribute of one data sub-table in the first group of data sub-tables, the third sub-count is the number of times the third value appears in the third attribute of another data sub-table in the third group of data sub-tables, and the third value is a value in the third value sub-range that is the same as a value in the first value sub-range; or, the first sub-ratio and the third sub-ratio are different, wherein the first sub-ratio is the ratio obtained by dividing the first sub-count by the total count of the first sub-table, the total count of the first sub-table is the sum of the number of times each value in the first value sub-range appears in the first attribute of the one data sub-table, the third sub-ratio is the ratio obtained by dividing the third sub-count by the total count of the third sub-table, and the total count of the third sub-table is the sum of the number of times each value in the third value sub-range appears in the third attribute of the other data sub-table.
[0140] Optionally, in this embodiment, before searching for the identifier of the third data table that satisfies the first matching condition with the first data table on the client, the method further includes: obtaining on the client the value range of the attribute value of the first attribute in the first data table sent by the server, and the identifier of the first data table and the identifier of the first group of data sub-tables with a mapping relationship; and obtaining on the client the value range of the attribute value of the third attribute in the third data table sent by the server, and the identifier of the third data table and the identifier of the third group of data sub-tables with a mapping relationship; or
[0141] Before searching on the client for the identifiers of the data sub-tables that satisfy the second matching condition with the data sub-tables in the first group of data sub-tables, the method further includes: obtaining on the client the value range, sent by the server, of the attribute value of the first attribute in each data sub-table of the first group of data sub-tables, as well as the identifier of the first data table and the identifiers of the first group of data sub-tables that have a mapping relationship; and obtaining on the client the value range, sent by the server, of the attribute value of the third attribute in each data sub-table of the third group of data sub-tables, as well as the identifier of the third data table and the identifiers of the third group of data sub-tables that have a mapping relationship.
[0142] To better understand the data table query method in the embodiments of this application, the first matching condition and the second matching condition in the embodiments of this application will be explained and described below in conjunction with optional embodiments, which may be applied to the embodiments of this application but are not limited to them.
[0143] As shown in Figure 12, the first matching condition also includes: the first number and the third number are different. The first number (e.g., 2 times) is the number of times the first value (e.g., 80) appears in the first attribute (e.g., score) in the first data table (e.g., data table E(stu)). The third number (e.g., 3 times) is the number of times the first value (e.g., 80) appears in the third attribute (e.g., weight) in the third data table (e.g., data table E(weight)). The first value (e.g., 80) is a value in the third value range that is the same as a value in the first value range.
[0144] The second matching condition also includes: the first sub-count and the third sub-count are different, wherein the first sub-count is the number of times the third value appears in the first attribute of a data subtable in the first group of data subtables, the third sub-count is the number of times the third value appears in the third attribute of another data subtable in the third group of data subtables, and the third value is the same value in the third value sub-range as the first value sub-range; or
[0145] As shown in Figure 13, the second matching condition also includes: the first sub-ratio is different from the third sub-ratio. The first sub-ratio is the ratio (e.g., 1/2) obtained by dividing the first sub-count (e.g., the score 80 appears 2 times in data sub-table E(stu_1')) by the total count of the first sub-table (e.g., all scores in data sub-table E(stu_1') together appear 4 times). The total count of the first sub-table is the sum of the number of times each value in the first value sub-range appears in the first attribute (e.g., score) in a data sub-table (e.g., data sub-table E(stu_1')). The third sub-ratio is the ratio (e.g., 1) obtained by dividing the third sub-count (e.g., the weight 80 appears 3 times in data sub-table E(weight_1)) by the total count of the third sub-table (e.g., all weights in data sub-table E(weight_1) together appear 3 times).
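The count and ratio comparison in the matching conditions can be worked through with the numbers from the example (2 of 4 versus 3 of 3). The individual values other than 80 are assumptions, since the figures are not reproduced here, and `count_and_ratio` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction


def count_and_ratio(values, target):
    """Return how often `target` occurs and its share of all occurrences."""
    counts = Counter(values)
    total = sum(counts.values())
    return counts[target], Fraction(counts[target], total)


# Assumed contents: 80 appears 2 times among 4 scores in E(stu_1').
stu_1_scores = [80, 80, 70, 90]
# Assumed contents: 80 appears 3 times among 3 weights in E(weight_1).
weight_1_values = [80, 80, 80]

first_count, first_ratio = count_and_ratio(stu_1_scores, 80)
third_count, third_ratio = count_and_ratio(weight_1_values, 80)

# Either difference is enough to satisfy the additional condition.
counts_differ = first_count != third_count
ratios_differ = first_ratio != third_ratio
```

Using `Fraction` keeps the ratios exact, so 2/4 compares equal to 1/2 without floating-point rounding.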
[0146] In this way, the data tables that satisfy the first or second matching condition have frequencies similar to those of the data in the actual data table that is intended to be queried. This interferes with the statistics that a passive persistent adversary gathers on the queries (e.g., number of queries, query order, and query frequency) against the data tables or data sub-tables requested by the first set of data sub-table query requests, for example when the adversary analyzes which data sub-tables belong to the same original table based on the query frequency of the data sub-tables. For a data table stu, its data sub-tables include E(stu_1') and E(stu_2). The number of queries performed on data sub-table E(stu_1') is sum(stu_1') = N(stu_1') + k1, where N(stu_1') is the actual number of queries that should be performed and k1 is the introduced noise, thus achieving frequency obfuscation of E(stu_1'). Similarly, the number of queries performed on data sub-table E(child1) is sum(child1) = N(child1) + k1′, where N(child1) is the actual number of queries that should be performed and k1′ is the introduced noise, thus achieving frequency obfuscation of E(child1).
[0147] Furthermore, for a fixed sequence of joins between original data tables and extended joins between sub-tables, such as when querying for attribute values in data table C that match attribute values in data table A, the total number of queries is Sum = {N(A1,C1) + N(A2,C1) + N(A2,C2) + k}. Here, data table A includes sub-tables A1 and A2, and data table C includes sub-tables C1, C2, and C3. The join sequence between the sub-tables of data table C and data table A introduces a noise k, the size of which is random. This makes it difficult for an adversary to determine the size of the original sequence and thus infer the sub-tables included in the original table. By introducing noise, the access frequency of each sub-table is randomized, protecting the data from statistical analysis attacks.
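The relation sum = N + k can be illustrated numerically. In the sketch below the noise is drawn uniformly from 0 to `max_noise`, which is an assumption: the text says only that the size of the noise is random. The function name and the real query counts are likewise made up for illustration.

```python
import random


def observed_query_counts(real_counts, max_noise, rng=None):
    """Add independent random noise k to each sub-table's real query count N."""
    rng = rng or random.Random()
    # Assumption: uniform integer noise; the source does not fix a distribution.
    noise = {table: rng.randint(0, max_noise) for table in real_counts}
    observed = {table: real_counts[table] + noise[table] for table in real_counts}
    return observed, noise


# Hypothetical real query counts N for two sub-tables.
real = {"E(stu_1')": 3, "E(child1)": 2}
observed, noise = observed_query_counts(real, max_noise=5, rng=random.Random(7))
```

An observer on the server sees only `observed`; without knowing `noise`, the real counts cannot be read off directly.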
[0148] As an optional approach, the above method also includes:
[0149] S41, on the client, search for the identifier of the data table that satisfies the third matching condition with the second data table to obtain the identifier of the fourth data table, and on the client, obtain the identifier of the fourth group of data sub-tables that have a mapping relationship with the fourth data table, wherein the third matching condition includes a fourth value range that is at least partially the same as the second value range, the second value range is the value range of the attribute value of the second attribute in the second data table, and the fourth value range is the value range of the attribute value of the fourth attribute in the fourth data table; or
[0150] S42, on the client, search for the identifier of the data sub-table that satisfies the fourth matching condition with the data sub-table in the second group of data sub-tables, and obtain the identifier of the fourth group of data sub-tables, wherein the fourth matching condition includes a fourth value sub-range that is at least partially the same as the second value sub-range, the second value sub-range includes the value range of the attribute value of the second attribute in the data sub-table in the second group of data sub-tables, and the fourth value sub-range includes the value range of the attribute value of the fourth attribute in the data sub-table in the fourth group of data sub-tables.
[0151] To increase the similarity between the data in the fourth data table and the data in the second data table, the client may, without limitation, search the distribution information stored on the client for the identifier (which may include, but is not limited to, a table name or a code) of the data table that satisfies the third matching condition with the second data table, thereby obtaining the identifier of the fourth data table. The client may then obtain, from the same distribution information, the identifiers (which may include, but are not limited to, the table names or codes of the data sub-tables) of the fourth set of data sub-tables that have a mapping relationship with the fourth data table.
[0152] For example: find the identifier of the data table that meets the third matching condition with the second data table (e.g., data table E(child)) from the distribution information stored on the client, obtain the identifier of the fourth data table (e.g., data table E(level)), and obtain the identifier of the fourth set of data sub-tables (e.g., data sub-table E(level_1) and data sub-table E(level_2)) that have a mapping relationship with the fourth data table (e.g., data table E(level)) from the distribution information stored on the client.
[0153] The range of values for the fourth attribute (e.g., level) in the fourth data table (e.g., data table E(level)) is at least partially the same as the range of values for the second attribute (e.g., level) in the second data table (e.g., data table E(child)). It can be understood that at least one attribute value of the fourth attribute (e.g., level) in the fourth data table (e.g., data table E(level)) is the same as an attribute value of the second attribute (e.g., level) in the second data table (e.g., data table E(child)), and that the frequency with which this shared attribute value appears in the fourth data table is different from the frequency with which it appears in the second data table.
[0154] In this way, the similarity between the data in the fourth data table and the data in the second data table is increased. Even if the adversary knows the two attributes being joined, the adversary cannot distinguish the second data table from the fourth data table and therefore cannot recover the plaintext data from the encrypted data, which improves the security of the encrypted data stored on the server.
[0155] As an optional approach, the above method also includes:
[0156] S51, the third matching condition further includes: the second number and the fourth number are different, wherein the second number is the number of times the second value appears in the second attribute of the second data table, the fourth number is the number of times the second value appears in the fourth attribute of the fourth data table, and the second value is a value in the fourth value range that is the same as the value in the second value range; or, the second ratio and the fourth ratio are different, wherein the second ratio is the ratio obtained by dividing the second number by the second total number, the second total number is the sum of the number of times each value in the second value range appears in the second attribute of the second data table, and the fourth ratio is the ratio obtained by dividing the fourth number by the fourth total number, and the fourth total number is the sum of the number of times each value in the fourth value range appears in the fourth attribute of the fourth data table; or
[0157] S52, the fourth matching condition further includes: the second sub-number and the fourth sub-number are different, wherein the second sub-number is the number of times the fourth value appears in the second attribute in one data sub-table of the second group of data sub-tables, and the fourth sub-number is the number of times the second value appears in the fourth attribute in another data sub-table of the fourth group of data sub-tables, and the fourth value is the same value in the fourth value sub-range as the second value sub-range; or, the second sub-ratio and the fourth sub-ratio are different, wherein the second sub-ratio is the ratio obtained by dividing the second sub-number by the total number of times in the second sub-table, the total number of times in the second sub-table is the sum of the number of times each value in the second value sub-range appears in the second attribute in one data sub-table, and the fourth sub-ratio is the ratio obtained by dividing the fourth sub-number by the total number of times in the fourth sub-table, and the total number of times in the fourth sub-table is the sum of the number of times each value in the fourth value range appears in the fourth attribute in another data sub-table.
[0158] To better understand the data table query method in the embodiments of this application, the third and fourth matching conditions are explained below in conjunction with optional embodiments, which may be, but are not limited to being, applied to the embodiments of this application.
[0159] As shown in Figure 14, the third matching condition further includes: the second ratio (e.g., the ratio of the "Good" grade in data table E(child) is 1 / 4) is different from the fourth ratio (e.g., the ratio of the "Good" grade in data table E(level) is 1 / 5). The second ratio is obtained by dividing the second number (e.g., the number of times the "Good" grade appears in data table E(child) is 1) by the second total number (e.g., the sum of the number of times all grade values appear in data table E(child) is 5). The second total number is the sum of the number of times each value in the second value range appears in the second attribute (e.g., level) in the second data table (e.g., data table E(child)). The fourth ratio is obtained by dividing the fourth number (e.g., the number of times the "Good" grade appears in data table E(level) is 1) by the fourth total number (e.g., the sum of the number of times all grade values appear in data table E(level) is 5). The fourth total number is the sum of the number of times each value in the fourth value range appears in the fourth attribute (e.g., level) in the fourth data table (e.g., data table E(level)).
[0160] Alternatively, as shown in Figure 15, the fourth matching condition further includes: the second sub-ratio (e.g., 1 / 2) is different from the fourth sub-ratio (e.g., 1 / 3). The second sub-ratio is obtained by dividing the second sub-number (e.g., the number of times the "Good" grade appears in data sub-table E(child_1) is 1) by the total number of times in the second sub-table (e.g., the sum of the number of times all grade values appear in data sub-table E(child_1) is 2). The total number of times in the second sub-table is the sum of the number of times each value in the second value sub-range appears in the second attribute of one data sub-table. The fourth sub-ratio is obtained by dividing the fourth sub-number (e.g., the number of times the "Good" grade appears in data sub-table E(level_1) is 1) by the total number of times in the fourth sub-table (e.g., the sum of the number of times all grade values appear in data sub-table E(level_1) is 3).
[0161] In this way, the similarity between the data in the fourth data table and the data in the second data table is increased (e.g., the same number of times the same value appears, the same proportion of occurrence, or within the same value range). Even if the adversary knows the two attributes being joined, the adversary cannot distinguish the second data table from the fourth data table, which greatly increases the difficulty of recovering the plaintext data from the encrypted data.
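The matching conditions described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `satisfies_matching_condition`, the plaintext example columns, and the rule that a single shared value with a differing count or ratio is enough are all hypothetical and are not taken from this application. In practice the client would consult its stored distribution information rather than plaintext columns.

```python
from collections import Counter


def satisfies_matching_condition(second_values, fourth_values, use_ratio=True):
    """Hypothetical check: the value ranges overlap, and at least one shared
    value occurs with a different count (or ratio) in the two columns."""
    second_counts = Counter(second_values)
    fourth_counts = Counter(fourth_values)
    shared = set(second_counts) & set(fourth_counts)
    if not shared:
        return False  # the value ranges do not overlap at all

    second_total = sum(second_counts.values())
    fourth_total = sum(fourth_counts.values())
    for value in shared:
        if use_ratio:
            # Compare count/total ratios exactly by cross-multiplication.
            if second_counts[value] * fourth_total != fourth_counts[value] * second_total:
                return True
        elif second_counts[value] != fourth_counts[value]:
            return True
    return False


# Hypothetical columns: "Good" has ratio 1/4 in one table and 1/5 in the other.
child_column = ["Good", "Excellent", "Pass", "Pass"]
level_column = ["Good", "Normal", "Normal", "Overweight", "Overweight"]
print(satisfies_matching_condition(child_column, level_column))
```

Cross-multiplying the counts avoids floating-point comparison of the two ratios; this is a design choice of the sketch, not a requirement stated in this application.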
[0162] As an optional approach, the above method also includes:
[0163] S61, when the second group of data sub-table query request is used to request the query in the second group of data sub-table for the attribute value of the second attribute that matches the attribute value of the third attribute in the third group of data sub-table, the first group query result includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the first attribute in the first group of data sub-table, as found in the second group of data sub-table; the second group query result includes encrypted data of the attribute value of the third attribute in the third group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the third attribute in the third group of data sub-table, as found in the second group of data sub-table; or
[0164] S62, when the second group of data sub-table query request is used to request the query in the fourth group of data sub-table for the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first group of data sub-table, the first group query result includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the first attribute in the first group of data sub-table, as found in the second group of data sub-table; the second group query result includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first group of data sub-table, as found in the fourth group of data sub-table; or
[0165] S63, when the second group of data sub-table query request is used to request the query in the fourth group of data sub-table for the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third group of data sub-table, the first group query result includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the first attribute in the first group of data sub-table, which is found in the second group of data sub-table. The second group query result includes encrypted data of the attribute value of the third attribute in the third group of data sub-table, and encrypted data of the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third group of data sub-table, which is found in the fourth group of data sub-table.
[0166] Optionally, in this embodiment, when the server returns the first set of query results corresponding to the first set of data sub-table query requests and the second set of query results corresponding to the second set of data sub-table query requests, the client may, based on the request identifiers carried in the returned query results, take the query results carrying the first request identifier as the first set of query results, take the query results carrying the second request identifier as the second set of query results, and delete the second set of query results. The first request identifier is the request identifier corresponding to the first set of data sub-table query requests, and the second request identifier is the request identifier corresponding to the second set of data sub-table query requests.
[0167] Optionally, but not limited to, a first request identifier may be generated for each query request in the first group of data sub-table query requests, and the generated first request identifier may be carried in the first group of data sub-table query requests; and a second request identifier may be generated for each query request in the second group of data sub-table query requests, and the generated second request identifier may be carried in the second group of data sub-table query requests.
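The use of request identifiers described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `tag_requests` and `keep_real_results`, the dictionary layout of requests and results, and the use of random UUIDs as identifiers are hypothetical choices, not the format used by this application.

```python
import uuid


def tag_requests(real_queries, decoy_queries):
    """Attach a fresh identifier to every request and remember which
    identifiers belong to the real (first group) requests."""
    tagged = []
    real_ids = set()
    for query in real_queries:
        request_id = uuid.uuid4().hex
        real_ids.add(request_id)
        tagged.append({"request_id": request_id, "query": query})
    for query in decoy_queries:
        tagged.append({"request_id": uuid.uuid4().hex, "query": query})
    return tagged, real_ids


def keep_real_results(results, real_ids):
    """Keep results whose identifier belongs to the first group and
    drop the obfuscating second-group results."""
    return [result for result in results if result["request_id"] in real_ids]


# Hypothetical round trip: the "server" simply echoes each identifier back.
tagged, real_ids = tag_requests(["join stu_1 child_1"], ["join weight_1 child_1"])
server_results = [{"request_id": t["request_id"], "rows": []} for t in tagged]
print(len(keep_real_results(server_results, real_ids)))
```

Because the identifiers are random, the set of real identifiers is held only by the client in this sketch, so the split between the two groups is not visible in the requests themselves.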
[0168] To better understand the first set of query results and the second set of query results in the embodiments of this application, the first set of query results and the second set of query results in the embodiments of this application will be explained and described below in conjunction with optional embodiments, which may be applied to the embodiments of this application but are not limited to them.
[0169] As shown in Figure 16, when the second set of data sub-table query requests is used to request a query, in the second set of data sub-tables (e.g., data sub-table E(child_1) and data sub-table E(child_2)), for the attribute value of the second attribute (e.g., grade) that matches the attribute value of the third attribute (e.g., weight) in the third set of data sub-tables (e.g., data sub-table E(weight_1) and data sub-table E(weight_2)), the first set of query results includes encrypted data of the attribute value of the first attribute (e.g., score) in the first set of data sub-tables (e.g., data sub-table E(stu_1') and data sub-table E(stu_2)), and encrypted data of the attribute value of the second attribute (e.g., grade), found in the second set of data sub-tables, that matches the attribute value of the first attribute in the first set of data sub-tables. The second set of query results includes encrypted data of the attribute value of the third attribute in the third set of data sub-tables, and encrypted data of the attribute value of the second attribute (e.g., grade), found in the second set of data sub-tables, that matches the attribute value of the third attribute (e.g., weight) in the third set of data sub-tables.
[0170] In detail, the first set of query results includes the grades, found in data sub-tables E(child_1) and E(child_2), that correspond to the scores in data sub-tables E(stu_1') and E(stu_2), as well as the data identifiers (e.g., E(1) or E(0)). For example, in data sub-table E(stu_1'), a score of 80 corresponds to a grade of "Good," and a score of 90 corresponds to a grade of "Excellent." In data sub-table E(stu_2), a score of 70 corresponds to a grade of "Pass," a score of 60 corresponds to a grade of "Pass," and a score of 65 corresponds to a grade of "Pass."
[0171] The second set of query results includes the grades, found in data sub-tables E(child_1) and E(child_2), that correspond to the weights in data sub-tables E(weight_1) and E(weight_2), as well as the data identifiers (e.g., E(1) or E(0)). For example, the grade corresponding to weight 80 in data sub-table E(weight_1) is "Good". The grade corresponding to weight 66 in data sub-table E(weight_2) is "Pass", the grade corresponding to weight 60 is "Pass", and the grade corresponding to weight 65 is "Pass".
[0172] Alternatively, when the second set of data sub-table query requests is used to request a query, in the fourth set of data sub-tables (e.g., data sub-tables E(level_1) and E(level_2)), for the attribute value of the fourth attribute (e.g., grade) that matches the attribute value of the first attribute (e.g., score) in the first set of data sub-tables (e.g., data sub-tables E(stu_1') and E(stu_2)), the first set of query results includes encrypted data of the attribute value of the first attribute (e.g., score) in the first set of data sub-tables, and encrypted data of the attribute value of the second attribute (e.g., grade), found in the second set of data sub-tables (e.g., data sub-table E(child_1) and data sub-table E(child_2)), that matches the attribute value of the first attribute in the first set of data sub-tables. The second set of query results includes encrypted data of the attribute value of the first attribute in the first set of data sub-tables, and encrypted data of the attribute value of the fourth attribute, found in the fourth set of data sub-tables, that matches the attribute value of the first attribute in the first set of data sub-tables.
[0173] In detail, the second set of query results includes the grades, found in data sub-tables E(level_1) and E(level_2), that correspond to the scores in the first set of data sub-tables, as well as the data identifiers (e.g., E(1) or E(0)). For example, in data sub-table E(level_1), a score of 80 corresponds to the grade of overweight, and a score of 90 corresponds to the grade of overweight. In data sub-table E(level_2), a score of 70 corresponds to the grade of normal, a score of 60 corresponds to the grade of good, and a score of 65 corresponds to the grade of good.
[0174] Alternatively, when the second set of data sub-table query requests is used to request a query, in the fourth set of data sub-tables (e.g., data sub-tables E(level_1) and E(level_2)), for the attribute value of the fourth attribute (e.g., level) that matches the attribute value of the third attribute (e.g., weight) in the third set of data sub-tables (e.g., data sub-tables E(weight_1) and E(weight_2)), the first set of query results includes encrypted data of the attribute value of the first attribute (e.g., score) in the first set of data sub-tables (e.g., data sub-tables E(stu_1') and E(stu_2)), and encrypted data of the attribute value of the second attribute (e.g., level), found in the second set of data sub-tables (e.g., data sub-table E(child_1) and data sub-table E(child_2)), that matches the attribute value of the first attribute in the first set of data sub-tables. The second set of query results includes encrypted data of the attribute value of the third attribute in the third set of data sub-tables (e.g., data sub-table E(weight_1) and data sub-table E(weight_2)), and encrypted data of the attribute value of the fourth attribute, found in the fourth set of data sub-tables, that matches the attribute value of the third attribute in the third set of data sub-tables.
[0175] In detail, the second set of query results includes the levels, found in data sub-table E(level_1) and data sub-table E(level_2), that correspond to the weights in data sub-table E(weight_1) and data sub-table E(weight_2), as well as the data identifiers (e.g., E(1) or E(0)). For example, the level corresponding to weight 80 in data sub-table E(weight_1) is overweight, the level corresponding to weight 66 in data sub-table E(weight_2) is good, the level corresponding to weight 60 is good, and the level corresponding to weight 65 is good.
[0176] In this way, the server returns both the first and second sets of query results to the client. Although a passive persistent adversary can observe all encrypted accesses, the second set of query results interferes with frequency statistics on the first set of query results. For example, the passive persistent adversary cannot know which query results are the ones the user actually wants (e.g., the first set of query results) and which are obfuscating fake query results (e.g., the second set of query results). As a result, the adversary cannot perform frequency statistics on the query results, which improves the security of the query results returned by the server.
[0177] As an optional approach, the above method also includes:
[0178] S71, when the first data table is a data table obtained by performing a first smoothing process on the attribute values of the first attribute in the first real data table, and the first real data table is updated to a second real data table, the first data table is updated to a fifth data table based on the first difference between the first real data table and the second real data table, wherein the first smoothing process is used to make each attribute value of the first attribute in the first real data table appear the same number of times in the first data table, and the second difference between the first data table and the fifth data table is the same as the first difference;
[0179] S72, based on the second real data table and the fifth data table, determine whether to perform a second smoothing process on the second real data table;
[0180] S73, if it is determined that the second smoothing process is to be performed on the second real data table, the second smoothing process is performed on the second real data table to obtain a sixth data table, and the first data table on the server is replaced with the sixth data table, wherein the second smoothing process is used to make the number of times each attribute value of the first attribute in the second real data table appears in the sixth data table the same.
[0181] S74, if it is determined that the second smoothing process will not be performed on the second real data table, the first data table on the server is replaced with the fifth data table.
[0182] The data obtained by the client may change, which may in turn cause the data stored on the server to change. In such cases, the changed data may either be re-smoothed or left unsmoothed, depending on what is needed to keep the frequency information of the data on the server secure. Optionally, in this embodiment, a determination not to perform the second smoothing process on the second real data table indicates that the frequency of occurrence of each attribute value in the second real data table is dissimilar to the frequency of occurrence of each attribute value in the fifth data table. That is, the second real data table will not reveal the frequency of occurrence of each attribute value in the fifth data table. In this case, the second smoothing process may be skipped for the second real data table, and the first data table on the server may be replaced with the fifth data table. This saves the computing resources that smoothing the data table would otherwise require and improves the utilization of computing resources.
[0183] Optionally, in this embodiment, a determination to perform the second smoothing process on the second real data table indicates that the frequency of occurrence of each attribute value in the second real data table is similar to the frequency of occurrence of each attribute value in the fifth data table. In other words, the second real data table may reveal the frequency of occurrence of each attribute value in the fifth data table. In such a case, the second smoothing process may be, but is not limited to being, performed on the second real data table to obtain a sixth data table, and the first data table on the server may be replaced with the sixth data table. In this way, the security of the data stored on the server is improved even when that data changes dynamically.
[0184] To better understand the second smoothing process in the embodiments of this application, the process of the second smoothing process in the embodiments of this application will be explained and described below in conjunction with optional embodiments, which may be applied to the embodiments of this application but are not limited to them.
[0185] As shown in Figure 17, the first real data table records the attribute values of the first attribute, which are 2, 3, 3, 4, and 5. A first smoothing process is applied to the attribute values of the first attribute in the first real data table to obtain the first data table, which records the attribute values of the first attribute, which are 2, 2, 3, 3, 4, 4, 5, and 5. When the first real data table is updated to a second real data table, for example, when attribute values 5 and 6 are added to the first real data table, the second real data table records the attribute values of the first attribute, which are 2, 3, 3, 4, 5, 5, and 6. Based on the first difference between the first and second real data tables, the first data table is updated to a fifth data table. The fifth data table records the attribute values of the first attribute, which are 2, 2, 3, 3, 4, 4, 5, 5, 5, and 6. The second difference between the first and fifth data tables is the same as the first difference. If it is determined that the second smoothing process is to be performed on the second real data table, the second smoothing process is performed on the second real data table to obtain a sixth data table. The sixth data table records the attribute values of the first attribute, which are 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, and 6. The first data table on the server is then replaced with the sixth data table.
[0186] As shown in Figure 18, the first real data table records the attribute values of the first attribute, which are 2, 3, 3, 4, 5, 5, and 6. A first smoothing process is applied to the attribute values of the first attribute in the first real data table to obtain the first data table, which records the attribute values of the first attribute, which are 2, 2, 3, 3, 4, 4, 5, 5, 6, and 6. When the first real data table is updated to a second real data table, for example, when one attribute value 5 is removed from the first real data table, the second real data table records the attribute values of the first attribute, which are 2, 3, 3, 4, 5, and 6. Based on the first difference between the first and second real data tables, the first data table is updated to a fifth data table. The fifth data table records the attribute values of the first attribute, which are 2, 2, 3, 3, 4, 4, 5, and 6. The second difference between the first and fifth data tables is the same as the first difference. If it is determined that the second smoothing process is to be performed on the second real data table, the second smoothing process is performed on the second real data table to obtain the sixth data table. The sixth data table records the attribute values of the first attribute, which are 2, 2, 3, 3, 4, 4, 5, 5, 6, and 6. The first data table on the server is then replaced with the sixth data table.
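The update from the first data table to the fifth data table in step S71 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `apply_real_table_diff` and the representation of a table column as a plain list of values are hypothetical, and the example uses the values given for Figure 17.

```python
from collections import Counter


def apply_real_table_diff(first_real, second_real, first_data):
    """Apply the difference between the two real tables to the smoothed
    first data table, yielding the fifth data table (as a sorted list)."""
    old_counts = Counter(first_real)
    new_counts = Counter(second_real)
    added = new_counts - old_counts    # values inserted into the real table
    removed = old_counts - new_counts  # values deleted from the real table
    fifth = Counter(first_data) + added - removed
    return sorted(fifth.elements())


# Values from the Figure 17 example: 5 and 6 are added to the real table.
first_real = [2, 3, 3, 4, 5]
second_real = [2, 3, 3, 4, 5, 5, 6]
first_data = [2, 2, 3, 3, 4, 4, 5, 5]
print(apply_real_table_diff(first_real, second_real, first_data))
# prints [2, 2, 3, 3, 4, 4, 5, 5, 5, 6]
```

Counter subtraction keeps only positive counts, so the same function covers both the insertion case and the deletion case without separate branches.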
[0187] As an optional approach, based on the second real data table and the fifth data table, it is determined whether to perform a second smoothing process on the second real data table, including:
[0188] S81, determine the distribution difference between each attribute value of the first attribute in the second real data table and each attribute value of the first attribute in the fifth data table;
[0189] S82, if the distribution difference is less than or equal to a preset threshold, determine to perform the second smoothing process on the second real data table;
[0190] S83, if the distribution difference is greater than the preset threshold, determine that the second smoothing process will not be performed on the second real data table.
[0191] Optionally, in this embodiment, the distribution difference may include, but is not limited to, the difference between the number of occurrences of each attribute value of the first attribute in the second real data table and the number of occurrences of each attribute value of the first attribute in the fifth data table. A chi-square value may be, but is not limited to being, determined from this difference in the number of occurrences.
[0192] As shown in Figure 19, the chi-square value between each attribute value of the first attribute in the second real data table and each attribute value of the first attribute in the fifth data table may be determined by, but is not limited to, the following steps:
[0193] Step S1902: Determine the chi-square values of each attribute value of the first attribute in the second real data table and each attribute value of the first attribute in the fifth data table.
[0194] In detail, the chi-square value may be determined by, but is not limited to, the following formula:
[0195] $$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
[0196] Here, $\chi^2$ is the chi-square value over the values of the first attribute, $O_i$ is the number of times the i-th value of the first attribute appears in the fifth data table, $E_i$ is the number of times the i-th value of the first attribute appears in the second real data table, and $\sum$ denotes summation over all cells. The degrees of freedom are calculated as the number of frequency categories minus 1. A significance level is selected (e.g., 0.05 or 0.001; this application does not impose any restriction on this). The number of frequency categories can be, but is not limited to, the number of different values of the first attribute in the fifth data table and the second real data table. The chi-square distribution table is consulted based on the degrees of freedom and the significance level to determine the critical value of the chi-square value.
[0197] Step S1904: Determine whether the chi-square value is less than or equal to a preset threshold. If the chi-square value is less than or equal to the preset threshold, proceed to step S1908. If the chi-square value is greater than the preset threshold, proceed to step S1906.
[0198] Step S1906: Determine that the second smoothing process will not be performed on the second real data table. Specifically, in this case, it can be shown that the number of times the attribute values appear in the fifth data table and the second real data table are not similar, so there is no need to perform the second smoothing process on the second real data table.
[0199] Step S1908: Determine to perform a second smoothing process on the second real data table. Specifically, in this case, it can be shown that the number of times the attribute values appear in the fifth data table and the second real data table are similar, so a second smoothing process needs to be performed on the second real data table.
[0200] In this way, the second real data table will only be smoothed if the distribution difference is less than or equal to a preset threshold. If the distribution difference is greater than the preset threshold, there is no need to smooth the second real data table. This saves the computer resources required to smooth the second real data table and improves the utilization rate of computer resources.
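The decision in steps S81 to S83 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `chi_square` and `should_resmooth` are hypothetical, values that appear only in the fifth data table are ignored because their expected count would be zero, and the threshold 9.488 used in the example is the standard chi-square critical value for 4 degrees of freedom at a significance level of 0.05, chosen only for illustration. The example values are those given for Figure 17.

```python
from collections import Counter


def chi_square(fifth_values, second_real_values):
    """Chi-square value with O_i taken from the fifth data table and
    E_i taken from the second real data table."""
    observed = Counter(fifth_values)
    expected = Counter(second_real_values)
    total = 0.0
    # Iterating over the expected counts guarantees E_i > 0 (assumption:
    # values seen only in the fifth data table are skipped).
    for value, expected_count in expected.items():
        observed_count = observed.get(value, 0)
        total += (observed_count - expected_count) ** 2 / expected_count
    return total


def should_resmooth(fifth_values, second_real_values, threshold):
    """Re-smooth only when the two distributions are similar, that is,
    when the chi-square value does not exceed the preset threshold."""
    return chi_square(fifth_values, second_real_values) <= threshold


fifth_table = [2, 2, 3, 3, 4, 4, 5, 5, 5, 6]
second_real_table = [2, 3, 3, 4, 5, 5, 6]
print(chi_square(fifth_table, second_real_table))              # prints 2.5
print(should_resmooth(fifth_table, second_real_table, 9.488))  # prints True
```

With five distinct values there are 4 degrees of freedom, and the computed value 2.5 is below the illustrative threshold, so this sketch selects the second smoothing process for these inputs.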
[0201] As an optional approach, sending the first set of data sub-table query requests and the second set of data sub-table query requests to the server includes:
[0202] S91, repeat the following steps until both the first group of data sub-table query requests and the second group of data sub-table query requests have been sent to the server: randomly select one or more data sub-table query requests from the data sub-table query requests in the first group of data sub-table query requests and the second group of data sub-table query requests that have not yet been sent to the server, and send the randomly selected one or more data sub-table query requests to the server.
[0203] To obfuscate the sending order of the data sub-table query requests in the first group of data sub-table query requests, as shown in Figure 20, the first group of data sub-table query requests may include, but is not limited to, data sub-table query request 1 and data sub-table query request 2; the data sub-table query requests in the second group that have not yet been sent to the server may include, but are not limited to, data sub-table query request 3 and data sub-table query request 4. In this case, data sub-table query request 1 may be randomly selected from the first group of data sub-table query requests, data sub-table query request 3 and data sub-table query request 4 may be randomly selected from the second group of data sub-table query requests that have not yet been sent to the server, and the selected requests are sent to server 104. In this way, the sending order of the data sub-table query requests in the first group of data sub-table query requests is obfuscated, thereby improving the security of sending data query requests to the server.
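The repeated random selection in step S91 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `send_in_random_order`, the `send` callback standing in for transmission to the server, and the choice of a uniformly random batch size are hypothetical.

```python
import random


def send_in_random_order(first_group, second_group, send, rng=None):
    """Repeatedly pick a random non-empty batch of unsent requests from
    both groups and send it, until every request has been sent."""
    if rng is None:
        rng = random.Random()
    pending = list(first_group) + list(second_group)
    while pending:
        batch_size = rng.randint(1, len(pending))
        batch = rng.sample(pending, batch_size)
        for request in batch:
            pending.remove(request)  # mark the request as sent
        send(batch)


# Hypothetical usage: collect what would be sent to the server.
sent = []
send_in_random_order(["request 1", "request 2"], ["request 3", "request 4"], sent.extend)
print(sent)
```

Each request is sent exactly once, while the interleaving of real and obfuscating requests differs from run to run.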
[0204] As an optional approach, the above method also includes:
[0205] S101, the first group of data sub-tables is a data sub-table obtained by splitting the first data table according to the value range of the attribute value of the first attribute, wherein the value range of the attribute value of the first attribute in each data sub-table of the first group of data sub-tables is different.
[0206] S102, the second set of data sub-tables is a data sub-table obtained by splitting the second data table according to the value range of the attribute value of the second attribute, wherein the value range of the attribute value of the second attribute in each data sub-table of the second set of data sub-tables is different.
[0207] The differences between different attribute values in a data table may be significant, and the frequency of occurrence of different attribute values may vary. In such cases, attribute values with similar ranges can be split into the same sub-table, for example, using methods such as k-means clustering or hierarchical clustering. Then, the resulting sub-tables can be smoothed.
[0208] As shown in Figure 21, the attribute values recorded in the data table may include, but are not limited to, 10, 10, 10, 10, 2, 3, 4, 5, and 6. In related technologies, the data table is not split; instead, the unsplit data table is smoothed directly. In this case, new data needs to be introduced into the data table (e.g., 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6), which in extreme cases can cause the space expansion and latency of data frequency smoothing to grow as O(n^2). In this embodiment, the data table is first divided into multiple sub-tables with similar attribute value ranges, such as sub-table 1 and sub-table 2. Sub-table 1 contains the attribute value 10, while sub-table 2 contains attribute values ranging from 2 to 6, which may include, but are not limited to, 2, 3, 4, 5, and 6. Since every attribute value within sub-table 1 appears the same number of times, and likewise within sub-table 2, neither sub-table needs to be smoothed. Compared to related technologies that require introducing new data to smooth attribute values, this method significantly reduces the spatial and temporal overhead of data frequency smoothing.
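The effect of splitting before smoothing can be checked with a small sketch. It assumes that smoothing pads every distinct value up to the count of the most frequent value, and that the sub-tables are defined by inclusive value ranges; the helper names padding_needed and split_by_ranges are hypothetical.

```python
from collections import Counter

def padding_needed(values):
    """Count the dummy rows needed so that every distinct value appears equally often."""
    counts = Counter(values)
    target = max(counts.values())  # assumed smoothing target: the highest frequency
    return sum(target - count for count in counts.values())

def split_by_ranges(values, ranges):
    """Split values into sub-tables, one per inclusive (low, high) value range."""
    return [[v for v in values if low <= v <= high] for low, high in ranges]

table = [10, 10, 10, 10, 2, 3, 4, 5, 6]
sub_tables = split_by_ranges(table, [(10, 10), (2, 6)])

print(padding_needed(table))                    # padding for the unsplit table
print([padding_needed(t) for t in sub_tables])  # padding for each sub-table
```

Under these assumptions, the unsplit table needs padding rows for every value other than 10, while each sub-table is already uniform and needs none.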
[0209] As an optional approach, the above method also includes:
[0210] S111, the data table query request is used to request to query the attribute value of the second attribute that has a first matching relationship with the attribute value of the first attribute in the second data table, wherein the first matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value in a first target value range, wherein the first target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second data table;
[0211] S112, the first group of data sub-table query request is used to request query the attribute value of the second attribute in the second group of data sub-table that has a second matching relationship with the attribute value of the first attribute in the first group of data sub-table, wherein the second matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value in the second target value range, and the second target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second group of data sub-table.
[0212] Optionally, in this embodiment, the above method further includes: a second set of data sub-table query request is used to request querying the attribute value of the second attribute in the second set of data sub-table that has a third matching relationship with the attribute value of the third attribute in the third set of data sub-table, wherein the third matching relationship means that the hash value of the attribute value of the third attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the third attribute is the same as the hash value of a value in the second target value range, and the second target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second set of data sub-table.
[0213] Alternatively, a second set of data sub-table query request is used to request the query in the fourth set of data sub-table for the attribute value of the fourth attribute that has a fourth matching relationship with the attribute value of the first attribute in the first set of data sub-table. Here, the fourth matching relationship means that the hash value of the attribute value of the fourth attribute is the same as the hash value of the attribute value of the first attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value in the third target value range. The third target value range is the value range represented by an attribute value of an attribute corresponding to the fourth attribute in the fourth set of data sub-table.
[0214] Alternatively, the second set of data sub-table query requests is used to request querying, in the fourth set of data sub-tables, the attribute value of the fourth attribute that has a fifth matching relationship with the attribute value of the third attribute in the third set of data sub-tables. Here, the fifth matching relationship means that the hash value of the attribute value of the fourth attribute is the same as the hash value of the attribute value of the third attribute, or that the hash value of the attribute value of the third attribute is the same as the hash value of a value in the third target value range. The third target value range is the value range represented by an attribute value of an attribute corresponding to the fourth attribute in the fourth set of data sub-tables.
[0215] Optionally, in this embodiment, the attribute value of the second attribute that has the first matching relationship with the attribute value of the first attribute can be queried in the second data table in the following way, but is not limited to it: obtaining a first set of hash values of the attribute values of the first attribute; obtaining a second set of hash values of the attribute values of the second attribute; searching the second set of hash values one by one for a hash value that is the same as a first hash value, wherein the first hash value is a hash value in the first set of hash values; and, if a second hash value that is the same as the first hash value is found in the second set of hash values, determining that the attribute value of the first attribute corresponding to the first hash value and the attribute value of the second attribute corresponding to the second hash value satisfy the first matching relationship.
[0216] As shown in Figure 22, it is possible, but not limited to, to determine that an attribute value of an attribute (e.g., score) in the data sub-table stu_1' matches an attribute value of an attribute (e.g., grade) in the data sub-table child_1 when the hash value H(3) corresponding to the former is the same as the hash value H(3) corresponding to the latter. For example, a score of 90 corresponds to a grade of "excellent". Hash-join is used to connect data with different attributes in multiple data tables. The connection result can be obtained simply by comparing whether the hash values of the attribute values of different attributes in different data tables are the same, which improves the efficiency of data query.
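The hash comparison described above can be sketched as a small hash-join. The sketch makes several assumptions that are not taken from this application: SHA-256 stands in for the hash H, rows are simplified to (hash, payload) pairs, and a dictionary lookup replaces the one-by-one search for brevity.

```python
import hashlib

def h(value):
    """Stand-in for the hash H(.) in the text; SHA-256 is an assumption."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def hash_join(first_rows, second_rows):
    """Pair rows whose join-key hashes are equal.

    Each row is a (join_key_hash, payload) tuple.
    """
    index = {}
    for key_hash, payload in second_rows:
        index.setdefault(key_hash, []).append(payload)
    return [(left, right)
            for key_hash, left in first_rows
            for right in index.get(key_hash, [])]

# Hypothetical rows: scores in stu_1' and the score each grade in child_1 stands for.
stu_rows = [(h(80), "score=80"), (h(90), "score=90"), (h(70), "score=70")]
child_rows = [(h(80), "grade=good"), (h(90), "grade=excellent")]
matches = hash_join(stu_rows, child_rows)
```

Only hash values are compared, so the party performing the join does not need the plaintext scores or grades to produce the matching pairs.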
[0217] To better understand the data table query method in the embodiments of this application, the data table query method in the embodiments of this application will be explained and described below in conjunction with optional embodiments, which may be applied to the embodiments of this application but are not limited to them.
[0218] As shown in Figure 23, when the client obtains a data table query request (e.g., Select * from stu join child on stu.score = child.grade), the request indicates a desire to query, in the data table child, the grade corresponding to each score in the data table stu. In this case, the data table query in this embodiment can be implemented in the following ways, but is not limited to them:
[0219] When a client sends a data table query request (e.g., Select * from stu join child on stu.score = child.grade), the system first matches the original table names (e.g., stu and child) carried in the data table query request against the sub-table mapping table stored by the client. It then replaces the original table names (e.g., stu and child) and attribute names (e.g., score and grade) in the data table query request with the encrypted table names of the corresponding sub-tables (e.g., E(stu_1'), E(stu_2), E(child_1), and E(child_2)) and the encrypted attribute names (e.g., E(score_1), E(score_2), E(grade_1), and E(grade_2)), generating multiple new sub-table query requests (equivalent to the first set of sub-table query requests). At the same time, a second set of sub-table query requests is generated to accompany the first set. The second set of sub-table query requests may be used to, but is not limited to, query the hash values of the grades in E(child_1) and E(child_2) for hash values that match the hash values of the weights in E(weight_1) and E(weight_2). To obfuscate the query sequence, sub-table query requests may be, but are not limited to being, randomly selected from the first and second sets of sub-table query requests and sent to the server.
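The rewriting step can be sketched as a lookup in the client-side sub-table mapping table followed by an expansion into one request per pair of sub-tables. This is only an illustration under stated assumptions: enc is a symbolic placeholder for the encrypted name E(.), the pairwise expansion is assumed rather than taken from this application, and the weight entry of the mapping is a hypothetical source for the second set of requests.

```python
from itertools import product

def enc(name):
    """Symbolic stand-in for the encrypted identifier E(name)."""
    return f"E({name})"

def rewrite(left, right, mapping):
    """Expand a join between two tables into one request per pair of sub-tables."""
    return [(enc(l), enc(r)) for l, r in product(mapping[left], mapping[right])]

# Hypothetical sub-table mapping table stored on the client.
mapping = {
    "stu": ["stu_1'", "stu_2"],
    "child": ["child_1", "child_2"],
    "weight": ["weight_1", "weight_2"],
}

first_group = rewrite("stu", "child", mapping)      # requests the user actually needs
second_group = rewrite("weight", "child", mapping)  # accompanying requests
```

Both groups have the same shape, so a request from the second group cannot be told apart from a real one by its structure alone.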
[0220] When the server receives a query request for a sub-table, it determines the corresponding AES encrypted table and Hash encrypted table based on the encrypted table name in the query request. Then, based on the attribute name in the query, it selects the corresponding Hash encrypted table for a table join operation, performing a Hash-join through the Value attribute to obtain the first set of query results and the second set of query results, which are then returned to the client.
[0221] For example, if the encrypted table names in the sub-table query request obtained by the server include E(stu_1') and E(child_1), the corresponding AES encrypted table and Hash encrypted table can be determined based on, but are not limited to, E(stu_1') and E(child_1). Then, based on the attribute names in the query (e.g., E(score_1) and E(grade_1)), the corresponding Hash encrypted table is selected for table join operation. The query result is obtained by performing a Hash-join through the Value attribute.
[0222] In detail, the hash ciphertext table of data sub-table E(child_1) stores the hash data corresponding to the encrypted grade E(good) and the encrypted grade E(excellent). The hash data corresponding to E(good) may include, but is not limited to: the hash value H(1) corresponding to the encrypted score E(80); the grade E(good) corresponding to hash value H(1); the table E(child_1) to which the grade belongs; an id of 1, which indicates that grade E(good) is the first entry, or first row, of the attribute column in data table E(child_1); and an f_id of -1, which indicates that data table E(child_1) contains no data at the next level of grade E(good).
[0223] The hash data corresponding to the encrypted grade E(excellent) may include, but is not limited to: the hash value H(2) corresponding to the encrypted score E(90); the grade E(excellent) corresponding to hash value H(2); the table E(child_1) to which the grade belongs; an id of 2, which indicates that grade E(excellent) is the second entry, or second row, of the attribute column in data table E(child_1); and an f_id of -1, which indicates that data table E(child_1) contains no data at the next level of grade E(excellent).
[0224] For example: Match the hash values of each score in sub-table E(stu_1') with the hash values corresponding to the score ranges in sub-tables E(child_1) and E(child_2). If there is a hash value in the score range that is the same as the score's hash value, then the grade corresponding to the hash value in the score range that is the same as the score's hash value is determined as the grade corresponding to the score. For example, the grade corresponding to a score of 80 is "Good" and the grade corresponding to a score of 90 is "Excellent".
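Matching a score against a score range can be sketched by hashing every value in the range in advance. The sketch assumes integer scores, SHA-256 as a stand-in for the hash H, and hypothetical ranges of 80 to 89 for "good" and 90 to 100 for "excellent"; none of these details are taken from this application.

```python
import hashlib

def h(value):
    """Stand-in for the hash H(.) in the text; SHA-256 is an assumption."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def build_range_index(grade_ranges):
    """Map the hash of every score in each inclusive range to its grade."""
    index = {}
    for grade, (low, high) in grade_ranges.items():
        for score in range(low, high + 1):
            index[h(score)] = grade
    return index

def grade_for(score_hash, index):
    """Return the grade whose range contains a score with this hash, if any."""
    return index.get(score_hash)

# Hypothetical score ranges represented by each grade.
index = build_range_index({"good": (80, 89), "excellent": (90, 100)})
```

With this index, grade_for(h(80), index) yields "good" and grade_for(h(90), index) yields "excellent", which corresponds to the example in the text; a hash with no entry in any range yields no grade.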
[0225] Upon receiving the query results from the server, the client may, but is not limited to, delete the query results of the second set of data sub-table query requests based on the request identifier carried in the query results, while retaining the first set of query results of the first set of data sub-table query requests. The retained results are then combined, and data marked E(0) in the query results of the first set of data sub-table query requests is deleted to obtain all query results of the first set of data sub-table query requests, which are then returned to the user.
[0226] The data table query method in this application embodiment greatly reduces data table query latency. As shown in Figure 24, query times in related technologies are often on the order of thousands of seconds, for example, around 2800 to 2900 seconds. By comparison, the query time in this application embodiment is often around 10 to 21 seconds, that is, less than one percent of the query time in related technologies. This represents a breakthrough performance advantage and greatly improves the query efficiency of the data table, making table joins over encrypted data feasible in industrial scenarios.
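The client-side post-processing can be sketched as a filter over the returned results. The result layout (a request_id field and a list of row tuples) and the function name keep_real_results are hypothetical; only the two filtering rules, dropping results of the second set of requests and dropping data marked E(0), come from the text.

```python
def keep_real_results(results, real_request_ids, dummy_marker="E(0)"):
    """Drop results of second-set requests and padding rows, then merge the rest."""
    merged = []
    for result in results:
        if result["request_id"] not in real_request_ids:
            continue  # result of a request from the second set
        # Keep only rows that do not contain the padding marker.
        merged.extend(row for row in result["rows"] if dummy_marker not in row)
    return merged

# Hypothetical results returned by the server.
results = [
    {"request_id": "r1", "rows": [("E(80)", "E(good)"), ("E(0)", "E(good)")]},
    {"request_id": "d1", "rows": [("E(55)", "E(good)")]},
    {"request_id": "r2", "rows": [("E(90)", "E(excellent)")]},
]
final_rows = keep_real_results(results, {"r1", "r2"})
```

Because the filtering happens on the client, the server never learns which requests belonged to the second set or which rows were padding.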
[0227] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.
[0228] According to another aspect of the embodiments of this application, a data table query apparatus for implementing the above data table query method is also provided. As shown in Figure 25, the apparatus includes:
[0229] The acquisition unit 2502 is used to acquire a data table query request, wherein the data table query request is used to request to query the attribute value of the second attribute that matches the attribute value of the first attribute in the second data table on the server, and the first attribute is an attribute included in the first data table on the server;
[0230] The conversion unit 2504 is used to convert the data table query request into a first group of data sub-table query requests, wherein the first group of data sub-table query requests is used to request to query, in the second group of data sub-tables, the attribute value of the second attribute that matches the attribute value of the first attribute; the first group of data sub-tables are data sub-tables obtained by splitting the first data table according to the attribute value of the first attribute, and the second group of data sub-tables are data sub-tables obtained by splitting the second data table according to the attribute value of the second attribute;
[0231] The generation unit 2506 is used to generate a second set of data sub-table query requests, wherein the second set of data sub-table query requests is different from the first set of data sub-table query requests;
[0232] The sending unit 2508 is used to send the first group of data sub-table query requests and the second group of data sub-table query requests to the server, and to obtain the first group of query results corresponding to the first group of data sub-table query requests and the second group of query results corresponding to the second group of data sub-table query requests sent by the server.
[0233] The embodiments provided in this application convert the data table query request obtained from the client into a first set of data sub-table query requests and generate a second set of data sub-table query requests. On the one hand, randomly sending the first set of data sub-table query requests and the second set of data sub-table query requests to the server can interfere with the sending order of the query requests in the first set of data sub-table query requests. On the other hand, the query results of the second set of data sub-table query requests can interfere with the frequency statistics of the query results of the first set of data sub-table query requests, thereby achieving the technical effect of improving the security of data table queries and solving the technical problem of low security of data table queries.
[0234] Optionally, the generation unit includes: a first generation module, configured to generate a second set of data sub-table query request, wherein the second set of data sub-table query request is configured to request querying in the second set of data sub-tables an attribute value of the second attribute that matches the attribute value of the third attribute in the third set of data sub-tables, wherein the third set of data sub-tables is a data sub-table obtained by splitting the third data table on the server according to the attribute value of the third attribute; or a second generation module, configured to generate a second set of data sub-table query request, wherein the second set of data sub-table query request is configured to request querying in the fourth set of data sub-tables an attribute value of the fourth attribute that matches the attribute value of the first attribute in the first set of data sub-tables, wherein the fourth set of data sub-tables is a data sub-table obtained by splitting the fourth data table on the server according to the attribute value of the fourth attribute; or a third generation module, configured to generate a second set of data sub-table query request, wherein the second set of data sub-table query request is configured to request querying in the fourth set of data sub-tables an attribute value of the fourth attribute that matches the attribute value of the third attribute in the third set of data sub-tables.
[0235] Optionally, the apparatus further includes: a first search unit, configured to search on the client for the identifier of a data table that satisfies a first matching condition with the first data table, to obtain the identifier of the third data table, and to obtain on the client the identifier of the third group of data sub-tables that have a mapping relationship with the third data table, wherein the first matching condition includes a third value range that is at least partially the same as the first value range, the first value range being the value range of the attribute value of the first attribute in the first data table, and the third value range being the value range of the attribute value of the third attribute in the third data table; or a second search unit, configured to search on the client for the identifier of a data sub-table that satisfies a second matching condition with a data sub-table in the first group of data sub-tables, to obtain the identifier of the third group of data sub-tables, wherein the second matching condition includes a third value sub-range that is at least partially the same as the first value sub-range, the first value sub-range being the value range of the attribute value of the first attribute in the data sub-table in the first group of data sub-tables, and the third value sub-range being the value range of the attribute value of the third attribute in the data sub-table in the third group of data sub-tables.
[0236] Optionally, the first matching condition further includes: the first number and the third number are different, wherein the first number is the number of times the first value appears in the first attribute of the first data table, the third number is the number of times the first value appears in the third attribute of the third data table, and the first value is a value in the third value range that is also in the first value range; or, the first ratio and the third ratio are different, wherein the first ratio is the ratio obtained by dividing the first number by the first total number, the first total number is the sum of the number of times each value in the first value range appears in the first attribute of the first data table, the third ratio is the ratio obtained by dividing the third number by the third total number, and the third total number is the sum of the number of times each value in the third value range appears in the third attribute of the third data table; or the second matching condition further includes: the first sub-quantity and the third sub-quantity are different, wherein the first sub-quantity is the number of times the third value appears in the first attribute of one data sub-table in the first group of data sub-tables, the third sub-quantity is the number of times the third value appears in the third attribute of another data sub-table in the third group of data sub-tables, and the third value is a value in the third value sub-range that is also in the first value sub-range; or, the first sub-ratio and the third sub-ratio are different, wherein the first sub-ratio is the ratio obtained by dividing the first sub-quantity by the total quantity of the first sub-table, the total quantity of the first sub-table is the sum of the number of times each value in the first value sub-range appears in the first attribute of the one data sub-table, the third sub-ratio is the ratio obtained by dividing the third sub-quantity by the total quantity of the third sub-table, and the total quantity of the third sub-table is the sum of the number of times each value in the third value sub-range appears in the third attribute of the other data sub-table.
[0237] Optionally, the apparatus further includes: a third search unit, configured to search on the client for the identifier of a data table that satisfies a third matching condition with the second data table, obtain the identifier of the fourth data table, and obtain on the client the identifier of the fourth group of data sub-tables that has a mapping relationship with the fourth data table, wherein the third matching condition includes a fourth value range that is at least partially the same as a second value range, the second value range being the value range of the attribute value of the second attribute in the second data table, and the fourth value range being the value range of the attribute value of the fourth attribute in the fourth data table; or a fourth search unit, configured to search on the client for the identifier of a data sub-table that satisfies a fourth matching condition with a data sub-table in the second group of data sub-tables, obtain the identifier of the fourth group of data sub-tables, wherein the fourth matching condition includes a fourth value sub-range that is at least partially the same as a second value sub-range, the second value sub-range including the value range of the attribute value of the second attribute in the data sub-table in the second group of data sub-tables, and the fourth value sub-range including the value range of the attribute value of the fourth attribute in the data sub-table in the fourth group of data sub-tables.
[0238] Optionally, the third matching condition further includes: the second number and the fourth number are different, wherein the second number is the number of times the second value appears in the second attribute of the second data table, the fourth number is the number of times the second value appears in the fourth attribute of the fourth data table, and the second value is a value in the fourth value range that is also in the second value range; or, the second ratio and the fourth ratio are different, wherein the second ratio is the ratio obtained by dividing the second number by the second total number, the second total number is the sum of the number of times each value in the second value range appears in the second attribute of the second data table, the fourth ratio is the ratio obtained by dividing the fourth number by the fourth total number, and the fourth total number is the sum of the number of times each value in the fourth value range appears in the fourth attribute of the fourth data table; or the fourth matching condition further includes: the second sub-quantity and the fourth sub-quantity are different, wherein the second sub-quantity is the number of times the fourth value appears in the second attribute of one data sub-table in the second group of data sub-tables, the fourth sub-quantity is the number of times the fourth value appears in the fourth attribute of another data sub-table in the fourth group of data sub-tables, and the fourth value is a value in the fourth value sub-range that is also in the second value sub-range; or, the second sub-ratio and the fourth sub-ratio are different, wherein the second sub-ratio is the ratio obtained by dividing the second sub-quantity by the total quantity of the second sub-table, the total quantity of the second sub-table is the sum of the number of times each value in the second value sub-range appears in the second attribute of the one data sub-table, the fourth sub-ratio is the ratio obtained by dividing the fourth sub-quantity by the total quantity of the fourth sub-table, and the total quantity of the fourth sub-table is the sum of the number of times each value in the fourth value sub-range appears in the fourth attribute of the other data sub-table.
[0239] Optionally, when the second group of data sub-table query requests is used to request querying, in the second group of data sub-tables, the attribute value of the second attribute that matches the attribute value of the third attribute in the third group of data sub-tables, the first group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables, and the second group of query results includes encrypted data of the attribute value of the third attribute in the third group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the third attribute in the third group of data sub-tables; or, when the second group of data sub-table query requests is used to request querying, in the fourth group of data sub-tables, the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first group of data sub-tables, the first group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables, and the second group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the fourth attribute, found in the fourth group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables; or, when the second group of data sub-table query requests is used to request querying, in the fourth group of data sub-tables, the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third group of data sub-tables, the first group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-tables, and encrypted data of the attribute value of the second attribute, found in the second group of data sub-tables, that matches the attribute value of the first attribute in the first group of data sub-tables, and the second group of query results includes encrypted data of the attribute value of the third attribute in the third group of data sub-tables, and encrypted data of the attribute value of the fourth attribute, found in the fourth group of data sub-tables, that matches the attribute value of the third attribute in the third group of data sub-tables.
[0240] Optionally, the apparatus further includes: an updating unit, configured to update the first data table to a fifth data table based on a first difference between the first real data table and the second real data table when the first data table is a data table obtained by performing a first smoothing process on the attribute values of the first attribute in the first real data table, and the first real data table is updated to a second real data table, wherein the first smoothing process is used to ensure that each attribute value of the first attribute in the first real data table appears the same number of times in the first data table, and a second difference between the first data table and the fifth data table is the same as the first difference; a determining unit, configured to determine, based on the second real data table and the fifth data table, whether to perform a second smoothing process on the second real data table; a smoothing processing unit, configured to perform the second smoothing process on the second real data table if it is determined that the second smoothing process should be performed, to obtain a sixth data table, and to replace the first data table on the server with the sixth data table, wherein the second smoothing process is used to ensure that each attribute value of the first attribute in the second real data table appears the same number of times in the sixth data table; and a replacement unit, configured to replace the first data table on the server with the fifth data table if it is determined that the second smoothing process should not be performed on the second real data table.
[0241] Optionally, the determining unit includes: a first determining module, configured to determine the distribution difference between each attribute value of the first attribute in the second real data table and each attribute value of the first attribute in the fifth data table; a second determining module, configured to determine to perform the second smoothing process on the second real data table if the distribution difference is less than or equal to a preset threshold; and a third determining module, configured to determine not to perform the second smoothing process on the second real data table if the distribution difference is greater than the preset threshold.
[0242] Optionally, the sending unit repeats the following steps until both the first group of data sub-table query requests and the second group of data sub-table query requests have been sent to the server: randomly selecting one or more data sub-table query requests from the data sub-table query requests in the first group of data sub-table query requests and the second group of data sub-table query requests that have not yet been sent to the server, and sending the randomly selected one or more data sub-table query requests to the server.
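A minimal sketch of this sending loop, assuming requests are opaque values and that `send` is a caller-supplied callable standing in for the network call (both hypothetical):

```python
import random


def send_in_random_order(first_group, second_group, send, rng=None, max_batch=2):
    """Send all requests from both groups in randomly chosen batches."""
    rng = rng or random.Random()
    pending = list(first_group) + list(second_group)
    while pending:
        # Pick one or more requests that have not been sent yet.
        size = rng.randint(1, min(max_batch, len(pending)))
        batch = rng.sample(pending, size)
        for request in batch:
            pending.remove(request)
        send(batch)


sent = []
send_in_random_order(["q1", "q2"], ["d1", "d2", "d3"], sent.extend, rng=random.Random(0))
```

Because real and additional requests are drawn from one shared pool, the order in which they reach the server does not reveal which group a request belongs to.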
[0243] Optionally, the first group of data sub-tables consists of data sub-tables obtained by splitting the first data table according to the value range of the attribute value of the first attribute, wherein the value range of the attribute value of the first attribute is different in each data sub-table of the first group of data sub-tables; the second group of data sub-tables consists of data sub-tables obtained by splitting the second data table according to the value range of the attribute value of the second attribute, wherein the value range of the attribute value of the second attribute is different in each data sub-table of the second group of data sub-tables.
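Splitting by value range can be sketched as below. The sketch assumes numeric attribute values and half-open ranges given by a sorted list of boundaries; `split_by_value_range` and the sample data are hypothetical.

```python
from bisect import bisect_right


def split_by_value_range(rows, attr, boundaries):
    """Split rows into sub-tables covering [b[i], b[i+1]) for sorted boundaries."""
    sub_tables = [[] for _ in range(len(boundaries) - 1)]
    for row in rows:
        index = bisect_right(boundaries, row[attr]) - 1
        if not 0 <= index < len(sub_tables):
            raise ValueError(f"value {row[attr]!r} is outside every range")
        sub_tables[index].append(row)
    return sub_tables


table = [{"age": 5}, {"age": 15}, {"age": 25}, {"age": 19}]
subs = split_by_value_range(table, "age", [0, 10, 20, 30])
```

Each resulting sub-table covers a different value range, so no attribute value can fall into two sub-tables.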
[0244] Optionally, the data table query request is used to request a query in the second data table for the attribute value of the second attribute that has a first matching relationship with the attribute value of the first attribute. The first matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value within a first target value range. The first target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second data table. The first group of data sub-table query requests is used to request a query in the second group of data sub-tables for the attribute value of the second attribute that has a second matching relationship with the attribute value of the first attribute in the first group of data sub-tables. The second matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value within a second target value range. The second target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second group of data sub-tables.
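The two forms of the matching relationship can be sketched with a small hash join. This is a simplified illustration on plaintext values: it assumes SHA-256 as the hash function and models an attribute value that represents a value range as a Python `range` of integers. Neither choice comes from the text, and `digest` and `hash_join` are hypothetical names.

```python
import hashlib


def digest(value):
    """Hash of a value's string form (SHA-256 is an assumed choice)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def hash_join(first_rows, first_attr, second_rows, second_attr):
    """Pair rows whose hashes match, directly or via a value inside a range."""
    index = {}
    for row in second_rows:
        value = row[second_attr]
        # A range-valued attribute matches any value it contains.
        candidates = value if isinstance(value, range) else [value]
        for candidate in candidates:
            index.setdefault(digest(candidate), []).append(row)
    return [
        (left, right)
        for left in first_rows
        for right in index.get(digest(left[first_attr]), [])
    ]


first = [{"id": 3}, {"id": 9}]
second = [{"ref": 3, "n": "x"}, {"ref": range(8, 11), "n": "y"}, {"ref": 20, "n": "z"}]
pairs = hash_join(first, "id", second, "ref")
```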
[0245] According to another aspect of the embodiments of this application, an electronic device for implementing the above-described data table query method is also provided. This electronic device may be the terminal device or server shown in Figure 8. This embodiment uses the electronic device as a server as an example for illustration. As shown in Figure 26, the electronic device includes a memory 2602 and a processor 2604. The memory 2602 stores a computer program, and the processor 2604 is configured to execute the steps of any of the above method embodiments through the computer program.
[0246] Optionally, in this embodiment, the aforementioned electronic device may be located in at least one of a plurality of network devices in a computer network.
[0247] Optionally, in this embodiment, the processor can be configured to perform the following steps via a computer program:
[0248] S1, Obtain a data table query request, wherein the data table query request is used to request to query the attribute value of the second attribute that matches the attribute value of the first attribute in the second data table on the server, and the first attribute is an attribute included in the first data table on the server;
[0249] S2, convert the data table query request into a first group of data sub-table query requests, wherein the first group of data sub-table query requests are used to request to query the attribute value of the second attribute that matches the attribute value of the first attribute in the second group of data sub-tables. The first group of data sub-tables is a data sub-table obtained by splitting the first data table according to the attribute value of the first attribute, and the second group of data sub-tables is a data sub-table obtained by splitting the second data table according to the attribute value of the second attribute.
[0250] S3, Generate a second set of data sub-table query requests, wherein the second set of data sub-table query requests is different from the first set of data sub-table query requests;
[0251] S4, send the first group of data sub-table query request and the second group of data sub-table query request to the server, and obtain the first group of query results corresponding to the first group of data sub-table query request and the second group of query results corresponding to the second group of data sub-table query request sent by the server.
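Steps S1 to S4 can be illustrated end to end with a minimal sketch. It assumes that each table-level request is a pair of table identifiers, that the client holds a mapping from table identifiers to sub-table identifiers, and that every left sub-table is paired with every right sub-table. The `query` function, the `server` callable, and all identifiers are hypothetical.

```python
import random


def query(sub_table_map, request, server, decoy_pairs, rng=None):
    """Run S1-S4 for one table-level request and return both result groups."""
    rng = rng or random.Random()
    left_table, right_table = request                      # S1: obtain the request
    real = [(left, right)                                  # S2: first group
            for left in sub_table_map[left_table]
            for right in sub_table_map[right_table]]
    decoys = [p for p in decoy_pairs if p not in real]     # S3: second group
    outgoing = real + decoys
    rng.shuffle(outgoing)                                  # mix before sending
    answers = {pair: server(pair) for pair in outgoing}    # S4: send and collect
    return [answers[p] for p in real], [answers[p] for p in decoys]


sub_table_map = {"T1": ["T1_a", "T1_b"], "T2": ["T2_a"], "T3": ["T3_a"]}
seen = []


def fake_server(pair):
    # Stand-in for the server: records the request and returns a fake ciphertext.
    seen.append(pair)
    return f"enc({pair[0]}x{pair[1]})"


first_results, second_results = query(
    sub_table_map, ("T1", "T2"), fake_server,
    decoy_pairs=[("T3_a", "T2_a")], rng=random.Random(1),
)
```

Only the first group of results answers the original request; the second group exists so that the server cannot tell the real sub-table queries from the additional ones.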
[0252] Optionally, as those skilled in the art will understand, the structure shown in Figure 26 is for illustrative purposes only. The electronic device may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (MID), or a PAD. Figure 26 does not limit the structure of the aforementioned electronic device. For example, the electronic device may include more or fewer components than those shown in Figure 26 (such as a network interface), or may have a configuration different from that shown in Figure 26.
[0253] The memory 2602 can be used to store software programs and modules, such as the program instructions/modules corresponding to the data table query method and apparatus in this embodiment. The processor 2604 executes various functional applications and data processing by running the software programs and modules stored in the memory 2602, thereby implementing the aforementioned data table query method. The memory 2602 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 2602 may further include memory remotely located relative to the processor 2604, and such remote memory can be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof. Specifically, the memory 2602 may be used to store, without limitation, information such as sample characteristics of items and target virtual resource accounts. As an example, as shown in Figure 26, the memory 2602 may include, but is not limited to, the acquisition unit 2502, the conversion unit 2504, the generation unit 2506, and the sending unit 2508 of the data table query device. It may also include, but is not limited to, other module units of the data table query device, which are not described further in this example.
[0254] Optionally, the transmission device 2606 described above is used to receive or send data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 2606 includes a Network Interface Controller (NIC), which can be connected to other network devices and a router via a network cable to communicate with the Internet or a local area network. In another example, the transmission device 2606 is a radio frequency (RF) module, used for wireless communication with the Internet.
[0255] In addition, the aforementioned electronic device also includes: a display 2608 for displaying the order information to be processed; and a connection bus 2610 for connecting the various module components in the aforementioned electronic device.
[0256] In other embodiments, the aforementioned terminal device or server can be a node in a distributed system, wherein the distributed system can be a blockchain system, which is a distributed system formed by connecting multiple nodes through network communication. The nodes can form a peer-to-peer (P2P) network, and any form of computing device, such as a server, terminal, or other electronic device, can become a node in the blockchain system by joining this peer-to-peer network.
[0257] According to one aspect of this application, a computer program product is provided, comprising a computer program / instructions containing program code for performing the methods shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via communication section 2709, and / or installed from removable medium 2711. When the computer program is executed by central processing unit 2701, it performs various functions provided in the embodiments of this application. The above embodiment numbers are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0258] Figure 27 shows a schematic block diagram of a computer system architecture for implementing an electronic device according to embodiments of this application. It should be noted that the computer system 2700 of the electronic device shown in Figure 27 is merely an example and should not be construed as limiting the functionality and scope of the embodiments of this application. As shown in Figure 27, the computer system 2700 includes a central processing unit (CPU) 2701, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 2702 or programs loaded from storage section 2708 into random access memory (RAM) 2703. The RAM 2703 also stores various programs and data required for system operation. The CPU 2701, ROM 2702, and RAM 2703 are interconnected via a bus 2704. An input/output (I/O) interface 2705 is also connected to the bus 2704.
[0259] The following components are connected to the input / output interface 2705: an input section 2706 including a keyboard, mouse, etc.; an output section 2707 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 2708 including a hard disk, etc.; and a communication section 2709 including a network interface card such as a local area network card, modem, etc. The communication section 2709 performs communication processing via a network such as the Internet. A drive 2710 is also connected to the input / output interface 2705 as needed. Removable media 2711, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., are installed on the drive 2710 as needed so that computer programs read from them can be installed into the storage section 2708 as needed.
[0260] Specifically, according to embodiments of this application, the processes described in the various method flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 2709, and / or installed from removable medium 2711. When the computer program is executed by central processing unit 2701, it performs various functions defined in the system of this application.
[0261] According to one aspect of this application, a computer-readable storage medium is provided, from which a processor of a computer device reads computer instructions, and the processor executes the computer instructions, causing the computer device to perform the methods provided in various optional implementations of the above embodiments.
[0262] Optionally, in this embodiment, the computer-readable storage medium may be configured to store a computer program for performing the following steps:
[0263] S1, Obtain a data table query request, wherein the data table query request is used to request to query the attribute value of the second attribute that matches the attribute value of the first attribute in the second data table on the server, and the first attribute is an attribute included in the first data table on the server;
[0264] S2, convert the data table query request into a first group of data sub-table query requests, wherein the first group of data sub-table query requests are used to request to query the attribute value of the second attribute that matches the attribute value of the first attribute in the second group of data sub-tables. The first group of data sub-tables is a data sub-table obtained by splitting the first data table according to the attribute value of the first attribute, and the second group of data sub-tables is a data sub-table obtained by splitting the second data table according to the attribute value of the second attribute.
[0265] S3, Generate a second set of data sub-table query requests, wherein the second set of data sub-table query requests is different from the first set of data sub-table query requests;
[0266] S4, send the first group of data sub-table query request and the second group of data sub-table query request to the server, and obtain the first group of query results corresponding to the first group of data sub-table query request and the second group of query results corresponding to the second group of data sub-table query request sent by the server.
[0267] Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing the hardware related to the terminal device. The program can be stored in a computer-readable storage medium, which may include: flash drive, read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.
[0268] If the integrated units in the above embodiments are implemented as software functional units and sold or used as independent products, they can be stored in the aforementioned computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which may be personal computers, servers, or network devices, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
[0269] In the above embodiments of this application, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0270] In the several embodiments provided in this application, it should be understood that the disclosed client can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between units or modules through some interfaces, and may be electrical or take other forms.
[0271] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0272] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0273] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
Claims
1. A method for querying a data table, characterized in that the method includes: obtaining a data table query request, wherein the data table query request is used to request a query, in a second data table on a server, for the attribute value of a second attribute that matches the attribute value of a first attribute, and the first attribute is an attribute included in a first data table on the server; converting the data table query request into a first group of data sub-table query requests, wherein the first group of data sub-table query requests is used to request a query, in a second group of data sub-tables, for the attribute value of the second attribute that matches the attribute value of the first attribute in a first group of data sub-tables, the first group of data sub-tables being data sub-tables obtained by splitting the first data table according to the attribute value of the first attribute, and the second group of data sub-tables being data sub-tables obtained by splitting the second data table according to the attribute value of the second attribute; generating a second group of data sub-table query requests, wherein the second group of data sub-table query requests is different from the first group of data sub-table query requests, and wherein the second group of data sub-table query requests is used to request a query, in the second group of data sub-tables, for the attribute value of the second attribute that matches the attribute value of a third attribute in a third group of data sub-tables, the third group of data sub-tables being data sub-tables obtained by splitting a third data table on the server according to the attribute value of the third attribute; or the second group of data sub-table query requests is used to request a query, in a fourth group of data sub-tables, for the attribute value of a fourth attribute that matches the attribute value of the first attribute in the first group of data sub-tables, the fourth group of data sub-tables being data sub-tables obtained by splitting a fourth data table on the server according to the attribute value of the fourth attribute; or the second group of data sub-table query requests is used to request a query, in the fourth group of data sub-tables, for the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third group of data sub-tables; and sending the first group of data sub-table query requests and the second group of data sub-table query requests to the server, and obtaining a first group of query results corresponding to the first group of data sub-table query requests and a second group of query results corresponding to the second group of data sub-table query requests, both sent by the server.
2. The method according to claim 1, characterized in that the method further includes: searching, on the client, for the identifier of a data table that satisfies a first matching condition with the first data table, to obtain the identifier of the third data table, and obtaining, on the client, the identifiers of the third group of data sub-tables that have a mapping relationship with the third data table, wherein the first matching condition includes that a third value range is at least partially the same as a first value range, the first value range is the value range of the attribute value of the first attribute in the first data table, and the third value range is the value range of the attribute value of the third attribute in the third data table; or searching, on the client, for the identifiers of data sub-tables that satisfy a second matching condition with the data sub-tables in the first group of data sub-tables, to obtain the identifiers of the third group of data sub-tables, wherein the second matching condition includes that a third value sub-range is at least partially the same as a first value sub-range, the first value sub-range is the value range of the attribute value of the first attribute in a data sub-table in the first group of data sub-tables, and the third value sub-range is the value range of the attribute value of the third attribute in a data sub-table in the third group of data sub-tables.
3. The method according to claim 2, characterized in that the first matching condition further includes: a first number and a third number are different, wherein the first number is the number of times a first value appears in the first attribute of the first data table, the third number is the number of times the first value appears in the third attribute of the third data table, and the first value is a value in the third value range that is the same as a value in the first value range; or a first ratio and a third ratio are different, wherein the first ratio is the ratio obtained by dividing the first number by a first total number, the first total number is the sum of the number of times each value in the first value range appears in the first attribute of the first data table, the third ratio is the ratio obtained by dividing the third number by a third total number, and the third total number is the sum of the number of times each value in the third value range appears in the third attribute of the third data table; or the second matching condition further includes: a first sub-count and a third sub-count are different, wherein the first sub-count is the number of times a third value appears in the first attribute of one data sub-table in the first group of data sub-tables, the third sub-count is the number of times the third value appears in the third attribute of another data sub-table in the third group of data sub-tables, and the third value is a value in the third value sub-range that is the same as a value in the first value sub-range; or a first sub-proportion and a third sub-proportion are different, wherein the first sub-proportion is the proportion obtained by dividing the first sub-count by a first sub-table total count, the first sub-table total count is the sum of the number of times each value in the first value sub-range appears in the first attribute of the one data sub-table, the third sub-proportion is the proportion obtained by dividing the third sub-count by a third sub-table total count, and the third sub-table total count is the sum of the number of times each value in the third value range appears in the third attribute of the other data sub-table.
4. The method according to claim 1, characterized in that, The method further includes: On the client, the identifier of the data table that satisfies the third matching condition with the second data table is found to obtain the identifier of the fourth data table. On the client, the identifiers of the fourth group of data sub-tables that have a mapping relationship with the fourth data table are obtained. The third matching condition includes a fourth value range that is at least partially the same as the second value range, where the second value range is the value range of the attribute value of the second attribute in the second data table, and the fourth value range is the value range of the attribute value of the fourth attribute in the fourth data table; or On the client, the identifier of the data sub-table that satisfies the fourth matching condition with the data sub-table in the second group of data sub-tables is obtained, wherein the fourth matching condition includes that the fourth value sub-range is at least partially the same as the second value sub-range, the second value sub-range includes the value range of the attribute value of the second attribute in the data sub-table in the second group of data sub-tables, and the fourth value sub-range includes the value range of the attribute value of the fourth attribute in the data sub-table in the fourth group of data sub-tables.
5. The method according to claim 4, characterized in that the third matching condition further includes: a second number and a fourth number are different, wherein the second number is the number of times a second value appears in the second attribute of the second data table, the fourth number is the number of times the second value appears in the fourth attribute of the fourth data table, and the second value is a value in the fourth value range that is the same as a value in the second value range; or a second ratio and a fourth ratio are different, wherein the second ratio is the ratio obtained by dividing the second number by a second total number, the second total number is the sum of the number of times each value in the second value range appears in the second attribute of the second data table, the fourth ratio is the ratio obtained by dividing the fourth number by a fourth total number, and the fourth total number is the sum of the number of times each value in the fourth value range appears in the fourth attribute of the fourth data table; or the fourth matching condition further includes: a second sub-number and a fourth sub-number are different, wherein the second sub-number is the number of times a fourth value appears in the second attribute of one data sub-table in the second group of data sub-tables, the fourth sub-number is the number of times the fourth value appears in the fourth attribute of another data sub-table in the fourth group of data sub-tables, and the fourth value is a value in the fourth value sub-range that is the same as a value in the second value sub-range; or a second sub-proportion and a fourth sub-proportion are different, wherein the second sub-proportion is the proportion obtained by dividing the second sub-number by a second sub-table total number, the second sub-table total number is the sum of the number of times each value in the second value sub-range appears in the second attribute of the one data sub-table, the fourth sub-proportion is the proportion obtained by dividing the fourth sub-number by a fourth sub-table total number, and the fourth sub-table total number is the sum of the number of times each value in the fourth value range appears in the fourth attribute of the other data sub-table.
6. The method according to claim 1, characterized in that, When the second group of data sub-table query request is used to request the query in the second group of data sub-table for the attribute value of the second attribute that matches the attribute value of the third attribute in the third group of data sub-table, the first group query result includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the first attribute in the first group of data sub-table found in the second group of data sub-table. The second group query result includes encrypted data of the attribute value of the third attribute in the third group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the third attribute in the third group of data sub-table found in the second group of data sub-table. or When the second group of data sub-table query request is used to request the query in the fourth group of data sub-table for the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first group of data sub-table, the first group query result includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the first attribute in the first group of data sub-table found in the second group of data sub-table. The second group query result includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first group of data sub-table found in the fourth group of data sub-table. 
or When the second group of data sub-table query request is used to request the query in the fourth group of data sub-table for the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third group of data sub-table, the first group of query results includes encrypted data of the attribute value of the first attribute in the first group of data sub-table, and encrypted data of the attribute value of the second attribute that matches the attribute value of the first attribute in the first group of data sub-table, which is found in the second group of data sub-table. The second group of query results includes encrypted data of the attribute value of the third attribute in the third group of data sub-table, and encrypted data of the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third group of data sub-table, which is found in the fourth group of data sub-table.
7. The method according to claim 1, characterized in that, The method further includes: In the case where the first data table is a data table obtained by performing a first smoothing process on the attribute values of the first attribute in the first real data table, and the first real data table is updated to the second real data table, the first data table is updated to the fifth data table based on the first difference between the first real data table and the second real data table. The first smoothing process is used to make each attribute value of the first attribute in the first real data table appear the same number of times in the first data table, and the second difference between the first data table and the fifth data table is the same as the first difference. Based on the second real data table and the fifth data table, determine whether to perform a second smoothing process on the second real data table; If it is determined that the second smoothing process is to be performed on the second real data table, the second smoothing process is performed on the second real data table to obtain a sixth data table, and the first data table on the server is replaced with the sixth data table, wherein the second smoothing process is used to make the number of times each attribute value of the first attribute in the second real data table appears in the sixth data table the same. If it is determined that the second smoothing process will not be performed on the second real data table, the first data table on the server will be replaced with the fifth data table.
8. The method according to claim 7, characterized in that, The step of determining whether to perform a second smoothing process on the second real data table based on the second real data table and the fifth data table includes: Determine the distribution differences between the attribute values of the first attribute in the second real data table and the attribute values of the first attribute in the fifth data table; If the distribution difference is less than or equal to a preset threshold, the second smoothing process is performed on the second real data table. If the distribution difference is greater than the preset threshold, it is determined that the second smoothing process will not be performed on the second real data table.
9. The method according to any one of claims 1 to 8, characterized in that sending the first set of data sub-table query requests and the second set of data sub-table query requests to the server includes: repeating the following steps until all of the first set of data sub-table query requests and all of the second set of data sub-table query requests have been sent to the server: randomly selecting one or more data sub-table query requests from those data sub-table query requests in the first set and the second set that have not yet been sent to the server, and sending the randomly selected one or more data sub-table query requests to the server.
10. The method according to any one of claims 1 to 8, characterized in that the first set of data sub-tables is obtained by splitting the first data table according to the value range of the attribute value of the first attribute, wherein the value range of the attribute value of the first attribute is different in each data sub-table of the first set of data sub-tables; and the second set of data sub-tables is obtained by splitting the second data table according to the value range of the attribute value of the second attribute, wherein the value range of the attribute value of the second attribute is different in each data sub-table of the second set of data sub-tables.
11. The method according to any one of claims 1 to 8, characterized in that the data table query request is used to request a query in the second data table for the attribute value of the second attribute that has a first matching relationship with the attribute value of the first attribute, wherein the first matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value in a first target value range, and the first target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second data table; and the first set of data sub-table query requests is used to request a query in the second set of data sub-tables for the attribute value of the second attribute that has a second matching relationship with the attribute value of the first attribute in the first set of data sub-tables, wherein the second matching relationship means that the hash value of the attribute value of the first attribute is the same as the hash value of the attribute value of the second attribute, or that the hash value of the attribute value of the first attribute is the same as the hash value of a value in a second target value range, and the second target value range is the value range represented by an attribute value of an attribute corresponding to the second attribute in the second set of data sub-tables.
12. A data table query device, characterized in that it includes: an acquisition unit, used to acquire a data table query request, wherein the data table query request is used to request to query, in a second data table on the server, the attribute value of a second attribute that matches the attribute value of a first attribute, and the first attribute is an attribute included in the first data table on the server; a conversion unit, used to convert the data table query request into a first set of data sub-table query requests, wherein the first set of data sub-table query requests is used to request to query, in the second set of data sub-tables, the attribute value of the second attribute that matches the attribute value of the first attribute, the first set of data sub-tables consists of data sub-tables obtained by splitting the first data table according to the attribute value of the first attribute, and the second set of data sub-tables consists of data sub-tables obtained by splitting the second data table according to the attribute value of the second attribute; a generation unit, configured to generate a second set of data sub-table query requests, wherein the second set of data sub-table query requests differs from the first set of data sub-table query requests, and wherein the second set of data sub-table query requests is used to request to query, in the second set of data sub-tables, the attribute value of the second attribute that matches the attribute value of the third attribute in the third set of data sub-tables, the third set of data sub-tables consisting of data sub-tables obtained by splitting the third data table on the server according to the attribute value of the third attribute; or the second set of data sub-table query requests is used to request to query, in the fourth set of data sub-tables, the attribute value of the fourth attribute that matches the attribute value of the first attribute in the first set of data sub-tables, the fourth set of data sub-tables consisting of data sub-tables obtained by splitting the fourth data table on the server according to the attribute value of the fourth attribute; or the second set of data sub-table query requests is used to request to query, in the fourth set of data sub-tables, the attribute value of the fourth attribute that matches the attribute value of the third attribute in the third set of data sub-tables; and a sending unit, configured to send the first set of data sub-table query requests and the second set of data sub-table query requests to the server, and to obtain the first set of query results corresponding to the first set of data sub-table query requests and the second set of query results corresponding to the second set of data sub-table query requests sent by the server.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein the program, when executed, performs the method described in any one of claims 1 to 11.
14. A computer program product comprising a computer program or instructions, characterized in that, when the computer program or instructions are executed by a processor, they implement the steps of the method described in any one of claims 1 to 11.
15. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to execute the method described in any one of claims 1 to 11 through the computer program.
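Illustrative note on claim 9 (not part of the claims): the random sending order might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The function name `send_in_random_order`, the `send` callback, and the `max_batch` parameter are hypothetical; the claim only requires that one or more not-yet-sent requests be chosen at random until both sets are exhausted.

```python
import random
from typing import Callable, List, Optional, Sequence


def send_in_random_order(
    first_set: Sequence[str],
    second_set: Sequence[str],
    send: Callable[[List[str]], None],
    max_batch: int = 2,  # hypothetical upper bound on requests per round
    rng: Optional[random.Random] = None,
) -> None:
    """Send every request from both sets, choosing random batches each round."""
    rng = rng or random.Random()
    pending = list(first_set) + list(second_set)  # requests not yet sent
    while pending:
        # Randomly select one or more of the requests that remain unsent.
        batch_size = rng.randint(1, min(max_batch, len(pending)))
        chosen = set(rng.sample(range(len(pending)), batch_size))
        send([req for i, req in enumerate(pending) if i in chosen])
        pending = [req for i, req in enumerate(pending) if i not in chosen]
```

Because real requests (first set) and additional requests (second set) are drawn from one shared pool, the server cannot tell from the arrival order which requests belong to the original query.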
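Illustrative note on claim 10 (not part of the claims): splitting a data table into sub-tables by value range might look like the sketch below. It assumes plaintext integer attribute values and caller-supplied range boundaries, both of which are simplifications; the function name `split_by_value_range` and the `boundaries` parameter are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

Row = Dict[str, int]


def split_by_value_range(
    rows: Sequence[Row], attribute: str, boundaries: Sequence[int]
) -> List[List[Row]]:
    """Split rows into len(boundaries) + 1 sub-tables with disjoint value ranges.

    `boundaries` must be sorted ascending. Sub-table 0 holds values below
    boundaries[0], sub-table i holds values in [boundaries[i-1], boundaries[i]),
    and the last sub-table holds values at or above boundaries[-1].
    """
    sub_tables: List[List[Row]] = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        index = bisect_right(boundaries, row[attribute])
        sub_tables[index].append(row)
    return sub_tables
```

With boundaries `[10, 20]`, for example, a row whose attribute value is 10 lands in the middle sub-table, so each sub-table covers a different value range as the claim requires.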
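Illustrative note on claims 7 and 8 (not part of the claims): the choice between uploading the fifth data table and a re-smoothed sixth data table might be sketched as below. Several details are assumptions, because the claims do not fix them: smoothing is modeled as padding each attribute value up to the count of the most frequent one, the distribution difference is modeled as total variation distance between value frequencies, and tables are reduced to plain lists of attribute values. All function names are hypothetical. The threshold direction follows claim 8 as worded.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def smooth(values: Sequence[Hashable]) -> List[Hashable]:
    """Pad so every distinct value appears as often as the most frequent one."""
    counts = Counter(values)
    if not counts:
        return []
    target = max(counts.values())
    padded = list(values)
    for value, count in counts.items():
        padded.extend([value] * (target - count))  # assumed: padding by repetition
    return padded


def distribution_difference(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Total variation distance between the value frequencies of a and b (assumed metric)."""
    ca, cb = Counter(a), Counter(b)
    na, nb = len(a) or 1, len(b) or 1
    return 0.5 * sum(abs(ca[v] / na - cb[v] / nb) for v in set(ca) | set(cb))


def table_to_upload(
    second_real: Sequence[Hashable], fifth: Sequence[Hashable], threshold: float
) -> List[Hashable]:
    """Return the table that replaces the first data table on the server."""
    if distribution_difference(second_real, fifth) <= threshold:
        return smooth(second_real)  # sixth data table
    return list(fifth)  # keep the incrementally updated fifth data table
```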
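Illustrative note on claim 11 (not part of the claims): the two forms of the matching relationship, hash equality with a single value or with some value in a target value range, might be sketched as follows. The use of SHA-256, integer attribute values, and Python's `range` to stand for a value range are assumptions made for the example only; the names `hash_value` and `matches` are hypothetical.

```python
import hashlib
from typing import Union


def hash_value(value: int) -> str:
    """Hash an attribute value (SHA-256 is an assumed choice of hash function)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def matches(first_value: int, second_value: Union[int, range]) -> bool:
    """Check the matching relationship between two attribute values.

    If `second_value` represents a value range, the match succeeds when the
    hash of `first_value` equals the hash of any value in that range.
    Otherwise the two hash values are compared directly.
    """
    if isinstance(second_value, range):
        return any(hash_value(first_value) == hash_value(v) for v in second_value)
    return hash_value(first_value) == hash_value(second_value)
```

For instance, `matches(7, range(5, 10))` succeeds because 7 lies in the range, while `matches(12, range(5, 10))` does not.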