A keyword private information retrieval method and device

The electronic device receives a user-input keyword and a number M of obfuscating words, groups the server's total data amount X accordingly, and generates target filtering information that is sent to the server to obtain a target index list. This addresses the high computational overhead on electronic devices and the easy identification of fake keywords, achieving higher privacy and security.

CN116028948B · Active · Publication Date: 2026-06-12 · NSFOCUS INFORMATION TECHNOLOGY CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: NSFOCUS INFORMATION TECHNOLOGY CO LTD
Filing Date: 2022-12-21
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing keyword privacy information retrieval methods suffer from problems such as high computational overhead of electronic devices and the ease with which fake keywords can be identified, leading to the leakage of target keywords.

Method used

The electronic device receives the keyword input by the user and the number M of obfuscating words, together with the total data amount X. It groups X to generate target filtering information and sends that information to the server to obtain a target index list. Hash operations and encryption algorithms are used to protect privacy information, reduce the computing overhead of electronic devices, and improve security.

Benefits of technology

It reduces the computational overhead of electronic devices, prevents excessive information leakage between both parties during the data screening stage, and improves the privacy and security of data transmission.

✦ Generated by Eureka AI based on patent content.

Figure CN116028948B_ABST

Abstract

The application discloses a keyword private information retrieval method and device for reducing the computation and communication overhead of an electronic device and protecting the user's retrieval conditions and retrieval results. The method comprises the following steps: receiving a user-input keyword to be searched, used to retrieve private information, and a first number M of obfuscated words used to obfuscate the keyword to be searched, and receiving a total data amount X sent by a server; grouping the total data amount X according to the first number M to obtain a second number N; obtaining target screening information according to the second number N and the keyword to be searched; sending the second number N and the target screening information to the server, and receiving a target index list fed back by the server, wherein the target index list comprises Y ciphertext indexes obtained by substituting the second number N and the target screening information into a preset screening condition; and obtaining, from the server and according to the target index list, a sub-data set containing the private information corresponding to the target index list, and obtaining the private information corresponding to the keyword to be searched from the sub-data set.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method and apparatus for retrieving keyword privacy information.

Background Technology

[0002] Currently, data security and privacy protection are two key requirements that cannot be ignored in business operations in the era of big data. In a data query business, user B has privacy protection needs: user B does not want information security company A to learn the query conditions or the query results during the query process. On the other hand, information security company A has data security needs: it does not want user B to learn any data other than the query results. Therefore, how to meet the needs of both information security company A and user B has become an urgent problem to be solved.

[0003] In related technologies, to meet the needs of information security company A and user B, the following solutions are generally adopted:

[0004] Option 1: The electronic device sends a data retrieval request to the server. The server then sends all the data in the encrypted database to the electronic device. The electronic device decrypts all the data to determine the private data it actually wants to obtain.

[0005] However, when using Option 1, the more data the server stores, the greater the computational load on the electronic device itself, resulting in a high computational overhead for the electronic device.

[0006] Option 2: The electronic device randomly generates a preset number of fake keywords and sends these fake keywords along with the target search keywords to the server. The server searches for the fake keywords and the target keywords, encrypts the search results, and sends them to the electronic device. The electronic device decrypts the obtained information to obtain the target search keyword information.

[0007] However, when using Option 2, the fake keywords randomly generated by the electronic device may be identified by the server, thereby exposing the real query needs of the electronic device and degrading the user experience.

[0008] In summary, current keyword privacy information retrieval methods suffer from technical problems such as high computational overhead of electronic devices and easy identification of fake keywords leading to leakage of target keywords.

Summary of the Invention

[0009] This invention provides a keyword privacy information retrieval method and apparatus to reduce the computational and communication overhead of electronic devices and protect user search conditions and search results.

[0010] In a first aspect, a keyword privacy information retrieval method is provided, the method comprising:

[0011] The system receives user-input keywords for retrieving privacy information and a first number M of obfuscating words for obfuscating the keywords, as well as the total amount of data X sent by the server.

[0012] Based on the first quantity M, the total data X is grouped to obtain a second quantity N; and based on the second quantity N and the keyword to be searched, target filtering information is obtained; wherein, the target filtering information is used to filter data from the database of the server; the second quantity N is used to characterize the number of groups obtained after grouping the total data X;

[0013] The second quantity N and the target filtering information are sent to the server, and the target index list fed back by the server is received. The target index list includes Y encrypted indexes. The Y encrypted indexes are encrypted indexes corresponding to privacy information in the server's database. The Y encrypted indexes are obtained by substituting the second quantity N and the target filtering information into preset filtering conditions.

[0014] Based on the target index list, a subset containing privacy information corresponding to the target index list is obtained from the server, and the privacy information corresponding to the keyword to be searched is obtained from the subset.

[0015] In one possible implementation, target filtering information is obtained based on the second quantity N and the keyword to be searched, including:

[0016] Substitute the second quantity N and the keyword to be searched into the following formula:

[0017] result = H(keyword) % N

[0018] Wherein, result represents the target filtering information, H(keyword) represents the hash operation performed on the keyword to be searched, with the hash result rounded down to an integer, keyword represents the keyword to be searched, and % represents the modulo operation.
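The formula in [0017] can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the source does not name a hash function, so SHA-256 is an assumption here, and the function name `target_filter_value` is hypothetical. The digest is read as a non-negative integer, so no separate rounding step is needed.

```python
import hashlib


def target_filter_value(keyword: str, n_groups: int) -> int:
    """Compute result = H(keyword) % N for a keyword and group count N.

    Assumption: H is SHA-256 interpreted as a big-endian integer.
    """
    if n_groups <= 0:
        raise ValueError("N must be a positive integer")
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_groups


# The result is a group label in the range [0, N).
label = target_filter_value("stock A", 10)
print(label)
```

Because the hash is deterministic, the same keyword always maps to the same group label for a given N, which is what lets the server apply the same computation to its own keywords.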

[0019] In one possible implementation, after obtaining the target filtering information, the method further includes:

[0020] Generate random numbers;

[0021] According to the preset encryption algorithm, the random number and the public key used by the server to encrypt data are used to encrypt the keyword to be searched, thereby obtaining the encrypted keyword to be searched. The encrypted keyword to be searched is then sent to the server so that the server can perform secondary encryption on the encrypted keyword to be searched, thereby obtaining the secondary encrypted keyword to be searched.

[0022] In one possible implementation, based on the target index list, a subset containing privacy information corresponding to the target index list is obtained from the server, and the privacy information corresponding to the keyword to be searched is obtained from the subset, including:

[0023] Based on the random number, the secondary encrypted keywords returned by the server are deblinded to obtain a target sub-index; the target sub-index has the same format as the ciphertext index in the database.

[0024] In the target index list, find the index that matches the target sub-index to obtain the matching index, and use the sequence number of the matching index in the target index list as the target index;

[0025] From the subset, find the data corresponding to the target index, and use it as the privacy information corresponding to the keyword to be searched.
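The blinding, secondary encryption, and deblinding flow in [0020] to [0024] can be illustrated with RSA-style blinding. The source does not name the preset encryption algorithm, so RSA is an assumption made for this sketch, as are SHA-256 and every identifier below. The key is a toy key built from two Mersenne primes and is not secure; it only makes the arithmetic runnable.

```python
import hashlib
import math
import secrets

# Toy RSA key (assumption, illustration only, NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
MODULUS = P * Q
E = 65537                           # server's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # server's private exponent


def h(keyword: str) -> int:
    """Hash a keyword into the range [0, MODULUS)."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % MODULUS


def blind(keyword: str) -> tuple[int, int]:
    """Client: encrypt the keyword with a random number r and the public key."""
    while True:
        r = secrets.randbelow(MODULUS - 2) + 2
        if math.gcd(r, MODULUS) == 1:
            break
    return (h(keyword) * pow(r, E, MODULUS)) % MODULUS, r


def server_second_encrypt(blinded: int) -> int:
    """Server: secondary encryption with its private exponent."""
    return pow(blinded, D, MODULUS)


def deblind(double_encrypted: int, r: int) -> int:
    """Client: remove r to obtain the target sub-index H(keyword)^D mod MODULUS."""
    return (double_encrypted * pow(r, -1, MODULUS)) % MODULUS


def find_target_index(target_sub_index: int, target_index_list: list[int]) -> int:
    """Client: sequence number of the matching ciphertext index in the list."""
    return target_index_list.index(target_sub_index)
```

In this sketch the server's ciphertext index for a stored keyword would be `pow(h(kw), D, MODULUS)`, so the deblinded value has the same format as the entries in the target index list, and the server never sees the unblinded keyword hash.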

[0026] In one possible implementation, the total data X is grouped according to the first quantity M to obtain a second quantity N, including:

[0027] Substitute the total data X and the first quantity M into the following formula:

[0028] N = X // M

[0029] Here, // is used to represent integer division operations.
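The grouping step can be sketched as follows. The function name `group_count` is hypothetical, and the clamp to at least one group is an assumption added so that N remains a positive integer when X is smaller than M; the source does not describe that case.

```python
def group_count(total_x: int, first_quantity_m: int) -> int:
    """Compute the second quantity N from total data X and first quantity M."""
    if first_quantity_m <= 0:
        raise ValueError("M must be a positive integer")
    if total_x < 0:
        raise ValueError("X must be non-negative")
    n = total_x // first_quantity_m  # integer division
    # Assumption: keep N >= 1 so a later modulo by N is always defined.
    return max(n, 1)


print(group_count(50, 5))  # 50 records with M = 5 give 10 groups
```

With N groups over X records, each group holds roughly X / N, that is about M, records on average if the hash spreads keywords evenly, so the searched keyword is hidden among approximately M candidates.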

[0030] In a second aspect, a keyword privacy information retrieval device is provided, the device comprising:

[0031] The receiving unit is configured to receive the user-input keywords for retrieving privacy information and a first number M of obfuscating words for obfuscating the keywords, as well as the total amount of data X sent by the server;

[0032] The first processing unit is configured to group the total data X according to the first quantity M to obtain a second quantity N; and to obtain target filtering information according to the second quantity N and the keyword to be searched; wherein the target filtering information is used to filter data from the database of the server; the second quantity N is used to characterize the number of groups obtained after grouping the total data X;

[0033] The second processing unit is configured to send the second quantity N and the target filtering information to the server, and receive the target index list fed back by the server, wherein the target index list includes Y encrypted indexes; the Y encrypted indexes are encrypted indexes corresponding to privacy information in the server's database, and the Y encrypted indexes are obtained by substituting the second quantity N and the target filtering information into preset filtering conditions;

[0034] The obtaining unit is configured to obtain a subset containing privacy information corresponding to the target index list from the server based on the target index list, and obtain the privacy information corresponding to the keyword to be searched from the subset.

[0035] In one possible implementation, the first processing unit is specifically used for:

[0036] Substitute the second quantity N and the keyword to be searched into the following formula:

[0037] result = H(keyword) % N

[0038] Wherein, result represents the target filtering information, H(keyword) represents the hash operation performed on the keyword to be searched, with the hash result rounded down to an integer, keyword represents the keyword to be searched, and % represents the modulo operation.

[0039] In one possible implementation, the apparatus further includes a third processing unit for:

[0040] Generate random numbers;

[0041] According to the preset encryption algorithm, the random number and the public key used by the server to encrypt data are used to encrypt the keyword to be searched, thereby obtaining the encrypted keyword to be searched. The encrypted keyword to be searched is then sent to the server so that the server can perform secondary encryption on the encrypted keyword to be searched, thereby obtaining the secondary encrypted keyword to be searched.

[0042] In one possible implementation, the obtaining unit is configured to:

[0043] Based on the random number, the secondary encrypted keywords returned by the server are deblinded to obtain a target sub-index; the target sub-index has the same format as the ciphertext index in the database.

[0044] In the target index list, find the index that matches the target sub-index to obtain the matching index, and use the sequence number of the matching index in the target index list as the target index;

[0045] From the subset, find the data corresponding to the target index, and use it as the privacy information corresponding to the keyword to be searched.

[0046] In one possible implementation, the first processing unit is specifically used for:

[0047] Substitute the total data X and the first quantity M into the following formula:

[0048] N = X // M

[0049] Here, // is used to represent integer division operations.

[0050] In a third aspect, an electronic device is provided, the electronic device comprising:

[0051] Memory, used to store program instructions;

[0052] A processor is configured to invoke program instructions stored in the memory and execute the steps included in any of the methods in the first aspect according to the obtained program instructions.

[0053] In a fourth aspect, a storage medium is provided that stores computer-executable instructions for causing an electronic device to perform the steps included in any of the methods in the first aspect.

[0054] In a fifth aspect, a computer program product is provided that, when run on an electronic device, enables the electronic device to perform the steps included in any of the methods in the first aspect.

[0055] The technical solutions provided by the embodiments of the present invention bring at least the following beneficial effects:

[0056] In this embodiment of the invention, the electronic device can receive user-inputted keywords for retrieving privacy information and a first quantity M of obfuscating words for obfuscating the keywords, as well as a total data volume X sent by the server; group the total data volume X according to the first quantity M to obtain a second quantity N; and obtain target filtering information according to the second quantity N and the keywords; send the second quantity N and the target filtering information to the server, and receive a target index list fed back by the server, wherein the target index list includes Y encrypted indexes; the Y encrypted indexes are encrypted indexes corresponding to privacy information in the server's database, and the Y encrypted indexes are obtained by substituting the second quantity N and the target filtering information into preset filtering conditions. It can be seen that this embodiment of the invention only performs comparison processing on a single set of data, greatly reducing the computational overhead of the electronic device.

[0057] Furthermore, the electronic device can obtain a subset containing privacy information corresponding to the target index list from the server, and obtain the privacy information corresponding to the keyword to be searched from the subset. Thus, because this invention determines the target index list by sending a second quantity N and target filtering information to the server, it prevents excessive information leakage between both parties during the data filtering stage, thereby improving the privacy and security of the data transmission process.

[0058] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practice. The objects and other advantages of the invention may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.

[0059] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit the invention.

Description of the Attached Figures

[0060] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention, but do not constitute an undue limitation of the invention.

[0061] Figure 1 is a schematic diagram of a keyword privacy information retrieval method in the prior art;

[0062] Figure 2 is a schematic diagram of another keyword privacy information retrieval method in the prior art;

[0063] Figure 3 is a schematic diagram of an application scenario in an embodiment of the present invention;

[0064] Figure 4 is a flowchart of a keyword privacy information retrieval method according to an embodiment of the present invention;

[0065] Figure 5 is a schematic diagram illustrating a data filtering method in an embodiment of the present invention;

[0066] Figure 6 is a schematic diagram of another keyword privacy information retrieval method in an embodiment of the present invention;

[0067] Figure 7 is a structural block diagram of the keyword privacy information retrieval device in an embodiment of the present invention;

[0068] Figure 8 is a schematic diagram of the structure of an electronic device in an embodiment of the present invention.

Detailed Implementation

[0069] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of the embodiments of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this invention, and not all of them. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention. Unless otherwise specified, the embodiments and features in the embodiments of this invention can be arbitrarily combined with each other. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than that shown here.

[0070] The terms "first" and "second" in the specification, claims, and accompanying drawings of this invention are used for descriptive purposes only and should not be construed as indicating or implying relative importance or order. Furthermore, the term "comprising" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that comprises a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to such processes, methods, products, or apparatus.

[0071] To facilitate understanding of the technical solutions provided in the embodiments of the present invention, some key terms involved in the embodiments of the present invention will be explained here first:

[0072] Privacy Information Retrieval (PIR): PIR is a strategy employed to protect the privacy of individuals on public online platforms. PIR technology ensures that when a user submits a query request to a database on a server, the query is completed without leaking the user's specific query information. In other words, throughout the entire query process, the server is unaware of the user's specific query information or the data items retrieved based on that query.

[0073] Private Set Intersection (PSI): PSI allows multiple parties holding their own sets to jointly compute the intersection of their sets. At the end of the computation, each participating party can only obtain the correct intersection and will not obtain any information from the other party's set outside the intersection.

[0074] Oblivious Transfer (OT): The problem solved by the OT protocol can be abstractly described as follows: A has n messages {m1,…,mn}, and B wants to know one of them, message mi; by executing the OT protocol, B can correctly obtain the message mi it wants to know, but cannot obtain the other n-1 messages, while A cannot know which message B obtained.

[0075] Currently, the following two solutions are provided in related technologies to achieve privacy information retrieval:

[0076] Option 1: The electronic device sends a data retrieval request to the server. The server then sends all the data in the encrypted database to the electronic device. The electronic device decrypts all the data to determine the private data it actually wants to obtain.

[0077] As shown in Figure 1, assume an electronic device has a keyword `keyword` and a private key `a`, and a server has a keyword set `{keyword1, ..., keywordn}` and a private key `b`. The electronic device can send the server a request for the keyword set `set`, along with the encrypted keyword. The server can then re-encrypt the encrypted keyword and return the re-encrypted keyword and the keyword set `set` to the electronic device.

[0078] As can be seen from Figure 1, the electronic device needs to perform n+1 exponentiation operations to compare against the words in the keyword set `set` after secondary encryption. Clearly, the computational cost of the electronic device is directly proportional to the amount of data on the server; the more data on the server, the greater the computational cost of the electronic device.

[0079] Option 2: The electronic device randomly generates a preset number of fake keywords and sends these fake keywords along with the target search keywords to the server. The server searches for the fake keywords and the target keywords, encrypts the search results, and sends them to the electronic device. The electronic device decrypts the obtained information to obtain the target search keyword information.

[0080] As shown in Figure 2, the electronic device randomly generates m-1 fake keywords, which together with the keyword form m keywords. The m keywords are sent to the server, and the server retrieves m results. It executes the OT protocol to encrypt the m results and sends the m encrypted results to the electronic device. The electronic device executes the OT protocol and correctly decrypts the search results corresponding to the keyword.

[0081] However, due to the different distribution characteristics of different data, if fake keywords are generated randomly, it is very likely that noisy data will be generated, which can be easily identified by the server, thus exposing the user's real query keywords.

[0082] In view of this, the present invention provides a keyword privacy information retrieval method. Through this method, an electronic device can receive a user-input keyword for retrieving privacy information and a first number M of obfuscating words used to obfuscate the keyword, as well as a total data volume X sent by a server; group the total data volume X based on the first number M to obtain a second number N; obtain target filtering information based on the second number N and the keyword to be searched; and send the second number N and the target filtering information to the server and receive a target index list from the server. The target index list includes Y encrypted indexes, which are encrypted indexes corresponding to privacy information in the server's database and are obtained by substituting the second number N and the target filtering information into preset filtering conditions. It can be seen that in this embodiment of the invention, only one group of data is compared, greatly reducing the computational overhead of the electronic device. Furthermore, the electronic device can initiate a multi-choice oblivious transfer (OT) protocol request to the server and, based on the target index list, obtain from the server a subset containing the privacy information corresponding to the target index list, and then obtain the privacy information corresponding to the keyword to be searched from that subset. In this way, by sending the second number N and the target filtering information to the server to determine the target index list, the present invention prevents both parties from leaking too much information during the data filtering stage, thereby improving the privacy and security of the data transmission process.

[0083] After introducing the design concept of the embodiments of the present invention, the following is a brief introduction to the application scenarios applicable to the technical solutions of the embodiments of the present invention. It should be noted that the application scenarios described in the embodiments of the present invention are for the purpose of more clearly illustrating the technical solutions of the embodiments of the present invention, and do not constitute a limitation on the technical solutions provided by the embodiments of the present invention. As those skilled in the art will know, with the emergence of new application scenarios, the technical solutions provided by the embodiments of the present invention are also applicable to similar technical problems.

[0084] Figure 3 shows a scenario to which the invention can be applied. The scenario includes a terminal device 310 and a server 320. The terminal device 310 and the server 320 can communicate with each other through a communication network.

[0085] In this embodiment, the terminal device 310 is an electronic device used by a user, such as a personal computer, mobile phone, tablet computer, laptop, e-book reader, smart home device, shopping mall turnstile, etc. Each terminal device 310 can communicate with the server 320 via a communication network. In one optional implementation, the communication network can be a wired network or a wireless network. Therefore, the terminal devices 310 and the server 320 can be directly or indirectly connected via wired or wireless communication. This embodiment does not impose specific limitations on this connection.

[0086] Server 320 can be a standalone physical server 320, an edge device 320 in the field of cloud computing, or a cloud server 320 that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud storage, cloud functions, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0087] It should be noted that the keyword privacy information retrieval method in this embodiment can be executed by the server 320 or the terminal device 310 alone, or by both the server 320 and the terminal device 310. When both the terminal device 310 and the server 320 execute the method, for example, the terminal device 310 sends a data volume acquisition request to the server 320, and the server 320 sends a data volume X to the terminal device 310. Further, the terminal device determines a second quantity N of data groups based on the data volume X and a first quantity M; and obtains target filtering information based on the second quantity N and the keyword to be searched; the target filtering information is used to filter data from the server's database. Then, the terminal device 310 can send the second quantity N and the target filtering information to the server 320, and the server 320 can substitute the second quantity N and the target filtering information into preset filtering conditions to obtain at least one encrypted index, construct a target index list based on the at least one encrypted index, and send the target index list to the terminal device 310. Furthermore, terminal device 310 sends an acquisition request to the server, which carries a target index list. Server 320 searches its internal database for a subset of data corresponding to the target index list in the acquisition request and sends this subset to terminal device 310. Thus, terminal device 310 can obtain the privacy information corresponding to the keyword to be searched from the subset of data corresponding to the target index list. The following explanation primarily uses the example of the terminal device executing the process alone, and no specific limitations are made.
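The exchange described in [0087] can be sketched end to end. This is an illustration under stated assumptions: the source does not spell out the preset filtering condition at this point, so the sketch assumes the server keeps every record whose keyword hashes to the same group label as the client's result. SHA-256, the HMAC stand-in for the server's ciphertext index, and all function names are hypothetical.

```python
import hashlib
import hmac


def bucket(keyword: str, n_groups: int) -> int:
    """Group label H(keyword) % N (SHA-256 is an assumption)."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_groups


def cipher_index(keyword: str, server_key: bytes) -> str:
    """Stand-in for the server's ciphertext index (HMAC is an assumption)."""
    return hmac.new(server_key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


def client_request(keyword: str, total_x: int, m: int) -> tuple[int, int]:
    """Terminal device: derive N from X and M, then the target filtering information."""
    n_groups = max(total_x // m, 1)
    return n_groups, bucket(keyword, n_groups)


def server_filter(keywords: list[str], n_groups: int, result: int,
                  server_key: bytes) -> list[str]:
    """Server: build the target index list from records matching the assumed condition."""
    return [cipher_index(kw, server_key)
            for kw in keywords
            if bucket(kw, n_groups) == result]


# Usage: a database of 50 keywords and M = 5 obfuscating words.
database = [f"stock {i}" for i in range(50)]
key = b"demo-server-key"
n, result = client_request("stock 7", len(database), 5)
target_index_list = server_filter(database, n, result, key)
print(n, len(target_index_list))
```

In this sketch the server sees only N and the group label, so it learns which group the query falls into but not which keyword within that group the user is after.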

[0088] It should be noted that the collection, storage, and use of the keywords and their corresponding privacy information in the above embodiments must comply with all applicable laws concerning information protection. Furthermore, the collection, storage, and use of the aforementioned keywords and their corresponding privacy information require the user's consent, obtained for example through prompts such as "Do you agree to the relevant agreements of XX platform?" Moreover, the keywords and their corresponding privacy information should be stored and used in an appropriate and secure manner, for example by processing particularly sensitive information with encryption and anonymization technologies and then using the processed data.

[0089] To further illustrate the keyword privacy information retrieval method provided in the embodiments of the present invention, a detailed description is provided below in conjunction with the accompanying drawings and specific implementation methods. Although the embodiments of the present invention provide method operation steps as shown in the following embodiments or drawings, more or fewer operation steps may be included in the method based on conventional or non-inventive methods. In steps where there is no logically necessary causal relationship, the execution order of these steps is not limited to the execution order provided in the embodiments of the present invention. In actual processing or device execution, the method may be executed sequentially according to the embodiments or drawings, or in parallel (e.g., in a parallel processor or multi-threaded processing application environment).

[0090] The keyword privacy information retrieval method in this embodiment of the invention is described below with reference to the flowchart shown in Figure 4. The steps shown in Figure 4 can be performed by, for example, the electronic device 310 shown in Figure 3.

[0091] Step 401: Receive the user-input keywords for retrieving privacy information and the first number M of obfuscating words for obfuscating the keywords, as well as the total amount of data X sent by the server.

[0092] In this embodiment of the invention, the electronic device can receive a user-input keyword for retrieving privacy information and a first number M of obfuscating words for obfuscating the keyword, where M is a positive integer.

[0093] In one embodiment, a user can directly input (e.g., type or voice) the search keywords for retrieving privacy information and a first number M of obfuscating words to obscure the search keywords on the corresponding information search platform. For example, a user can voice input "stock A" and "10" on the corresponding information search platform.

[0094] In another embodiment, a user can select the target keyword for retrieving privacy information and a first number M of obfuscating words to obfuscate the target keyword on the corresponding information search platform. For example, on the corresponding information search platform, the user first selects "Fund A" from multiple candidate keywords, and then selects "20" from multiple candidate quantities.

[0095] In this embodiment of the invention, after receiving the user's input of the search keywords for retrieving privacy information and a first number M of obfuscating words for obscuring the search keywords, the electronic device can send a retrieval request to the server. This retrieval request is used to retrieve the total amount of data stored in the database of the server. Then, the server responds to the retrieval request and sends the total amount of data X to the electronic device.

[0096] Step 402: Based on the first quantity M, group the total data X to obtain the second quantity N; and based on the second quantity N and the keyword to be searched, obtain the target filtering information; wherein, the target filtering information is used to filter data from the database of the server; the second quantity N is used to represent the number of groups obtained after grouping the total data.

[0097] In this embodiment of the invention, after the electronic device obtains the total data X and the first quantity M, it can group the total data X according to the first quantity M to obtain the second quantity N, where N is a positive integer.

[0098] In one embodiment, the electronic device can substitute the total data amount X and the first quantity M into the following formula:

[0099] N = X // M (Formula 1)

[0100] Here, // represents integer division, that is, the quotient rounded down.

[0101] As can be seen, in this embodiment of the invention, the electronic device can first divide the total amount of data X stored in the server into multiple data groups. For example, if the total amount of data X stored in the server is 50 records and the first quantity M is 5, then the second quantity N of the data groups can be determined to be 10, that is, 10 data groups are obtained.
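The grouping in Formula 1 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment; the function name `group_count` and the decision to reject M > X (which would give N = 0 and leave the modulo in Formula 2 undefined) are assumptions made for this example.

```python
def group_count(total_x: int, first_m: int) -> int:
    """Return N = X // M, the number of data groups (Formula 1)."""
    if first_m <= 0:
        raise ValueError("M must be a positive integer")
    n = total_x // first_m  # integer (floor) division
    if n == 0:
        # Assumption: M > X is rejected, because N = 0 cannot serve as a modulus.
        raise ValueError("M must not exceed X")
    return n


print(group_count(50, 5))  # 10, matching the example in [0101]
```

Because the division rounds down, a total that is not a multiple of M (for example X = 53, M = 5) still yields N = 10.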

[0102] In this embodiment of the invention, after the electronic device obtains the second quantity N, it can obtain target filtering information based on the second quantity N and the keyword to be searched.

[0103] Optionally, the electronic device can substitute the second quantity N and the keyword to be searched into the following formula:

[0104] result = H(keyword) % N (Formula 2)

[0105] Here, result represents the target filtering information; H(keyword) represents the integer obtained by hashing the keyword to be searched (the hash result rounded down to an integer); keyword represents the keyword to be searched; and % represents the modulo operation.
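A minimal Python sketch of Formula 2 is given below. The embodiment does not name a specific hash function, so the use of SHA-256 and the helper names `H` and `filtering_info` are assumptions made for illustration.

```python
import hashlib


def H(value: str) -> int:
    """Hash a string to a non-negative integer (assumed here: SHA-256, big-endian)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def filtering_info(keyword: str, n_groups: int) -> int:
    """Return result = H(keyword) % N (Formula 2)."""
    return H(keyword) % n_groups


result = filtering_info("stock A", 10)
print(result)  # a value in range(10); the exact value depends on the hash
```

The result identifies one of the N groups, so it reveals only which group the keyword falls into, not the keyword itself.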

[0106] In this embodiment of the invention, after obtaining the target filtering information, the electronic device can generate a random number; according to a preset encryption algorithm, combining the random number and the public key used by the server to encrypt data, the keyword to be searched is encrypted to obtain an encrypted keyword to be searched, and the encrypted keyword to be searched is sent to the server so that the server can perform a second encryption on the encrypted keyword to obtain a second-encrypted keyword to be searched. The preset encryption algorithm is, for example, the RSA algorithm. This prevents the server from knowing the keyword to be searched.

[0107] Step 403: Send the second quantity N and the target filtering information to the server, and receive the target index list fed back by the server. The target index list includes Y ciphertext indexes. The Y ciphertext indexes are ciphertext indexes corresponding to the privacy information in the server's database. The Y ciphertext indexes are obtained by substituting the second quantity N and the target filtering information into the preset filtering conditions.

[0108] In this embodiment of the invention, the electronic device sends the second quantity N and the target filtering information to the server. The server can then substitute the received second quantity N and the target filtering information into the preset filtering conditions to obtain the results. From the server's database, it filters the ciphertext indexes corresponding to the privacy information that meet the aforementioned results, thereby obtaining Y ciphertext indexes, where Y is a positive integer. Based on the Y ciphertext indexes, a target index list is constructed and sent to the electronic device.

[0109] In one embodiment, the aforementioned preset filtering condition can be: H(sub_keyword_i) % N = result. Here, result represents the target filtering information; sub_keyword_i represents a keyword in the preprocessed data in the database; H(sub_keyword_i) represents the integer obtained by hashing that keyword (the hash result rounded down to an integer); and % represents the modulo operation.
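The server-side filtering condition can be sketched as follows. This is an illustrative sketch only: SHA-256 as the hash, the helper name `filter_keywords`, and the sample keywords are assumptions, not part of the embodiment.

```python
import hashlib


def H(value: str) -> int:
    """Hash a string to a non-negative integer (assumed here: SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest(), "big")


def filter_keywords(keywords, n_groups, result):
    """Keep the keywords that satisfy H(sub_keyword_i) % N == result."""
    return [kw for kw in keywords if H(kw) % n_groups == result]


# Hypothetical database keywords.
keywords = ["stock A", "stock B", "fund A", "fund B", "bond C", "bond D"]
n_groups = 3
result = H("fund A") % n_groups  # computed by the electronic device
subset = filter_keywords(keywords, n_groups, result)
print("fund A" in subset)  # True: the searched keyword always lands in its own group
```

Every other keyword in the same group acts as an obfuscating word, which is why no fake keywords need to be generated.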

[0110] In one embodiment, the server can generate an RSA public/private key pair that meets a required security strength, where the public key is denoted e, the private key d, and the public parameter n. Assume the server has a dataset {(keyword_1, plaintext_1), ..., (keyword_x, plaintext_x)}, that is, x data entries of the form (keyword_i, plaintext_i), where plaintext_i is plaintext. The server can then create a ciphertext index en_index_i for each keyword keyword_i, calculated as en_index_i = H((H(keyword_i))^d % n), where H() represents a hash function that returns an integer and ^ denotes exponentiation. The server thus obtains the preprocessed dataset db, which can be represented as {(keyword_1, en_index_1, plaintext_1), ..., (keyword_x, en_index_x, plaintext_x)}. Therefore, when the server receives the second quantity N and the target filtering information sent by the electronic device, it can filter out at least one ciphertext index from the preprocessed dataset and construct a target index list from the selected ciphertext indexes.

[0111] For example, as shown in Figure 5, the electronic device can obtain the size of the full dataset in the server, i.e., the total amount of data X stored in the aforementioned database. The electronic device can also receive the user's input of the keyword to be searched and the first quantity M, and then calculate the second quantity N and the target filtering information result. The second quantity N and the target filtering information result are then sent to the server. The server filters out the sub-dataset of the full dataset that meets the preset filtering condition, determines the target index list corresponding to that sub-dataset, and feeds back the target index list to the electronic device.

[0112] Clearly, in this embodiment of the invention, the server only feeds back to the electronic device the target index list corresponding to one sub-dataset. Compared with Scheme 1 in the related technology, the electronic device only needs to compare against one sub-dataset rather than the full dataset, which greatly reduces the computational overhead.

[0113] Step 404: Based on the target index list, obtain the subset containing privacy information corresponding to the target index list from the server, and obtain the privacy information corresponding to the keyword to be searched from the subset.

[0114] In this embodiment of the invention, the electronic device can, based on the random number, perform deblinding on the secondary-encrypted keyword to be searched fed back by the server to obtain a target sub-index, which has the same format as the ciphertext indexes in the database. The electronic device then searches the target index list for the index that matches the target sub-index to obtain a matching index, and uses the sequence number of the matching index in the target index list as the target index. Finally, from the sub-dataset corresponding to the target index list, the sub-data corresponding to the target index is retrieved as the privacy information corresponding to the keyword to be searched.

[0115] To more clearly illustrate the keyword privacy information retrieval method provided in the embodiments of the present invention, the entire process is explained below with an example.

[0116] Referring to Figure 6, in one embodiment, the keyword privacy information retrieval method provided by the present invention can be roughly divided into four phases: the startup phase, the data filtering phase, the index determination phase, and the privacy information query phase.

[0117] During the startup phase, assume the server has a dataset {(keyword_1, plaintext_1), ..., (keyword_x, plaintext_x)}, containing x data entries of the form (keyword_i, plaintext_i), where plaintext_i is plaintext. The server can preprocess the data in the dataset using, but not limited to, the following steps:

[0118] Step 601: The server generates RSA algorithm public and private keys that meet a certain security level, where the public key is e, the private key is d, and the public parameter is n.

[0119] Step 602: The server creates a ciphertext index en_index_i for each keyword keyword_i, calculated as en_index_i = H((H(keyword_i))^d % n), where H() represents a hash function that returns an integer and ^ denotes exponentiation. The preprocessed dataset db can be represented as {(keyword_1, en_index_1, plaintext_1), ..., (keyword_x, en_index_x, plaintext_x)}.

[0120] In this embodiment of the invention, the server's private key is used for data preprocessing. The preprocessed data can be used permanently, and all users can use this preprocessed data for information retrieval. This allows for offline preprocessing when the server starts, thereby saving computational overhead during online querying.

[0121] Step 603: The server exposes parameters e, n, and x, where x represents the total amount of data X in the dataset db.
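Steps 601 to 603 can be sketched in Python as follows. This is a toy illustration under stated assumptions: the key is built from two small Mersenne primes and is far too short for real use, SHA-256 stands in for the unspecified hash function, integers are hashed via their decimal string, and the names `preprocess` and `public_params` are hypothetical.

```python
import hashlib


def H(value) -> int:
    """Hash str(value) to an integer (assumed here: SHA-256 over the decimal/text form)."""
    return int.from_bytes(hashlib.sha256(str(value).encode("utf-8")).digest(), "big")


# Step 601: toy RSA key (assumption: demo-sized primes, NOT a secure key).
P, Q = (1 << 61) - 1, (1 << 89) - 1
n = P * Q                          # public parameter
e = 65537                          # public key
d = pow(e, -1, (P - 1) * (Q - 1))  # private key (modular inverse, Python 3.8+)


def preprocess(dataset):
    """Step 602: attach en_index_i = H(H(keyword_i)^d % n) to each (keyword, plaintext)."""
    return [(kw, H(pow(H(kw), d, n)), pt) for kw, pt in dataset]


# Hypothetical sample data.
dataset = [("stock A", "price 10"), ("fund A", "price 20")]
db = preprocess(dataset)

# Step 603: the server publishes e, n and x (the number of entries).
public_params = (e, n, len(db))
print(public_params[2])  # 2
```

Since the index depends only on the keyword and the server's private key, it can be computed once offline and reused for every later query.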

[0122] During the data filtering phase, the electronic device can obtain the publicly available parameters e, n, and x from the server and can perform the following steps:

[0123] Step 604: The electronic device receives the user-input keyword to be searched `keyword_user` and the expected number `expect_num` of entries in the sub-dataset to be filtered from the server database. Here, `expect_num` can be understood as the aforementioned first quantity `M`.

[0124] Step 605: The electronic device calculates the modulus MOD = x // expect_num, where the symbol "//" indicates division with the result rounded down. Here, the modulus MOD can be understood as the aforementioned second quantity N.

[0125] Step 606: The electronic device calculates the modulo result re_mod = H(keyword_user)%MOD. The modulo result can be understood as the aforementioned target filtering information.

[0126] Step 607: The electronic device generates a random integer r, and uses r and public parameters e and n to encrypt the keyword to be searched, keyword_user, with the encryption result being en_user.

[0127] Optionally, the keyword to be searched can be encrypted using Formula 3 below:

[0128] en_user = (r^e % n) * H(keyword_user) % n (Formula 3);

[0129] Here, en_user represents the encrypted keyword to be searched, r represents the random integer, e represents the public key, n represents the public parameter, keyword_user represents the keyword to be searched, ^ represents exponentiation, % represents the modulo operation, and H(keyword_user) represents the integer obtained by hashing the keyword to be searched (the hash result rounded down to an integer).
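The blinding in step 607 can be sketched as follows. This is an illustrative sketch: the toy modulus, the SHA-256 hash, and the function name `blind` are assumptions, and the requirement that r be coprime to n is added here because deblinding later needs the inverse of r modulo n.

```python
import hashlib
import math
import secrets


def H(value) -> int:
    """Hash str(value) to an integer (assumed here: SHA-256)."""
    return int.from_bytes(hashlib.sha256(str(value).encode("utf-8")).digest(), "big")


# Public parameters published by the server. The device would only see n and e;
# the toy primes appear here solely to keep the example self-contained.
n = ((1 << 61) - 1) * ((1 << 89) - 1)
e = 65537


def blind(keyword: str):
    """Return (r, en_user) with en_user = (r^e % n) * H(keyword) % n (Formula 3)."""
    while True:
        r = secrets.randbelow(n - 2) + 2  # random integer in [2, n - 1]
        if math.gcd(r, n) == 1:           # r must be invertible modulo n
            break
    en_user = (pow(r, e, n) * H(keyword)) % n
    return r, en_user


r, en_user = blind("stock A")
print(0 <= en_user < n)  # True
```

The device keeps r secret and sends only en_user, so the server sees a value that is masked by the random factor r^e.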

[0130] Furthermore, the electronic device can send the calculation results of MOD, re_mod, and en_user to the server. After receiving these three calculation results, the server can perform the following steps:

[0131] Step 608: Filter the sub-dataset sub_db = {(sub_keyword_1, sub_en_index_1, sub_plaintext_1), ..., (sub_keyword_y, sub_en_index_y, sub_plaintext_y)} from the preprocessed dataset db. Every entry (sub_keyword_i, sub_en_index_i, sub_plaintext_i) in the sub-dataset sub_db satisfies the condition H(sub_keyword_i) % MOD = re_mod.

[0132] During the index determination phase, the server can perform the following steps:

[0133] Step 609: The server constructs a target index list list_en_index = {sub_en_index_1, ..., sub_en_index_y} from the ciphertext indexes in the sub-dataset sub_db filtered from the preprocessed dataset.

[0134] Step 610: The server uses the en_user received from the electronic device (generated in step 607), together with the RSA private key d and public parameter n from step 601, to perform secondary encryption on en_user; the encryption result is re_en_user.

[0135] Optionally, en_user can be encrypted a second time using Formula 4 below:

[0136] re_en_user = en_user^d % n (Formula 4);

[0137] Here, re_en_user represents the keyword to be searched after secondary encryption, en_user represents the keyword to be searched after encryption, d represents the private key, n represents the public parameter, ^ represents exponentiation, and % represents the modulo operation.
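The server-side secondary encryption of Formula 4 can be sketched as follows, using the same toy key assumptions as before (demo-sized primes, not secure; the function name `second_encrypt` and the sample values of r and h are hypothetical). The final check illustrates why the scheme works: raising (r^e * h) to the power d leaves r * h^d modulo n.

```python
# Toy RSA key (assumption: demo-sized Mersenne primes, NOT a secure key).
P, Q = (1 << 61) - 1, (1 << 89) - 1
n = P * Q
e = 65537
d = pow(e, -1, (P - 1) * (Q - 1))  # private key held only by the server


def second_encrypt(en_user: int) -> int:
    """Return re_en_user = en_user^d % n (Formula 4)."""
    return pow(en_user, d, n)


# Hypothetical blinding factor r and keyword hash h, as the device would use them.
r, h = 123456789, 987654321
en_user = (pow(r, e, n) * h) % n
re_en_user = second_encrypt(en_user)

# The blinding factor survives as a plain multiplier: re_en_user == r * h^d (mod n).
print(re_en_user == (r * pow(h, d, n)) % n)  # True
```

The server never learns h, yet the device can later strip the factor r to recover h^d modulo n.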

[0138] Furthermore, the server can send `re_en_user` and `list_en_index` from step 609 to the electronic device. Upon receiving `re_en_user` and `list_en_index`, the electronic device can perform the following steps:

[0139] Step 611: The electronic device uses the random integer r from step 607 to perform deblinding on re_en_user to obtain en_index.

[0140] Optionally, the electronic device can deblind re_en_user using Formula 5 below:

[0141] en_index = H(re_en_user * r^(-1) % n) (Formula 5);

[0142] Here, en_index represents the target sub-index, re_en_user represents the keyword to be searched after secondary encryption, r represents the random integer, r^(-1) represents the multiplicative inverse of r modulo n, n represents the public parameter, % represents the modulo operation, and H() represents performing a hash operation and rounding the hash result down to an integer.
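The deblinding of Formula 5, together with a round trip through Formulas 3 and 4, can be sketched as follows. The toy key, the SHA-256 hash, the hashing of integers via their decimal string, and the fixed blinding factor are all assumptions made to keep the example reproducible; in practice r would be freshly random for each query.

```python
import hashlib


def H(value) -> int:
    """Hash str(value) to an integer (assumed here: SHA-256)."""
    return int.from_bytes(hashlib.sha256(str(value).encode("utf-8")).digest(), "big")


# Toy RSA key (assumption: demo-sized Mersenne primes, NOT a secure key).
P, Q = (1 << 61) - 1, (1 << 89) - 1
n = P * Q
e = 65537
d = pow(e, -1, (P - 1) * (Q - 1))


def unblind(re_en_user: int, r: int) -> int:
    """Return en_index = H(re_en_user * r^(-1) % n) (Formula 5)."""
    r_inv = pow(r, -1, n)  # modular inverse of r (Python 3.8+)
    return H((re_en_user * r_inv) % n)


keyword = "stock A"
r = 0x1234567890ABCDEF                      # fixed for the demo; smaller than both primes
en_user = (pow(r, e, n) * H(keyword)) % n   # Formula 3 (electronic device)
re_en_user = pow(en_user, d, n)             # Formula 4 (server)
en_index = unblind(re_en_user, r)           # Formula 5 (electronic device)

server_index = H(pow(H(keyword), d, n))     # ciphertext index from preprocessing
print(en_index == server_index)  # True
```

The equality holds because multiplying by r^(-1) removes the blinding factor, leaving H(keyword)^d % n, which is exactly the value the server hashed during preprocessing.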

[0143] Step 612: The electronic device searches list_en_index for the entry equal to en_index. The position of this entry in list_en_index is the target index of the data to be searched in the sub-dataset sub_db.
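The lookup in step 612 can be sketched as follows. The function name `find_target_index`, the zero-based position, the sample index values, and the handling of a miss (returning None when the keyword is not in the database) are assumptions made for this example.

```python
from typing import Optional, Sequence


def find_target_index(list_en_index: Sequence[int], en_index: int) -> Optional[int]:
    """Return the zero-based position of en_index in list_en_index, or None if absent."""
    try:
        return list(list_en_index).index(en_index)
    except ValueError:
        # Assumption: no match means the searched keyword is not in the database.
        return None


# Hypothetical ciphertext indexes returned by the server.
list_en_index = [4021, 1733, 9057]
print(find_target_index(list_en_index, 1733))  # 1
```

The comparison happens entirely on the electronic device, so the server does not learn which entry of the list matched.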

[0144] By the privacy information query phase, the server holds the sub-dataset sub_db = {(sub_keyword_1, sub_en_index_1, sub_plaintext_1), ..., (sub_keyword_y, sub_en_index_y, sub_plaintext_y)}, and the electronic device holds the index number of the keyword to be queried, keyword_user, in the sub-dataset sub_db, i.e., the target index.

[0145] During the privacy information query phase, the electronic device uses the target index as its input and the server uses the sub-dataset sub_db as its input, and the two parties jointly execute a 1-out-of-n oblivious transfer (OT) protocol. The electronic device then obtains the final search result, which is the privacy information corresponding to the searched keyword. In this way, the server can only infer that the electronic device's search result is one item in the sub-dataset sub_db, but cannot determine which specific item, thus satisfying the user's need for private querying.

[0146] As can be seen, the keyword privacy information retrieval method provided in this embodiment of the invention eliminates the need for electronic devices to generate fake keywords, thus avoiding the problem in related technologies where "fake keywords are poorly randomized and easily identified by the server," resulting in better privacy protection. Furthermore, the solution provided in this embodiment of the invention also supports users defining a first number M of obfuscated words to obfuscate the searched keywords, allowing users to freely control the confidentiality strength of the retrieved privacy information according to their own security needs and bandwidth limitations.

[0147] Based on the same inventive concept, embodiments of the present invention provide a keyword privacy information retrieval device, which can realize the functions corresponding to the aforementioned keyword privacy information retrieval method. This keyword privacy information retrieval device can be a hardware structure, a software module, or a hardware structure plus a software module. The keyword privacy information retrieval device can be implemented by a chip system, which can consist of chips or include chips and other discrete components. As shown in Figure 7, the keyword privacy information retrieval device includes:

[0148] The receiving unit 701 is configured to receive the search keywords for retrieving privacy information input by the user, the first number M of obfuscating words used to obfuscate the search keywords, and the total amount of data X sent by the server.

[0149] The first processing unit 702 is configured to group the total data X according to the first quantity M to obtain a second quantity N; and to obtain target filtering information according to the second quantity N and the keyword to be searched; wherein the target filtering information is used to filter data from the database of the server; the second quantity N is used to characterize the number of groups obtained after grouping the total data X;

[0150] The second processing unit 703 is used to send the second quantity N and the target filtering information to the server, and receive the target index list fed back by the server, wherein the target index list includes Y encrypted indexes; the Y encrypted indexes are encrypted indexes corresponding to privacy information in the server's database, and the Y encrypted indexes are obtained by substituting the second quantity N and the target filtering information into preset filtering conditions;

[0151] The obtaining unit 704 is configured to obtain a subset containing privacy information corresponding to the target index list from the server according to the target index list, and obtain the privacy information corresponding to the keyword to be searched from the subset.

[0152] In one possible implementation, the first processing unit 702 is specifically used for:

[0153] Substitute the second quantity N and the keyword to be searched into the following formula:

[0154] result = H(keyword)%N

[0155] Wherein, result is used to represent the target filtering information, H(keyword) is used to represent the hash operation performed on the keyword to be searched and the hash result is rounded down, keyword is used to represent the keyword to be searched, and % is used to represent the modulo operation.

[0156] In one possible implementation, the apparatus further includes a third processing unit for:

[0157] Generate random numbers;

[0158] According to the preset encryption algorithm, the random number and the public key used by the server to encrypt data are used to encrypt the keyword to be searched, thereby obtaining the encrypted keyword to be searched. The encrypted keyword to be searched is then sent to the server so that the server can perform secondary encryption on the encrypted keyword to be searched, thereby obtaining the secondary encrypted keyword to be searched.

[0159] In one possible implementation, the obtaining unit 704 is used for:

[0160] Based on the random number, the secondary encrypted keywords returned by the server are deblinded to obtain a target sub-index; the target sub-index has the same format as the ciphertext index in the database.

[0161] In the target index list, find the index that matches the target sub-index to obtain the matching index, and use the sequence number of the matching index in the target index list as the target index;

[0162] From the sub-dataset corresponding to the target index list, find the sub-data corresponding to the target index, and use it as the privacy information corresponding to the keyword to be searched.

[0163] In one possible implementation, the first processing unit 702 is specifically used for:

[0164] Substitute the total amount of data and the first quantity M into the following formula:

[0165] N = X // M

[0166] Here, X represents the total amount of data, and // represents the integer division operation.

[0167] All relevant content of each step involved in the aforementioned embodiments of the keyword privacy information retrieval method can be referenced to the functional description of the corresponding functional module of the keyword privacy information retrieval device in the embodiments of the present invention, and will not be repeated here.

[0168] The module division in this embodiment of the invention is illustrative and represents only one logical functional division. In actual implementation, other division methods may be used. Furthermore, the functional modules in the various embodiments of the invention can be integrated into a single controller, exist as separate physical entities, or be integrated into a single module. The integrated modules described above can be implemented in hardware or as software functional modules.

[0169] Based on the same inventive concept, embodiments of the present invention provide an electronic device. As shown in Figure 8, the electronic device includes at least one processor 801 and a memory 802 connected to the at least one processor. In this embodiment of the invention, the specific connection medium between the processor 801 and the memory 802 is not limited. Figure 8 takes the connection between the processor 801 and the memory 802 via a bus 800 as an example. The bus 800 is represented by a thick line in Figure 8; the connections between other components are shown for illustrative purposes only and are not limiting. The bus 800 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, it is shown as a single thick line in Figure 8, but this does not indicate that there is only one bus or one type of bus. Furthermore, the keyword privacy information retrieval device also includes a communication interface 803 for receiving or sending data.

[0170] In this embodiment of the invention, the memory 802 stores instructions that can be executed by at least one processor 801. By executing the instructions stored in the memory 802, at least one processor 801 can perform the steps included in the aforementioned keyword privacy information retrieval method.

[0171] The processor 801 is the control center of the electronic device. It can connect to various parts of the electronic device through various interfaces and lines. By running or executing instructions stored in the memory 802 and calling data stored in the memory 802, it can monitor the various functions and data processing of the electronic device as a whole.

[0172] Optionally, processor 801 may include one or more processing units. Processor 801 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may not be integrated into processor 801. In some embodiments, processor 801 and memory 802 may be implemented on the same chip; in some embodiments, they may be implemented separately on independent chips.

[0173] Processor 801 can be a general-purpose processor, such as a central processing unit (CPU), digital signal processor, application-specific integrated circuit, field-programmable gate array or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, capable of implementing or executing the methods, steps, and logic block diagrams disclosed in the embodiments of this invention. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this invention can be directly manifested as being executed by a hardware processor, or executed by a combination of hardware and software modules within the processor.

[0174] Memory 802, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. Memory 802 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic storage, magnetic disk, optical disk, etc. Memory 802 can be any other medium capable of carrying or storing desired program code in the form of instructions or data structures that can be accessed by a computer, but is not limited thereto. In embodiments of the present invention, memory 802 may also be a circuit or any other device capable of implementing storage functions for storing program instructions and / or data.

[0175] By designing and programming the processor 801, the code corresponding to the keyword privacy information retrieval method described in the foregoing embodiments can be embedded into the chip, so that the chip can execute the steps of the aforementioned keyword privacy information retrieval method when running. How to design and program the processor 801 is a well-known technique to those skilled in the art, and will not be described in detail here.

[0176] Based on the same inventive concept, embodiments of the present invention also provide a storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the steps of the aforementioned keyword privacy information retrieval method.

[0177] In some possible implementations, various aspects of the keyword privacy information retrieval method provided by the present invention can also be implemented in the form of a program product, which includes program code. When the program product is run on a control electronic device, the program code is used to cause the control electronic device to perform the steps in the keyword privacy information retrieval method according to various exemplary embodiments of the present invention described above.

[0178] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0179] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0180] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0181] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0182] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

Claims

1. A keyword privacy information retrieval method, characterized in that the method includes: receiving a user-input keyword to be searched for retrieving privacy information and a first number M of obfuscating words for obfuscating the keyword to be searched, as well as the total amount of data X sent by the server; grouping the total data X based on the first quantity M to obtain a second quantity N, and obtaining target filtering information based on the second quantity N and the keyword to be searched, wherein the target filtering information is used to filter data from the database of the server, and the second quantity N is used to characterize the number of groups obtained after grouping the total data X; sending the second quantity N and the target filtering information to the server, and receiving the target index list fed back by the server, wherein the target index list includes Y encrypted indexes, the Y encrypted indexes are encrypted indexes corresponding to privacy information in the server's database, and the Y encrypted indexes are obtained by substituting the second quantity N and the target filtering information into preset filtering conditions; and obtaining, based on the target index list, a subset containing privacy information corresponding to the target index list from the server, and obtaining the privacy information corresponding to the keyword to be searched from the subset.

2. The method as described in claim 1, characterized in that obtaining target filtering information based on the second quantity N and the keyword to be searched includes: substituting the second quantity N and the keyword to be searched into the following formula: result = H(keyword) % N, wherein result represents the target filtering information, H(keyword) represents the integer obtained by hashing the keyword to be searched (the hash result rounded down to an integer), keyword represents the keyword to be searched, and % represents the modulo operation.

3. The method as described in claim 2, characterized in that, after obtaining the target filtering information, the method further includes: generating a random number; and, according to a preset encryption algorithm, using the random number and the public key used by the server to encrypt data to encrypt the keyword to be searched, thereby obtaining an encrypted keyword to be searched, and sending the encrypted keyword to be searched to the server so that the server performs secondary encryption on the encrypted keyword to be searched, thereby obtaining a secondary-encrypted keyword to be searched.

4. The method as described in claim 3, characterized in that obtaining, based on the target index list, a subset containing privacy information corresponding to the target index list from the server, and obtaining the privacy information corresponding to the keyword to be searched from the subset, includes: deblinding, based on the random number, the secondary-encrypted keyword to be searched returned by the server to obtain a target sub-index, wherein the target sub-index has the same format as the ciphertext indexes in the database; searching the target index list for the index that matches the target sub-index to obtain a matching index, and using the sequence number of the matching index in the target index list as the target index; and finding, from the subset, the sub-data corresponding to the target index, and using it as the privacy information corresponding to the keyword to be searched.

5. The method according to any one of claims 1-4, characterized in that grouping the total data X based on the first quantity M to obtain the second quantity N includes: substituting the total data X and the first quantity M into the following formula: N = X // M, wherein // represents the integer division operation.
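
The grouping formula in claim 5 is a single integer division, sketched below in Python. The function name group_count and the input checks are assumptions added for illustration; the check on X < M is included because N would then be 0 and a later modulo by N would be undefined.

```python
def group_count(total_x: int, m: int) -> int:
    """Return N = X // M, the number of groups for total data X."""
    if m <= 0:
        raise ValueError("the first quantity M must be positive")
    if total_x < m:
        # ASSUMPTION: reject this case, since N = 0 cannot be used as a modulus.
        raise ValueError("total data X must be at least M")
    return total_x // m


n = group_count(1000, 40)  # 25
```

With X = 1000 and M = 40 this gives N = 25 groups, so each group holds roughly M entries.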

6. A keyword privacy information retrieval device, characterized in that the device includes: a receiving unit configured to receive a keyword to be searched, input by a user, for retrieving privacy information and a first quantity M of obfuscating words for obfuscating the keyword to be searched, and to receive the total data X sent by the server; a first processing unit configured to group the total data X according to the first quantity M to obtain a second quantity N, and to obtain target filtering information according to the second quantity N and the keyword to be searched, wherein the target filtering information is used to filter data from the database of the server, and the second quantity N characterizes the number of groups obtained after grouping the total data X; a second processing unit configured to send the second quantity N and the target filtering information to the server and to receive the target index list fed back by the server, wherein the target index list includes Y encrypted indexes, the Y encrypted indexes are encrypted indexes corresponding to privacy information in the database of the server, and the Y encrypted indexes are obtained by substituting the second quantity N and the target filtering information into preset filtering conditions; and an obtaining unit configured to obtain a subset containing privacy information corresponding to the target index list from the server based on the target index list, and to obtain the privacy information corresponding to the keyword to be searched from the subset.

7. The device according to claim 6, characterized in that the first processing unit is specifically configured to: substitute the second quantity N and the keyword to be searched into the following formula: result = H(keyword) % N, wherein result represents the target filtering information, H(keyword) represents the result of performing a hash operation on the keyword to be searched and rounding the hash result down, keyword represents the keyword to be searched, and % represents the modulo operation.

8. The device according to claim 7, characterized in that the device further includes a third processing unit configured to: generate a random number; encrypt the keyword to be searched, according to a preset encryption algorithm, using the random number and the public key used by the server to encrypt data, to obtain an encrypted keyword to be searched; and send the encrypted keyword to be searched to the server so that the server performs secondary encryption on the encrypted keyword to be searched to obtain a secondary encrypted keyword to be searched.

9. An electronic device, characterized in that the electronic device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the keyword privacy information retrieval method according to any one of claims 1 to 5.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the keyword privacy information retrieval method according to any one of claims 1 to 5.