A privacy-preserving vector database query method
By applying a linear transformation and a random permutation to a vector database, secret-sharing the data across multiple servers, and using semi-trusted hardware for querying, the method addresses the low efficiency and strong trust assumptions of privacy protection in high-dimensional/large-scale databases, achieving efficient privacy protection and improved query efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CSG EHV POWER TRANSMISSION
- Filing Date
- 2026-04-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to achieve efficient and accurate privacy protection in high-dimensional/large-scale vector database queries, and traditional methods suffer from high computational overhead, strong trust assumptions, and weak index support.
The vector data is linearly transformed, randomly permuted, and secret-shared across multiple servers, and semi-trusted hardware is introduced for distance calculation. The labels are recovered using a one-hot secret-share method, and the query results are reconstructed from the secret shares.
While ensuring computational efficiency, it achieves end-to-end privacy protection, improves query efficiency, adapts to large-scale and high-dimensional scenarios, and avoids the computational overhead and hardware resource bottlenecks of pure encryption methods.
Figure CN122240891A_ABST
Description
Technical Field
[0001] This invention belongs to the field of information security technology, and in particular relates to a privacy-preserving vector database query method.
Background Technology
[0002] With the continuous development of privacy protection technologies, many applications involving sensitive data (such as recommendation systems and personalized search) need to store and query vector data. Traditional database query methods often expose data in plaintext, making them vulnerable to various attacks and leading to user privacy leaks. To address this issue, secret sharing technology has been proposed to protect data privacy; however, how to efficiently and accurately achieve privacy protection without leaking sensitive data during query operations remains a challenge. Vector databases are widely used in various fields, such as recommendation systems and image search. In these applications, the goal of the query is to find the most similar vector in the database based on the query vector input by the user. In existing technologies, vector database queries largely rely on server-side plaintext operations on the data, which carries the risk of data leakage.
[0003] In many privacy-preserving vector retrieval studies, schemes tend to employ homomorphic encryption (e.g., Fully Homomorphic Encryption, FHE) or comparable encryption techniques, enabling matching or distance calculation of database vectors or query vectors in an encrypted state. For example, in "A Note on Efficient Privacy-Preserving Similarity Search for Encrypted Vectors," the authors point out that the traditional FHE method is extremely inefficient in large-scale real-time retrieval scenarios, thus turning to Additive Homomorphic Encryption (AHE), which only supports addition and scalar multiplication, to calculate inner product similarity. While such schemes offer strong privacy protection, their main drawbacks are: the need for encrypted computation under homomorphic encryption, high computational overhead, weak index support, and difficulty in scaling to high-dimensional or ultra-large-scale databases.
[0004] Another approach relies on hardware security modules or trusted execution environments (such as Intel SGX) to ensure that plaintext or partially plaintext computations are performed within protected hardware, thereby reducing the risk of plaintext exposure while maintaining query efficiency. For example, FedVSE proposes using TEEs to coordinate query processing in federated vector database scenarios, supporting KNN and hybrid queries. However, this approach also has significant drawbacks: first, it relies heavily on hardware trust assumptions (the hardware may be subject to side-channel attacks or be compromised); second, hardware resources (memory, cache, I/O) may become a performance bottleneck, and traditional index structures still struggle to fully leverage their advantages in large-scale/high-dimensional scenarios.
[0005] Another approach employs secret sharing or secure multi-party computation (SMC) to distribute secret data shares among multiple parties, enabling them to collaborate on similarity calculations or nearest neighbor queries, thus preventing any single party from seeing the complete plaintext. This type of approach excels in privacy protection, but it also has significant limitations, such as heavy communication, high computational and synchronization costs, difficulty in supporting large-scale dynamic data updates, and difficulty in optimizing index structures. Therefore, while these existing methods each have their advantages, comprehensive optimization remains challenging in dimensions such as "high-dimensional/large-scale databases," "query efficiency," "index support," and "trust assumptions."
Summary of the Invention
[0006] The purpose of this application is to address the problems existing in the prior art by providing a privacy-preserving vector database query method.
[0007] According to a first aspect of the embodiments of this application, a privacy-preserving vector database query method is provided, applied to any server, wherein the server stores a transformed vector data secret share and a corresponding label secret share secretly shared by the database side, the method comprising: obtaining a query vector secret share, and performing a linear transformation on the query vector secret share to obtain a transformed query vector secret share; sending the transformed vector data secret share and the transformed query vector secret share to the semi-trusted hardware, so that the semi-trusted hardware performs a nearest neighbor search to obtain the nearest neighbor vector and the corresponding index, and secretly shares the nearest neighbor vector and the one-hot vector corresponding to the index with each server; obtaining the one-hot vector secret share and the nearest neighbor vector secret share sent by the semi-trusted hardware; computing the dot product of the one-hot vector secret share and the label secret share to obtain a query result label secret share; applying the inverse of the linear transformation to the nearest neighbor vector secret share to obtain a query result vector secret share; and sending the query result label secret share and the query result vector secret share to the user terminal, so that the user terminal can reconstruct the query result vector and the query result label.
[0008] Furthermore, the transformed vector data secret share and the corresponding label secret share are obtained as follows: obtaining the original transformed vector data secret share, produced by the database side generating a secret share of the vector data it holds and applying a linear transformation, and the original label secret share, produced by the database side generating a secret share of the corresponding label it holds. The original transformed vector data secret share is $[\![x_i']\!] = a \cdot [\![x_i]\!] + b$, where $[\![x_i]\!]$ is the secret share generated from the vector data $x_i$ held by the database side, $a$ is a non-zero scalar, and $b$ is the offset. The original transformed vector data secret share and the original label secret share are then reordered using the same permutation to obtain the transformed vector data secret share and the corresponding label secret share.
[0009] Furthermore, the transformed query vector secret share is $[\![q']\!] = a \cdot [\![q]\!] + b$, where $[\![q]\!]$ is the secret share of the query vector $q$ held by the server, $a$ is a non-zero scalar, and $b$ is the offset.
[0010] Furthermore, in the one-hot vector, the position corresponding to the index of the nearest neighbor vector is set to 1 and all other positions are 0, i.e. $e = (e_1, \dots, e_n)$ with $e_i = 1$ if $i = i^*$ and $e_i = 0$ otherwise, where $i^*$ is the index of the nearest neighbor vector.
[0011] Furthermore, the query result label secret share is $[\![y^*]\!] = \sum_{i=1}^{n} [\![\tilde{y}_i]\!] \cdot [\![e_i]\!]$, where $n$ is the number of labels, $[\![\tilde{y}_i]\!]$ is the secret share of the reordered labels, and $[\![e_i]\!]$ is the secret share of the one-hot vector.
[0012] Furthermore, the query result vector secret share is $[\![x^*]\!] = a^{-1} \cdot \left([\![x'^*]\!] - b\right)$, where $[\![x'^*]\!]$ is the secret share of the nearest neighbor vector $x'^*$, $a$ is a non-zero scalar, and $b$ is the offset.
[0013] According to a second aspect of the embodiments of this application, a privacy-preserving vector database query system is provided, comprising: a database side, used to generate secret shares for the vector data it holds and the corresponding labels, perform a linear transformation on the secret shares of the vector data, and share them with the server side; a user terminal, used to secretly share the user's query vector with the server side; a server side comprising three servers, one of which has semi-trusted hardware deployed on it, wherein each server is used to: obtain the transformed vector data secret share and the corresponding label secret share, and reorder them using the same permutation; apply to the query vector secret share the same linear transformation that was applied to the vector data secret share, to obtain the transformed query vector secret share; send the transformed vector data secret share and the transformed query vector secret share to the semi-trusted hardware, and obtain the one-hot vector secret share and the nearest neighbor vector secret share sent by the semi-trusted hardware; compute the dot product of the one-hot vector secret share and the label secret share to obtain the query result label secret share; apply the inverse of the linear transformation to the nearest neighbor vector secret share to obtain the query result vector secret share; and send the query result label secret share and the query result vector secret share to the user terminal so that the user terminal can reconstruct the query result vector and the query result label; and semi-trusted hardware, used to perform the nearest neighbor search, obtain the nearest neighbor vector and its corresponding index, and secretly share the nearest neighbor vector and the one-hot vector corresponding to the index with each server.
[0014] According to a third aspect of the embodiments of this application, a computer program product is provided, including a computer program / instructions that, when executed by a processor, implement the method described in the first aspect.
[0015] According to a fourth aspect of the embodiments of this application, an electronic device is provided, comprising: one or more processors; and a memory used to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors perform the method described in the first aspect.
[0016] According to a fifth aspect of the embodiments of this application, a computer-readable storage medium is provided that stores computer instructions thereon, which, when executed by a processor, implement the steps of the method as described in the first aspect.
[0017] The technical solutions provided by the embodiments of this application may include the following beneficial effects: as described in the above embodiments, this application first applies a uniform linear transformation and a random permutation π to all data in the vector database, so that the semi-trusted hardware sees only the transformed and shuffled data, avoiding plaintext leakage of the original vectors and their order. Both vectors and labels are then stored as secret shares across multiple servers, making it impossible for any single server to restore the complete data. At the same time, a semi-trusted hardware module is introduced to handle distance calculation; it sees only the transformed vectors and returns the index in one-hot form. The label is then recovered using a one-hot secret-share method: the servers return label secret shares to the user, who reconstructs the label from them. Through this design, the solution protects the database vectors, the user query vectors, and the returned labels throughout the entire process while ensuring computational efficiency, thereby overcoming the drawbacks of traditional privacy protection methods such as low efficiency, strong trust assumptions, and weak index support.
[0018] Compared with homomorphic encryption and similar schemes, this scheme applies a linear transformation to all database vectors during the storage phase. Because every vector is scaled by the same non-zero scalar and shifted by the same offset, the transformed space retains the similarity/distance relationships between vectors at query time. Traditional approximate index structures (such as HNSW, IVF/PQ, etc.) can therefore be built on the transformed space, improving query efficiency.
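The distance-preservation property can be checked with a small sketch. It assumes integer vectors, a scalar named `a`, and a per-dimension offset named `b`; these names and values are illustrative, not taken from the application, and the arithmetic is over ordinary integers rather than a ring. Under these assumptions, every squared distance is scaled by `a * a`, so the nearest neighbor does not change.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def affine(v, a, b):
    """Apply x -> a*x + b component-wise (b is an assumed per-dimension offset)."""
    return [a * vi + bi for vi, bi in zip(v, b)]


def nearest_index(db, q):
    """Index of the database vector closest to q (ties go to the lowest index)."""
    return min(range(len(db)), key=lambda i: sq_dist(db[i], q))


rng = random.Random(0)
d, n = 8, 50
db = [[rng.randint(-100, 100) for _ in range(d)] for _ in range(n)]
q = [rng.randint(-100, 100) for _ in range(d)]

a = 7                                             # non-zero scalar (illustrative value)
b = [rng.randint(-1000, 1000) for _ in range(d)]  # offset (illustrative values)

db_t = [affine(x, a, b) for x in db]
q_t = affine(q, a, b)

plain_nn = nearest_index(db, q)
transformed_nn = nearest_index(db_t, q_t)
```

Because the offset cancels in every difference, an index built over `db_t` ranks candidates in the same order as one built over `db`.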
[0019] Semi-trusted hardware is introduced to handle distance calculations instead of performing them entirely in the cryptographic domain (such as pure homomorphic encryption), thus avoiding the severe computational overhead of pure cryptographic methods.
[0020] Meanwhile, the secret-sharing mechanism ensures that each server holds only a secret share and does not bear full responsibility for computation, enabling the system to process in parallel and in a distributed manner, thus handling large-scale databases and high-dimensional scenarios.
[0021] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.
Description of the Drawings
[0022] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0023] Figure 1 is a schematic diagram illustrating the query process of a privacy-preserving vector database query system according to an exemplary embodiment.
[0024] Figure 2 is a flowchart illustrating a privacy-preserving vector database query method (applied to any server) according to an exemplary embodiment.
[0025] Figure 3 is a block diagram illustrating a privacy-preserving vector database query apparatus (applied to any server) according to an exemplary embodiment.
[0026] Figure 4 is a schematic diagram of an electronic device according to an exemplary embodiment.
Detailed Implementation
[0027] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application.
[0028] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a," "an," and "the" used in this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
[0029] It should be understood that although the terms first, second, third, etc., may be used in this application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
[0030] In this invention, all vectors and their corresponding labels in the vector database are protected using secret sharing technology and distributed across multiple (three or more) servers. Users query by inputting a query vector, and the system returns the most similar vector and its corresponding label. To accelerate the query process, the system introduces semi-trusted hardware (such as a TPM, an HSM, or a Trusted Execution Environment (TEE)) deployed on any one of these servers. Specifically, assume the database contains $n$ database vectors $x_1, \dots, x_n$, each a $d$-dimensional vector whose elements belong to a ring of size $p$. In addition, each vector $x_i$ corresponds to a label $y_i$. In this embodiment, the server side consists of three servers $S_1$, $S_2$, $S_3$, and server $S_1$ is selected to deploy the semi-trusted hardware. The user submits a query vector $q$, and the servers return the database vector closest to $q$ together with its label, while ensuring privacy protection.
[0031] Specifically, this application provides a privacy-preserving vector database query system, which includes a database end, a user end, and a server end. The database end is used to generate secret shares for the vector data it holds and the corresponding labels, perform a linear transformation on the secret shares of the vector data, and share them with the server end. After the data has been secretly shared, the database end no longer participates in the query process. The user end is used to secretly share the user's query vector with the server end. The server end includes three servers, one of which is equipped with semi-trusted hardware.
[0032] Before initiating the query process, each server first needs to acquire its own secret shares and perform a permutation. Specifically, the database end secretly shares each vector $x_i$ it holds and the corresponding label $y_i$. Using replicated secret sharing, each $x_i$ is shared over a larger ring as $x_i = x_i^{(1)} + x_i^{(2)} + x_i^{(3)}$, and each server $S_j$ holds $x_i^{(j)}$ and $x_i^{(j+1)}$ (superscripts taken cyclically over the three servers). The labels $y_i$ are secretly shared in the same way.
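A minimal sketch of three-party replicated secret sharing follows, assuming the common construction in which a value is split into three additive parts and server j keeps parts j and j+1. The modulus `M` and all function names are hypothetical choices for illustration; the application does not specify them.

```python
import secrets

M = 2 ** 61 - 1  # hypothetical ring modulus, chosen only for illustration


def share3(x, m=M):
    """Split x into three additive parts with x1 + x2 + x3 = x (mod m)."""
    x1 = secrets.randbelow(m)
    x2 = secrets.randbelow(m)
    x3 = (x - x1 - x2) % m
    return [x1, x2, x3]


def replicate(parts):
    """Give server j the pair (parts[j], parts[j+1]), indices taken modulo 3."""
    return [(parts[j], parts[(j + 1) % 3]) for j in range(3)]


def reconstruct(pair_a, pair_b, m=M):
    """Recover the secret from the pairs held by servers j and j+1, in that order."""
    # pair_a = (x_j, x_{j+1}) and pair_b = (x_{j+1}, x_{j+2}) cover all three parts.
    return (pair_a[0] + pair_a[1] + pair_b[1]) % m


def share_vector(vec, m=M):
    """Share every coordinate; result[j] is the list of pairs held by server j."""
    per_coord = [replicate(share3(v % m, m)) for v in vec]
    return [[coord[j] for coord in per_coord] for j in range(3)]


x = [3, 14, 15, 92]
server_views = share_vector(x)
recovered = [reconstruct(pa, pb) for pa, pb in zip(server_views[0], server_views[1])]
```

In this construction a single server sees only two of the three random-looking parts, so it cannot recover the value alone, while any two consecutive servers can.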
[0033] The database end also selects a random constant $a$ (a non-zero scalar), an offset $b$, and a random permutation $\pi$. These transformation parameters (scaling, offset, permutation) are held jointly by the servers and are not publicly disclosed.
[0034] The database end applies the transformation $[\![x_i']\!] = a \cdot [\![x_i]\!] + b$ to the secret share of each vector $x_i$, where $a$ is the non-zero scalar and $b$ is the offset, and then secretly shares the result with the servers. The servers reorder the secret shares according to the permutation $\pi$; that is, the $i$-th entry actually stored on the servers is the secret share of the transformed version of the original vector $x_{\pi(i)}$, denoted $[\![x'_{\pi(i)}]\!]$. This secret share is stored on each server. The labels $y_i$ are handled with the same permutation and secret sharing method. In this way, although the servers store the secret shares, they cannot restore the original vectors.
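The transform-then-permute preparation can be sketched on plain additive shares as follows. This is an illustration under assumptions: the scalar `a`, the scalar offset `b`, the modulus `M`, and the helper names are hypothetical, and the permutation is drawn from a seeded generator only to keep the example reproducible. Scaling is applied to every part, while the offset is added to exactly one part so that the opened value is `a*x + b`.

```python
import random
import secrets

M = 2 ** 61 - 1  # hypothetical ring modulus


def share3(x, m=M):
    """Three additive parts that sum to x modulo m."""
    x1, x2 = secrets.randbelow(m), secrets.randbelow(m)
    return [x1, x2, (x - x1 - x2) % m]


def open3(parts, m=M):
    """Open a value by summing its three parts."""
    return sum(parts) % m


def transform_parts(parts, a, b, m=M):
    """Apply x -> a*x + b on shares: scale every part, add b to one part only."""
    out = [(a * p) % m for p in parts]
    out[0] = (out[0] + b) % m
    return out


def permute(items, pi):
    """Position i of the result holds items[pi[i]]."""
    return [items[pi[i]] for i in range(len(pi))]


vectors = [[1, 2], [30, 40], [500, 600]]
labels = [11, 22, 33]
a, b = 5, 17                       # illustrative non-zero scalar and offset
pi = list(range(len(vectors)))
random.Random(1).shuffle(pi)       # seeded only for reproducibility

vec_shares = [[transform_parts(share3(c), a, b) for c in vec] for vec in vectors]
label_shares = [share3(t) for t in labels]

# The same permutation is applied to vectors and labels so they stay aligned.
stored_vecs = permute(vec_shares, pi)
stored_labels = permute(label_shares, pi)

opened_vecs = [[open3(c) for c in vec] for vec in stored_vecs]
opened_labels = [open3(t) for t in stored_labels]
```

Opening the stored entries (done here only to check the sketch) yields the transformed vectors in permuted order, with each label still at the position of its vector.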
[0035] Based on the above method, each server stores the transformed vector data secret share and the corresponding label secret share. The specific query process is as follows:
S1: The user secretly shares the query vector $q$ among the servers. Each server receives a query vector secret share $[\![q]\!]$ and performs the linear transformation to obtain the transformed query vector secret share: $[\![q']\!] = a \cdot [\![q]\!] + b$.
S2: Each server sends the transformed vector data secret share $[\![x'_{\pi(i)}]\!]$ and the transformed query vector secret share $[\![q']\!]$ to the semi-trusted hardware (a TEE module located within server $S_1$). From the received secret shares, the TEE can open $x'_{\pi(i)}$ and $q'$, perform the nearest neighbor search to obtain the nearest neighbor vector and its corresponding index, and then secretly share the nearest neighbor vector and the one-hot vector corresponding to the index with each server.
The nearest neighbor search proceeds as follows: the TEE calculates the distance (e.g., Euclidean distance) between the transformed query vector $q'$ and each transformed database vector; the ordering of distances is unchanged by the scaling and offset. The TEE finds the nearest neighbor vector $x'^*$ corresponding to the minimum distance (or maximum similarity) and its index $i^*$. The TEE does not broadcast the vector content directly; for the index it outputs only the corresponding one-hot vector $e = (e_1, \dots, e_n)$, with $e_i = 1$ if $i = i^*$ and $e_i = 0$ otherwise. The one-hot vector and the nearest neighbor vector are then secretly shared with each server.
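The search performed inside the semi-trusted hardware can be sketched as a brute-force scan. The sketch assumes the transformed vectors have already been opened and can be treated as ordinary integers (no wraparound in the ring); the function name `tee_nearest_neighbor` is hypothetical, and a real deployment could substitute an approximate index.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def tee_nearest_neighbor(db_t, q_t):
    """Return (index, nearest transformed vector, one-hot vector for the index)."""
    best = min(range(len(db_t)), key=lambda i: sq_dist(db_t[i], q_t))
    one_hot = [1 if i == best else 0 for i in range(len(db_t))]
    return best, db_t[best], one_hot


# Opened, already transformed and permuted data (illustrative values).
db_t = [[10, 10], [3, 4], [-7, 2]]
q_t = [2, 5]

idx, nn, e = tee_nearest_neighbor(db_t, q_t)
```

Only `nn` and `e` would leave the hardware, and only in secret-shared form; `idx` refers to a position in the permuted order, not in the original database.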
[0036] S3: Each server obtains the one-hot vector secret share and the nearest neighbor vector secret share sent by the semi-trusted hardware, computes the dot product of the one-hot vector secret share and the label secret share to obtain the query result label secret share, applies the inverse of the linear transformation to the nearest neighbor vector secret share to obtain the query result vector secret share, and sends the query result label secret share and the query result vector secret share to the user terminal. Each server $S_j$ receives a one-hot vector secret share (distributed by the TEE using replicated secret sharing) and already holds a label secret share. On this basis, each server locally computes the dot product of its label secret share and its one-hot vector secret share. The specific process is as follows: $S_j$ holds $e^{(j)}$, $e^{(j+1)}$, $\tilde{y}^{(j)}$, $\tilde{y}^{(j+1)}$, locally computes $z_j = \sum_{i=1}^{n} \left( e_i^{(j)} \tilde{y}_i^{(j)} + e_i^{(j)} \tilde{y}_i^{(j+1)} + e_i^{(j+1)} \tilde{y}_i^{(j)} \right)$, and then sends $z_j$ to the user.
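The local dot product on replicated shares can be sketched as below. It assumes the standard three-party replicated construction, in which server j holds parts j and j+1 of every value and the three cross-term sums add up to the product; the modulus `M` and function names are hypothetical. Practical protocols usually also re-randomize each server's output, which is omitted here.

```python
import secrets

M = 2 ** 61 - 1  # hypothetical ring modulus


def share3(x, m=M):
    """Three additive parts that sum to x modulo m."""
    x1, x2 = secrets.randbelow(m), secrets.randbelow(m)
    return [x1, x2, (x - x1 - x2) % m]


def server_view(parts_list, j):
    """Server j holds parts j and j+1 (mod 3) of every shared value."""
    return [(p[j], p[(j + 1) % 3]) for p in parts_list]


def local_dot(e_view, y_view, m=M):
    """One server's additive share of the dot product, computed without communication."""
    total = 0
    for (e0, e1), (y0, y1) in zip(e_view, y_view):
        total += e0 * y0 + e0 * y1 + e1 * y0
    return total % m


labels = [11, 22, 33, 44]   # labels in permuted order (illustrative values)
one_hot = [0, 0, 1, 0]      # output of the nearest neighbor search

e_parts = [share3(v) for v in one_hot]
y_parts = [share3(v) for v in labels]

# Each server computes its share locally; the user adds the three shares.
z = [local_dot(server_view(e_parts, j), server_view(y_parts, j)) for j in range(3)]
result_label = sum(z) % M
```

Across the three servers, every one of the nine cross terms between the parts of `e` and the parts of `y` is counted exactly once, which is why the shares sum to the selected label.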
[0037] The TEE also secretly shares the nearest neighbor vector $x'^*$ among the servers, which then compute the query result vector secret share using the inverse of the linear transformation described above: $[\![x^*]\!] = a^{-1} \cdot \left([\![x'^*]\!] - b\right)$, where $a$ is the non-zero scalar and $b$ is the offset. Finally, each server sends its query result vector secret share $[\![x^*]\!]$ to the user.
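The inverse transformation on shares can be sketched as follows. The sketch assumes a prime modulus `M` so that every non-zero scalar `a` has a modular inverse; that choice, the scalar offset `b`, and the helper names are assumptions for illustration. The offset is removed from one part only, mirroring how it is added in the forward direction.

```python
import secrets

M = 2 ** 61 - 1  # hypothetical prime modulus, so any non-zero a is invertible


def share3(x, m=M):
    """Three additive parts that sum to x modulo m."""
    x1, x2 = secrets.randbelow(m), secrets.randbelow(m)
    return [x1, x2, (x - x1 - x2) % m]


def open3(parts, m=M):
    """Open a value by summing its three parts."""
    return sum(parts) % m


def forward(parts, a, b, m=M):
    """x -> a*x + b on shares: scale all parts, add the offset to part 0."""
    out = [(a * p) % m for p in parts]
    out[0] = (out[0] + b) % m
    return out


def inverse(parts, a, b, m=M):
    """x' -> a^{-1} * (x' - b) on shares: remove the offset from part 0, then scale."""
    a_inv = pow(a, -1, m)  # modular inverse (Python 3.8+)
    out = list(parts)
    out[0] = (out[0] - b) % m
    return [(a_inv * p) % m for p in out]


a, b = 12345, 678          # illustrative transformation parameters
nearest = [123, 456, 789]  # original nearest neighbor vector

transformed_shares = [forward(share3(c), a, b) for c in nearest]
result_shares = [inverse(c, a, b) for c in transformed_shares]
result = [open3(c) for c in result_shares]
```

Both directions are linear, so each server can apply them to its own parts without talking to the others.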
[0038] S4: The user end reconstructs the query result vector and the query result label from the received query result label secret shares and query result vector secret shares. Because replicated secret sharing is used, the user computes the query result label as $y^* = z_1 + z_2 + z_3$, where $z_j$ is the query result label secret share sent by server $S_j$. The query result vector $x^*$ is reconstructed in the same way from its secret shares.
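Steps S1 to S4 can be put together in a compact single-process simulation. This is a sketch under several assumptions, not the application's implementation: vector entries are small non-negative integers so that `a*x + b` never wraps modulo the hypothetical prime `M`, the search is a brute-force scan, shares are modeled as three additive parts, and all names are illustrative.

```python
import secrets

M = 2 ** 61 - 1  # hypothetical prime modulus


def share3(x):
    """Three additive parts that sum to x modulo M."""
    x1, x2 = secrets.randbelow(M), secrets.randbelow(M)
    return [x1, x2, (x - x1 - x2) % M]


def open3(parts):
    return sum(parts) % M


def fwd(parts, a, b):
    """x -> a*x + b on shares."""
    out = [(a * p) % M for p in parts]
    out[0] = (out[0] + b) % M
    return out


def inv(parts, a, b):
    """x' -> a^{-1} * (x' - b) on shares."""
    a_inv = pow(a, -1, M)
    out = list(parts)
    out[0] = (out[0] - b) % M
    return [(a_inv * p) % M for p in out]


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def query(db, labels, q, a, b, pi):
    n = len(db)
    # Setup: share, transform, and permute vectors; share and permute labels.
    stored_vecs = [[fwd(share3(c), a, b) for c in db[pi[i]]] for i in range(n)]
    stored_labels = [share3(labels[pi[i]]) for i in range(n)]
    # S1: the user shares q; the servers transform the shares.
    q_t_shares = [fwd(share3(c), a, b) for c in q]
    # S2: the hardware opens transformed data, searches, and shares its outputs.
    db_t = [[open3(c) for c in vec] for vec in stored_vecs]
    q_t = [open3(c) for c in q_t_shares]
    best = min(range(n), key=lambda i: sq_dist(db_t[i], q_t))
    e_shares = [share3(1 if i == best else 0) for i in range(n)]
    nn_shares = [share3(c) for c in db_t[best]]
    # S3: server j uses parts j and j+1 for the dot product, then inverts the transform.
    z = []
    for j in range(3):
        k = (j + 1) % 3
        z.append(sum(e[j] * y[j] + e[j] * y[k] + e[k] * y[j]
                     for e, y in zip(e_shares, stored_labels)) % M)
    vec_shares = [inv(c, a, b) for c in nn_shares]
    # S4: the user reconstructs the vector and the label.
    return [open3(c) for c in vec_shares], sum(z) % M


db = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
labels = [7, 8, 9]
result_vec, result_label = query(db, labels, [11, 19, 31], a=12345, b=678, pi=[2, 0, 1])
```

The returned vector and label refer to the original database entry even though the hardware only ever sees transformed vectors in permuted order.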
[0039] Based on the above system, this application provides a privacy-preserving vector database query method, applicable to any server, wherein the server stores a transformed vector data secret share and a corresponding label secret share secretly shared by the database side. As shown in Figure 2, the method includes:
S11: Obtain the query vector secret share, and perform a linear transformation on the query vector secret share to obtain the transformed query vector secret share;
S12: Send the transformed vector data secret share and the transformed query vector secret share to the semi-trusted hardware, so that the semi-trusted hardware can perform the nearest neighbor search, obtain the nearest neighbor vector and the corresponding index, and secretly share the nearest neighbor vector and the one-hot vector corresponding to the index with each server;
S13: Obtain the one-hot vector secret share and the nearest neighbor vector secret share sent by the semi-trusted hardware; compute the dot product of the one-hot vector secret share and the label secret share to obtain the query result label secret share; apply the inverse of the linear transformation to the nearest neighbor vector secret share to obtain the query result vector secret share; and send the query result label secret share and the query result vector secret share to the user terminal so that the user terminal can reconstruct the query result vector and the query result label.
[0040] Corresponding to the aforementioned embodiments of the privacy-preserving vector database query method, this application also provides embodiments of the privacy-preserving vector database query apparatus.
[0041] Figure 3 is a block diagram illustrating a privacy-preserving vector database query apparatus according to an exemplary embodiment. Referring to Figure 3, the apparatus, applicable to any server, may include: a linear transformation module 21, used to obtain the query vector secret share and perform a linear transformation on the query vector secret share to obtain the transformed query vector secret share; a sending module 22, used to send the transformed vector data secret share and the transformed query vector secret share to the semi-trusted hardware, so that the semi-trusted hardware can perform the nearest neighbor search, obtain the nearest neighbor vector and the corresponding index, and secretly share the nearest neighbor vector and the one-hot vector corresponding to the index with each server; and a query result generation module 23, used to obtain the one-hot vector secret share and the nearest neighbor vector secret share sent by the semi-trusted hardware, compute the dot product of the one-hot vector secret share and the label secret share to obtain the query result label secret share, apply the inverse of the linear transformation to the nearest neighbor vector secret share to obtain the query result vector secret share, and send the query result label secret share and the query result vector secret share to the user terminal so that the user terminal can reconstruct the query result vector and the query result label.
[0042] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0043] For the device embodiments, since they basically correspond to the method embodiments, the relevant parts can be referred to in the description of the method embodiments. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this application according to actual needs. Those skilled in the art can understand and implement this without creative effort.
[0044] Accordingly, this application also provides a computer program product, including a computer program / instruction that, when executed by a processor, implements the privacy-preserving vector database query method described above.
[0045] Accordingly, this application also provides an electronic device, including: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the privacy-preserving vector database query method described above. Figure 4 shows a hardware structure diagram of a device with data processing capability on which the privacy-preserving vector database query apparatus provided in an embodiment of the present invention is located. In addition to the processor, memory, and network interface shown in Figure 4, the device in the embodiment may also include other hardware depending on its actual function, which will not be described in detail here.
[0046] Accordingly, this application also provides a computer-readable storage medium storing computer instructions thereon, which, when executed by a processor, implement the privacy-preserving vector database query method described above. The computer-readable storage medium can be an internal storage unit of any data-processing device as described in any of the foregoing embodiments, such as a hard disk or memory. The computer-readable storage medium can also be an external storage device, such as a plug-in hard disk, smart media card (SMC), SD card, flash card, etc., equipped on the device. Furthermore, the computer-readable storage medium can include both internal storage units of any data-processing device and external storage devices. The computer-readable storage medium is used to store the computer program and other programs and data required by the data-processing device, and can also be used to temporarily store data that has been output or will be output.
[0047] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein.
Claims
1. A privacy-preserving vector database query method, characterized in that it is applied to any server, wherein the server stores a transformed vector data secret share and a corresponding label secret share secretly shared by the database side, the method comprising: obtaining a query vector secret share, and performing a linear transformation on the query vector secret share to obtain a transformed query vector secret share; sending the transformed vector data secret share and the transformed query vector secret share to the semi-trusted hardware, so that the semi-trusted hardware performs a nearest neighbor search to obtain the nearest neighbor vector and the corresponding index, and secretly shares the nearest neighbor vector and the one-hot vector corresponding to the index with each server; obtaining the one-hot vector secret share and the nearest neighbor vector secret share sent by the semi-trusted hardware; computing the dot product of the one-hot vector secret share and the label secret share to obtain a query result label secret share; applying the inverse of the linear transformation to the nearest neighbor vector secret share to obtain a query result vector secret share; and sending the query result label secret share and the query result vector secret share to the user terminal, so that the user terminal can reconstruct the query result vector and the query result label.
2. The method according to claim 1, characterized in that the transformed vector data secret share and the corresponding label secret share are obtained as follows: obtaining the original transformed vector data secret share, produced by the database side generating a secret share of the vector data it holds and applying a linear transformation, and the original label secret share, produced by the database side generating a secret share of the corresponding label it holds, wherein the original transformed vector data secret share is $[\![x_i']\!] = a \cdot [\![x_i]\!] + b$, where $[\![x_i]\!]$ is the secret share generated from the vector data $x_i$ held by the database side, $a$ is a non-zero scalar, and $b$ is the offset; and reordering the original transformed vector data secret share and the original label secret share using the same permutation to obtain the transformed vector data secret share and the corresponding label secret share.
3. The method according to claim 1, characterized in that the transformed query vector secret share is $[\![q']\!] = a \cdot [\![q]\!] + b$, where $[\![q]\!]$ is the secret share of the query vector $q$ held by the server, $a$ is a non-zero scalar, and $b$ is the offset.
4. The method according to claim 1, characterized in that, in the one-hot vector, the position corresponding to the index of the nearest neighbor vector is set to 1 and all other positions are 0, i.e. $e = (e_1, \dots, e_n)$ with $e_i = 1$ if $i = i^*$ and $e_i = 0$ otherwise, where $i^*$ is the index of the nearest neighbor vector.
5. The method according to claim 1, characterized in that the query result label secret share is $[\![y^*]\!] = \sum_{i=1}^{n} [\![\tilde{y}_i]\!] \cdot [\![e_i]\!]$, where $n$ is the number of labels, $[\![\tilde{y}_i]\!]$ is the secret share of the reordered labels, and $[\![e_i]\!]$ is the secret share of the one-hot vector.
6. The method according to claim 1, characterized in that the query result vector secret share is $[\![x^*]\!] = a^{-1} \cdot \left([\![x'^*]\!] - b\right)$, where $[\![x'^*]\!]$ is the secret share of the nearest neighbor vector $x'^*$, $a$ is a non-zero scalar, and $b$ is the offset.
7. A privacy-preserving vector database query system, characterized in that it comprises: a database side, used to generate secret shares for the vector data it holds and the corresponding labels, perform a linear transformation on the secret shares of the vector data, and share them with the server side; a user terminal, used to secretly share the user's query vector with the server side; a server side comprising three servers, one of which has semi-trusted hardware deployed on it, wherein each server is used to: obtain the transformed vector data secret share and the corresponding label secret share, and reorder them using the same permutation; apply to the query vector secret share the same linear transformation that was applied to the vector data secret share, to obtain the transformed query vector secret share; send the transformed vector data secret share and the transformed query vector secret share to the semi-trusted hardware, and obtain the one-hot vector secret share and the nearest neighbor vector secret share sent by the semi-trusted hardware; compute the dot product of the one-hot vector secret share and the label secret share to obtain the query result label secret share; apply the inverse of the linear transformation to the nearest neighbor vector secret share to obtain the query result vector secret share; and send the query result label secret share and the query result vector secret share to the user terminal so that the user terminal can reconstruct the query result vector and the query result label; and semi-trusted hardware, used to perform the nearest neighbor search, obtain the nearest neighbor vector and its corresponding index, and secretly share the nearest neighbor vector and the one-hot vector corresponding to the index with each server.
8. A computer program product comprising a computer program/instructions, characterized in that, when the computer program/instructions are executed by a processor, the method according to any one of claims 1-6 is implemented.
9. An electronic device, characterized in that it comprises: one or more processors; and a memory used to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
10. A computer-readable storage medium storing computer instructions thereon, characterized in that, when the instructions are executed by a processor, the steps of the method according to any one of claims 1-6 are implemented.