Secure multi-party computation of high-frequency hits in differential privacy

By using the differentially private multi-party computation protocols HH1 and HH2, a counting table is generated and noise is added. This addresses the privacy and efficiency problems of determining the top k values in multi-party computation and enables efficient, accurate calculation of the top k values without sharing data.

CN115525909B · Active · Publication Date: 2026-06-30 · SAP SE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SAP SE
Filing Date
2021-11-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In multi-party computation, determining the top k values without sharing sensitive data, while ensuring data privacy and computational efficiency, is challenging. This is especially true when data analysis must comply with data protection regulations and is conducted between mutually distrustful parties, because existing techniques are inefficient and provide insufficient privacy protection.

Method used

By employing the differentially private multi-party computation (MPC) protocols HH1 and HH2, which generate a counting table, add noise, and apply thresholding, the system ensures that the top k output values are computed and released under differential privacy protection, using trusted servers or cloud computing nodes for computation and output.

Benefits of technology

It enables efficient and accurate determination of the top k values without compromising the privacy of the participating parties' data. It is suitable for both small and large datasets, provides strong privacy guarantees and high accuracy, and is applicable to multi-party computing environments.

✦ Generated by Eureka AI based on patent content.

Figure CN115525909B_ABST

Abstract

According to one aspect, a method for secure multi-party computation of differentially private high-frequency hits may include: receiving candidate values; incrementing the corresponding count in response to a received candidate value matching an entry in a table; adding an entry to the table in response to a received candidate value not matching an entry in the table and the table not exceeding a threshold size; decrementing the counts in the table and deleting entries with a count of zero in response to a received candidate value not matching an entry in the table and the table exceeding the threshold size; adding noise to the corresponding counts of the entries in the table and deleting any entry whose noisy count is less than a threshold; and outputting at least a portion of the table as a set of the top k values.

Description

Technical Field

[0001] The inventive concept of this application provides methods, systems, and articles of manufacture for multi-party computation, including computer program products. In particular, the inventive concept of this application discloses a solution to a distributed private learning problem for the top k values.

Background

[0002] Services used to perform analysis on sensitive data (e.g., statistical or aggregate queries) may involve sharing data with third parties. In some cases, sharing plaintext data with one or more parties may be undesirable or impractical. For example, the data may be sensitive data that is not permitted to be shared. In some cases, the parties sharing the data may be mutually distrustful. In other cases, using a trusted third party may also be impractical because the trusted third party itself may become suspect.

Summary of the Invention

[0003] Methods, systems, and articles of manufacture, including computer program products, are provided for multi-party computation.

[0004] According to one aspect, a system includes at least one data processor and at least one memory storing instructions that, when executed by the at least one data processor, cause operations including: generating a table for determining the top k values across a plurality of clients, the table mapping candidate values to entries with corresponding counts; receiving candidate values from each of the plurality of clients; in response to a received candidate value matching an entry in the table, incrementing the corresponding count of the matching candidate value; in response to a received candidate value not matching an entry in the table and the table not exceeding a threshold size, adding an entry to the table by adding the received candidate value with a count value of 1; in response to a received candidate value not matching an entry in the table and the table exceeding the threshold size, decrementing all counts in the table by 1 and deleting any entry with a count of zero from the table; adding noise to the corresponding counts in the entries of the table; in response to a noisy corresponding count being less than a threshold, deleting the corresponding entry from the table; and outputting at least a portion of the table as a result set of the top k values.

[0005] In some variations, one or more of the features disclosed herein, including the features described below, may optionally be included in any feasible combination. The table may be sorted based on the noisy corresponding counts before output. The system may include, or be contained within, a trusted server. The set of the top k values is determined based on multi-party computation using a data domain spanning multiple clients, wherein the set of the top k values is determined within the scope of the data domain. The system may use at least one compute node at a cloud provider, or at least one compute node at one or more of the clients, to perform the multi-party computation. Receiving candidate values from each of the multiple clients may include receiving the candidate values within a secure message that further includes a portion of the noise value. Adding noise to the corresponding counts in the table entries may further include adding noise based on the portion of the noise value from each of the multiple clients. Outputting at least a portion of the table as the set of the top k values may further include outputting at least a portion of the table in a secure message. The set of the top k values may be output according to differential privacy.

[0006] It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory, not restrictive. Additional features and/or variations may be provided beyond those set forth herein. For example, the embodiments described herein may relate to various combinations and sub-combinations of the disclosed features and/or combinations and sub-combinations of several other features disclosed in the detailed description below.

Brief Description of the Drawings

[0007] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate certain aspects of the subject matter disclosed herein, and together with the textual description, help to explain certain principles associated with the disclosed embodiments. In the drawings:

[0008] Figure 1A provides a conceptual depiction of a mechanism for adding noise to enhance differential privacy, according to some example embodiments;

[0009] Figure 1B depicts examples of models in which differential privacy algorithms can be implemented, according to some example embodiments;

[0010] Figures 2A-2F depict an example process of a first algorithm, HH1, for determining the top k values (or high-frequency hits), according to some example embodiments;

[0011] Figure 3 depicts an example process of a second algorithm, HH2, for determining the top k values (or high-frequency hits), according to some example embodiments; and

[0012] Figure 4 is a block diagram illustrating a computing system consistent with implementations of the present subject matter.

[0013] In the accompanying drawings, similar reference numerals are used to denote the same or similar items.

Detailed Description

[0014] Data collection is a primary function of many entities worldwide. For example, some entities offer free services (such as internet search or social networks) and then monetize the data collected from the end users of these services. However, under data protection regulations in some jurisdictions (e.g., the EU General Data Protection Regulation (GDPR)), unrestricted and comprehensive collection of data that uniquely identifies end users can raise ethical and/or legal issues. Specialized privacy-preserving data collection can mitigate some of these concerns. To this end, differential privacy (DP) can be used to provide strong privacy guarantees. Furthermore, secure multi-party computation (MPC) can be combined with differential privacy; the additional use of secure MPC can improve accuracy without compromising privacy. Secure MPC is a cryptographic tool that allows multiple parties to evaluate a function on data distributed among them while disclosing only the result of the function (in other words, without sharing the input data among themselves). However, secure computation of differential privacy mechanisms is generally considered inefficient, with high communication and computational overhead.

[0015] This document discloses a solution to a distributed private learning problem for the top k values, such as the k most common values (also known as the top k "heavy hitters"). The term "k" refers to how many values are in the result set (e.g., only the most common value when k = 1, the two most common values when k = 2, and so on). For example, multiple distributed users can determine (e.g., compute) the top k values with high accuracy and strong privacy guarantees without resorting to a trusted third party to store and share the users' private data used for the computation. To this end, in some embodiments, secure multi-party computation (MPC) of differentially private (DP) top k values is provided.
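As a point of reference, the following minimal Python sketch computes the exact, non-private top-k answer that a trusted party could compute if every value were pooled in the clear. The protocols described here aim to approximate this result under differential privacy without pooling the raw data. The function name `exact_top_k` is hypothetical, and the sample values are borrowed from the example of Figures 2A-2F.

```python
from collections import Counter

def exact_top_k(values, k):
    """Non-private baseline: the k most frequent values with their true counts.
    Ties keep the order in which values were first seen (Counter semantics)."""
    return Counter(values).most_common(k)

# Values held by five parties P1..P5 in the example of Figures 2A-2F.
print(exact_top_k([4, 4, 5, 3, 5], k=2))  # [(4, 2), (5, 2)]
```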

[0016] In some embodiments, a first protocol HH1 and a second protocol HH2 are provided. These protocols securely compute the top k values in a differentially private manner without disclosing the participants' private information during the computation, while providing differential privacy protection for the computation output. Furthermore, even for small datasets (e.g., a few users), which are a challenging regime for differential privacy, protocols HH1 and HH2 can be considered highly accurate and/or to have practically feasible runtimes (e.g., through efficient, optimized implementations of the computation).

[0017] In the following, F_HH1 denotes the so-called "ideal" functionality operated by a trusted third party, and HH1 denotes an example MPC implementation of F_HH1 that replaces the trusted third party with a cryptographic protocol. F_HH1 can combine high-frequency hit detection (e.g., of the top k values) with differentially private (DP) bounded count release (e.g., releasing values whose noisy count exceeds a threshold). Furthermore, an efficient secure implementation of this functionality, the HH1 algorithm, is also provided (see, e.g., Tables 5 and 7 below). For the second protocol, the functionality F_HH2 combines distributed high-frequency hit detection with centralized differentially private high-frequency hit detection, and an efficient secure implementation of it, the HH2 algorithm, is likewise provided (see, e.g., Tables 8 and 9 below).

[0018] Differentially private top-k discovery can be applied in a wide variety of environments. For example, user behavior data mining can be performed across multiple parties (e.g., multiple users) in a differentially private manner. User behavior mining can include, for example, identifying frequently typed words on client devices (which can be used to improve autocomplete suggestions), detecting user choices or settings for a given application, and so on. To further illustrate, differentially private telemetry data collection can be deployed to client devices to answer queries such as which k items are the most common among the users of those devices. End users can perform secure, privacy-preserving analysis on their combined data without disclosing any of their own data to anyone else (owing to the efficient secure computation disclosed herein). Furthermore, results for top-k queries can be obtained without sharing data with trusted third parties. For example, queries such as the top k most visited applications or the top k most searched products (or products viewed, purchased, or returned) can be answered using the secure MPC DP protocols HH1 and HH2 disclosed herein without infringing on the privacy of any individual user's data. The privacy-preserving top-k protocols disclosed herein can also be used to collect information not only from the end users of a single entity (e.g., a company), but also across different entities (e.g., different companies that would not normally share private data) and their corresponding end users, while providing strong privacy and security guarantees between entities (and/or end users). For example, information from different companies can be combined, without sharing any of those companies' private information, to provide comprehensive insights into an industry sector.

[0019] Before additional details are provided on the HH1 and HH2 protocols (and their corresponding functionalities) and their implementations, differential privacy and secure multi-party computation are described below.

[0020] In some of the examples disclosed herein, a participant (or user) can refer to a client machine (or device), such as a computer, an Internet of Things (IoT) device, and/or another processor-based machine. A set of n participants can be represented, for example, as follows:

[0021] $P = \{P_1, \ldots, P_n\}$

[0022] where each party $P_i$ holds at least a single data value $d_i$, with $i$ ranging from 1 to $n$. The combined dataset is modeled as $D = \{d_1, \ldots, d_n\}$, where $d_1, d_2, \ldots, d_n$ are the data values (or, more simply, the "data") from the data domain $U$ (i.e., the data universe $U$ represents the set of possible data values, such as the set of all integers).

[0023] Secure multi-party computation (also known as multi-party computation, MPC) can enable a set of parties to jointly compute a function, such as the median, the mode, the top k values, or another type of function or operation, without requiring any party to share its dataset with the others. For example, in MPC, each party can participate by providing secure input messages or exchanging them with the other parties (e.g., via secret sharing), allowing the parties to jointly compute the function over these messages. The computation yields an encrypted final output, which each party can decrypt (e.g., by combining secret shares) to reconstruct or disclose the result without any party disclosing its private data. Secret sharing refers to distributing a secret among a group of parties, each of which receives a share of the secret, such that the secret can be reconstructed only when a sufficient number of shares are combined. For example, Shamir Secret Sharing (SSS) can be used to secure a secret in a distributed manner (e.g., by dividing the secret into multiple parts, or shares, that are used to reconstruct the original secret).
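To make the Shamir Secret Sharing idea concrete, here is a minimal textbook sketch in Python of t-of-n sharing over a prime field. It is illustrative only and not taken from the patent: the field prime 2^127 - 1 and the names `make_shares` and `reconstruct` are assumptions chosen for the example. The secret is the constant term of a random polynomial of degree t - 1; any t shares recover it by Lagrange interpolation at x = 0, while fewer than t shares reveal nothing about it.

```python
import random

PRIME = 2**127 - 1  # assumed field modulus (a Mersenne prime); arithmetic is in GF(PRIME)
_rng = random.SystemRandom()

def make_shares(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret % PRIME] + [_rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluates the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term, i.e. the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(4, threshold=3, num_shares=5)  # e.g., party P1's value 4
print(reconstruct(shares[:3]), reconstruct(shares[2:]))  # 4 4
```

Any three of the five shares suffice here, which is what allows trust to be spread across several share holders.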

[0024] For the HH1 and HH2 protocols (and their corresponding functionalities), the function being computed is, for example, the top k values across the parties' datasets. Let each party $P_i$ hold sensitive input data $d_i$. The set of all parties can then jointly compute the top-k function $y_1, \ldots, y_k = f(d_1, \ldots, d_n)$ while keeping the inputs $d_1, \ldots, d_n$ private. The output of this secure multi-party computation must be both correct and secret: the correct top-k output values $y_1, \ldots, y_k$ must be computed while the confidentiality of the input data $d_1, \ldots, d_n$ is preserved among all parties, so that only the output is disclosed to the parties.

[0025] Secure multi-party computation can be implemented under different trust assumption models. In the semi-honest (or passive) model, the parties (also called adversaries) do not deviate from the protocol but collect everything they observe during its execution. In the malicious (or active) model, by contrast, the parties may deviate from the protocol (e.g., by altering messages).

[0026] As noted, differential privacy (DP) provides strong privacy guarantees by restricting what can be provided as output. For example, the effect on the output of changing a single data value in the input dataset can be limited, thus maintaining privacy. If an algorithm is differentially private, an observer seeing its output cannot discern which input data values were used to compute that output. Some form of randomization is a key aspect of differential privacy, because it hides the input data of the participants. Formally, differential privacy can be defined as shown in Table 1 below, although a less formal definition that still provides the required input-data privacy can also be used. Although the definition in Table 1 holds for computationally unbounded adversaries, it also holds, in cryptographic terms, for adversaries with bounded computational power. In Table 1, (ε, 0)-DP refers to pure DP, whereas approximate DP allows an additional additive privacy loss δ > 0. Typically, δ is negligible in the size of the data. Although pure DP mechanisms are introduced, the protocols apply them in combination with δ-based thresholds and thus satisfy approximate DP.

[0027] Table 1

[0028]
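For reference, the textbook definition of (ε, δ)-differential privacy is given below. It is the standard formulation and may differ in notation from Table 1, whose content is not reproduced here. A randomized mechanism M satisfies the definition if the following holds, where D and D' are neighboring datasets that differ in a single data value. Setting δ = 0 gives pure ε-DP, i.e. the (ε, 0)-DP mentioned above.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all } S \subseteq \operatorname{Range}(\mathcal{M})
```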

[0029] Randomness can be provided by adding noise (one way of achieving differential privacy), which allows individual data to be hidden or obfuscated. For example, noise can be added to the output of a function to provide a degree of differential privacy. One way to add noise is the Laplace mechanism, in which the added noise is drawn from a Laplace distribution. Formally, the Laplace mechanism can be defined as shown in Table 2 below.

[0030] Table 2

[0031]
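The following Python sketch shows the textbook Laplace mechanism. Table 2 is not reproduced here, so the noise scale sensitivity / ε (with sensitivity meaning the L1 sensitivity of the released value) is the standard choice, assumed rather than quoted from the patent. The function names are hypothetical. Python's `random` module has no Laplace sampler, so the sketch uses the fact that the difference of two i.i.d. exponential variables with mean b is Laplace(0, b).

```python
import random

def laplace_noise(scale, rng=random):
    """Laplace(0, scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value plus Laplace(sensitivity / epsilon) noise, which is
    epsilon-DP when `sensitivity` bounds the L1 change caused by one data value."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

# Example: release a count of 2 with sensitivity 1 and epsilon = 0.5.
print(laplace_mechanism(2, sensitivity=1, epsilon=0.5))
```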

[0032] An alternative to additive noise is probabilistic output selection via the exponential mechanism (EM), which computes selection probabilities from exponentially weighted utility scores. The exponential mechanism extends differential privacy to functions with non-numerical outputs, or to cases where the output is not robust to additive noise, such as the median function. The exponential mechanism is exponentially more likely to select a "good" output, where "good" is quantified via a utility function u(D, r) that takes as input a database D ∈ U^n and a potential output r ∈ R from a fixed set R of arbitrary outputs. Informally, the exponential mechanism outputs an element r with probability proportional to exp(ε·u(D, r) / (2Δu)), where Δu is the sensitivity of the utility function. Higher utility therefore means that an output is more desirable and has a correspondingly higher selection probability.

[0033] Figure 1A shows examples of selection probabilities computed using the exponential mechanism, according to some example embodiments. Formally, the exponential mechanism can be defined as shown in Table 3 below, although a less formal definition can also be used. In the examples described herein, the exponential mechanism (EM) is denoted as … (although in some examples described herein, other than those shown in Table 3 below, the symbols u and/or ε may be omitted). In Table 3, R refers to the set of potential output values.

[0034] Table 3

[0035]
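As an illustration of the exponential mechanism described above, the Python sketch below selects a candidate with probability proportional to exp(ε·u / (2Δu)). It is a generic textbook version, not the patent's Table 3. The name `exponential_mechanism`, the utility "frequency of the candidate", and the candidate set are assumptions for the example.

```python
import math
import random

def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """Select r with probability proportional to exp(epsilon * u(r) / (2 * sensitivity))."""
    scores = [epsilon * utility(r) / (2.0 * sensitivity) for r in candidates]
    top = max(scores)  # subtracting the maximum avoids overflow in math.exp
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Example: a DP "mode" of the parties' values, with utility = frequency of each candidate.
data = [4, 4, 5, 3, 5]
print(exponential_mechanism([3, 4, 5], lambda r: data.count(r), sensitivity=1, epsilon=1.0))
```

Subtracting the maximum score leaves the selection probabilities unchanged but keeps the weights in a safe floating-point range.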

[0036] Taking the argmax of utility scores with additive noise drawn from a Gumbel distribution is equivalent to the exponential mechanism. The Gumbel mechanism, whose output distribution is identical to that of the exponential mechanism, adds Gumbel-distributed noise to the utility scores and selects the output with the highest noisy score (the argmax of the noisy utility scores). Formally, the Gumbel mechanism M_G can be defined as shown in Table 4 below.

[0037] Table 4

[0038]
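The equivalence stated above (the "Gumbel-max trick") can be sketched in Python as follows. This is a generic illustration, not the patent's Table 4. The noise scale 2Δu/ε is assumed so that the selection probabilities match an exponential mechanism with weights exp(ε·u / (2Δu)), and the function names are hypothetical.

```python
import math
import random

def gumbel_noise(scale, rng=random):
    """Gumbel(0, scale) via the inverse CDF: -scale * log(-log(U)), U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # rng.random() can return 0.0, and log(0) is undefined
        u = rng.random()
    return -scale * math.log(-math.log(u))

def gumbel_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """Return the argmax of utility + Gumbel(2 * sensitivity / epsilon) noise; its output
    distribution matches the exponential mechanism with exp(epsilon * u / (2 * sensitivity))."""
    scale = 2.0 * sensitivity / epsilon
    noisy = {r: utility(r) + gumbel_noise(scale, rng) for r in candidates}
    return max(noisy, key=noisy.get)
```

With utilities 0 and 1, sensitivity 1 and ε = 2, the candidate with utility 1 should be chosen with probability e / (1 + e), about 0.73, exactly as the exponential mechanism would choose it. Because the mechanism reduces to "add independent noise, then take an argmax", it fits MPC settings where noise can be contributed by several parties.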

[0039] Noise such as Laplace noise, exponential noise, and / or Gumbel noise can be used in differential privacy mechanisms. These types of noise can be generated by multiple parties. For example, each party can provide a portion of the noise, which is then combined with noise from the other parties to form the desired noise for differential privacy. Laplace(b) can be expressed as the sum of n partial noise values:

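[0040] $\mathrm{Laplace}(b) = \sum_{j=1}^{n} \left( X_j - X'_j \right)$

[0041] where $X_j$ and $X'_j$ are independent samples from the gamma distribution Gamma(1/n, b), and the gamma distribution with shape k = 1/n and scale b has the following density:

[0042] $f(x; k, b) = \frac{x^{k-1} e^{-x/b}}{\Gamma(k)\, b^{k}}$ for x > 0.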
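The decomposition of Laplace noise into n partial noise values can be sketched in Python as below. Each party draws the difference of two Gamma(1/n, b) samples, and the sum of all n partial values is a single Laplace(0, b) sample, so no individual party knows the total noise. In the protocols the sum would be formed inside the secure computation. Here it is a plain sum, which is a simplification, and the function names are hypothetical. `random.gammavariate(alpha, beta)` takes the shape and the scale, in that order.

```python
import random

def partial_laplace_noise(num_parties, scale, rng=random):
    """One party's partial noise: the difference of two Gamma(shape=1/n, scale=b) samples."""
    shape = 1.0 / num_parties
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)

def combined_laplace_noise(num_parties, scale, rng=random):
    """The sum of all n partial noise values is distributed as Laplace(0, scale)."""
    return sum(partial_laplace_noise(num_parties, scale, rng) for _ in range(num_parties))

# Example: five parties jointly produce one Laplace(0, 1.5) sample.
print(combined_laplace_noise(5, 1.5))
```

The decomposition works because gamma variables with a common scale add their shapes: n samples of Gamma(1/n, b) sum to Gamma(1, b), which is Exp(b), and the difference of two independent Exp(b) variables is Laplace(0, b).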

[0043] Gumbel(b) can be expressed as follows:

[0044] $\mathrm{Gumbel}(b) = \lim_{n \to \infty} b \left( \sum_{j=1}^{n} \frac{Y_j}{j} - \ln n \right)$

[0045] where each $Y_j$ is sampled from the exponential distribution Expon(1), and the exponential distribution with scale b has the following density:

[0046] $f(x; b) = \frac{1}{b} e^{-x/b}$

[0047] for x > 0, and 0 otherwise.

[0048] Figure 1B shows examples of models M for implementing differential privacy, such as a centralized model 101A, a localized model 101B, and a shuffling model 101C. In the centralized model 101A, each of the parties 110A-110N (shown as client devices "C1" ... "CN") sends its unprotected data to a trusted central server 112, which then runs a differential privacy algorithm on the clean data. The centralized model offers the highest accuracy because the randomization inherent in the differential privacy algorithm is applied only once, at the trusted central server 112. In the example of Figure 1B, the differential privacy algorithm combines a function (e.g., a function or operator used to compute the top k values, or to be computed via MPC) with a randomization process. In this example, the randomization is provided by a differential privacy mechanism that computes a utility score for the function (which is being computed via MPC) and probabilistically selects an output based on the computed utility scores, with a higher score translating to a higher selection probability.

[0049] In the localized model 101B, each of the parties 120A-120N (represented as client devices "C1" ... "CN") applies the differential privacy algorithm locally and then sends its anonymized value 121A-121B to the untrusted server 122 for aggregation. In the localized model 101B, the accuracy of output 125 is limited because randomization is applied multiple times, once per party. The localized model 101B may therefore require a relatively larger number of users to achieve accuracy comparable to that of the centralized model.

[0050] In the intermediate shuffling model 101C, a shuffler 130, which is a trusted party, is added between the parties 120A-N and the server 122. The shuffler does not collude with any of the parties 120A-N. It permutes the randomized client values 132A-B and forwards them. This permutation breaks the mapping between each client and its value, which reduces the amount of randomization required. The accuracy of the shuffling model 101C can fall between that of the localized model 101B and the centralized model 101A; in general, however, the shuffling model 101C is significantly less accurate than the centralized model 101A. The centralized MPC model 101A generally incurs high computational and communication overhead, which reduces efficiency and scalability to larger numbers of clients/participants. Nevertheless, the centralized MPC model 101A can offer several advantages over the other models, such as higher accuracy and stronger privacy (e.g., values are not exposed to third parties).
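The accuracy gap between the centralized and localized models can be illustrated with a small Python simulation of a DP counting query. This is a generic sketch, not part of the patent, and all names are hypothetical. In the centralized setting one Laplace(1/ε) sample is added to the true count. In the localized setting every client perturbs its own 0/1 bit with Laplace(1/ε) before sending it, so n independent noise terms accumulate in the sum. The shuffling model is not simulated.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def central_count(bits, epsilon, rng):
    """Centralized model: a trusted server sums the raw bits, then adds noise once."""
    return sum(bits) + laplace(1.0 / epsilon, rng)

def local_count(bits, epsilon, rng):
    """Localized model: every client perturbs its own bit before sending it."""
    return sum(b + laplace(1.0 / epsilon, rng) for b in bits)

def mean_abs_error(estimator, bits, epsilon, trials, rng):
    """Average absolute deviation of the noisy count from the true count."""
    true = sum(bits)
    return sum(abs(estimator(bits, epsilon, rng) - true) for _ in range(trials)) / trials

bits = [1] * 300 + [0] * 700  # 1,000 clients, 300 of whom hold the queried item
rng = random.Random(7)
print(mean_abs_error(central_count, bits, 1.0, 200, rng),
      mean_abs_error(local_count, bits, 1.0, 200, rng))
```

With 1,000 clients the local error grows roughly with the square root of the number of clients, while the central error stays near 1/ε. This is the gap that the MPC-based approach aims to close without a trusted server.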

[0051] The following describes a first protocol functionality F_HH1, including an example MPC implementation HH1 of the first protocol, and then a second protocol functionality F_HH2, including an example implementation HH2 of F_HH2. Although some of the examples involve a trusted third-party server, the examples described herein can also be implemented using secure MPC.

[0052] Figures 2A-2F show an example of a first protocol for determining the top k values (or high-frequency hits), according to some example embodiments. To simplify the explanation, the example is first described for F_HH1 using a trusted party.

[0053] In the example of Figures 2A-2F, the set of parties 202A-E includes parties P1, P2, P3, P4, and P5; this is the set of parties among which the top k items are determined. In this example, party P1 holds the value 4, party P2 holds the value 4, party P3 holds the value 5, party P4 holds the value 3, and party P5 holds the value 5. The determination process can be triggered by a request from any one of the parties P1-P5 or from another party (e.g., another party can also request the top k items among the parties).

[0054] The trusted server, represented by the computing server 210, creates a table 266A whose entries map data values within a data value range (labeled "value") to corresponding counts (e.g., see 1 in Table 5 below). In this example, table 266A includes a single value 4 mapped to a count of 1, because only the single value 4 has been received at 220A, from the first party P1 at client device 202A. In other words, d becomes an element of table T (d ∈ T) with its count set to 1 (T[d] = 1).

[0055] In Figure 2B, the computing server 210 receives second data at 220B, e.g., the data value 4 from the second party P2 at client device 202B. Since the value 4 is an element of table T 266B (d ∈ T), its count is incremented to 2 (T[d] = 2), as shown in table 266B (e.g., see 2a in Table 5 below).

[0056] In Figure 2C, the computing server 210 receives third data, the value 5, at 220C from the third party P3 at client device 202C. Since the value 5 is not an element of table T 266C and table T is not full, an entry for the value 5 is added to table 266C with a count of 1 (e.g., see 2b in Table 5 below: if |T| < t, add d to T and set T[d] = 1).

[0057] In Figure 2D, computing server 210 receives fourth data, the value 3, at 220D from the fourth party P4 at client device 202D. Since the value 3 is not an element of table T 266D and the table is full (in this example, the table size is 2), all counters in table 266D are decremented by 1. All values with a count of 0 are then removed, so that only the value 4 remains, with a count of 1, in table 266E (e.g., otherwise, decrement all counters T[i] and remove i from T if T[i] = 0; see 2c in Table 5 below).

[0058] In Figure 2E, computing server 210 receives fifth data, the value 5, at 220E from the fifth party P5 at client device 202E. Since the value 5 is not an element of table T and table T is not full, the value 5 is added with a count of 1, yielding table 266F (e.g., if d ∉ T and |T| < t, add d to T and set T[d] = 1; see 2b in Table 5 below).

[0059] In the example of Figures 2A-2E, computation server 210 maintains a fixed-size table T with at most t entries (t = 2 in this example) that maps the values received from the parties to counts. As each value is received from a party, the computation server processes it. If the value matches an entry in the table, the corresponding count is incremented. If the value does not match an entry and the table is not full, the received value is added to table T as an entry with a count of 1. If the value does not match an entry and the table is full, the computation server decrements all counters by 1 and then removes all values whose count equals 0. After processing the values from all parties 202A-202E, computation server 210 can output the top k values based on the contents of table T (e.g., table 266F). In this example, the top k values may correspond to the two entries of table 266F, namely the values 4 and 5.
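The table-update rule above can be sketched in plaintext Python, as a trusted server would run it before any noise is added. This update rule resembles the classic Misra-Gries frequent-items summary. The names `update_table` and `build_table` are hypothetical, and `max_size` plays the role of t. Running the sketch on the values of Figures 2A-2E reproduces table 266F.

```python
def update_table(table, value, max_size):
    """Apply one received value to the bounded count table T (at most max_size entries)."""
    if value in table:
        table[value] += 1                  # matching entry: increment its count
    elif len(table) < max_size:
        table[value] = 1                   # no match, table not full: add with count 1
    else:
        for key in list(table):            # no match, table full: decrement every count
            table[key] -= 1
            if table[key] == 0:
                del table[key]             # and drop entries whose count reaches zero
    return table

def build_table(values, max_size):
    """Process the parties' values in arrival order."""
    table = {}
    for v in values:
        update_table(table, v, max_size)
    return table

print(build_table([4, 4, 5, 3, 5], max_size=2))  # {4: 1, 5: 1}
```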

[0060] In some embodiments, noise can be added to the counts, and values whose noisy counts fall below a threshold can be removed. For example, the computation server can add noise to the counts in a table (e.g., table 266F) before outputting the top k values, and then remove noisy counts below a threshold. This thresholding helps ensure that only values contributed by multiple participants, rather than by a single participant, are released. A sufficient number of participants is needed so that a small change (adding or removing a single value) does not alter the outcome with high probability, whereas many changes might. In other words, individual contributions are protected and not necessarily released, while the aggregated contributions of many individuals (with additive noise) are released as output with high probability.

[0061] In the case of additive noise and a trusted third party, according to some example embodiments, F_HH1 further includes computation server 210 adding noise (such as Laplace-distributed noise) to the count values, as shown in table 266G of Figure 2F. Table 266G corresponds to table 266F after noise has been added to the count values (represented as N(1 + noise), where 1 is the count, noise represents the added noise, and N is a function of the added noise). If a noisy count does not exceed a threshold, the value and its count are removed from the table. At 270, the trusted server checks whether each noisy count exceeds the threshold and removes those that do not. In the example of Figure 2F, both noisy counts exceed the threshold, so both values and their noisy counts are retained at 270. After the noisy thresholding operation at 270, the trusted server (acting as computation server 210 in this example) releases table 272 as output to provide the top k values, which in this example are the top two.

[0062] Table 5 provides an example implementation of F_HH1 for obtaining the top k values using additive noise and a trusted third party, according to some example embodiments. In Table 5, at row 3(a), noise is added to the counts as noted above. At row 3(b), a value i is removed from table T unless its noisy count exceeds the threshold τ. At row 4, the remaining values in the table are sorted by their noisy counts and then released as output. Referring to Figure 2F, the top k values can be sorted by noisy count and then released (e.g., the most frequent high-frequency hits are ranked first in the sorted output).

[0063]
Value | Count
4 | 1 + noise
5 | 1 + noise

[0064] Table 5

[0065]

[0066]

[0067] In Table 5 (and Table 7 below), the symbol Δ represents the maximum number of counts that an individual participant can influence. For example, Δ = 1 when querying country of origin, and Δ > 1 when querying current and previous employers.
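Rows 3 and 4 of Table 5 (noise, threshold, sort, release) can be sketched in Python as below. The content of Table 5 is not reproduced here, so several details are assumptions: the noise is taken to be Laplace with scale Δ/ε (with `sensitivity` playing the role of Δ above), and `tau` is a caller-supplied threshold, whereas the patent derives τ from its own privacy parameters. The function name `noisy_top_k` is hypothetical.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_top_k(table, k, epsilon, sensitivity, tau, rng=random):
    """Hypothetical release step: add Laplace(sensitivity / epsilon) noise to each count,
    drop values whose noisy count does not exceed tau, sort by noisy count, keep k."""
    noisy = {value: count + laplace(sensitivity / epsilon, rng) for value, count in table.items()}
    kept = {value: c for value, c in noisy.items() if c > tau}
    ranked = sorted(kept, key=kept.get, reverse=True)  # most frequent first
    return [(value, kept[value]) for value in ranked[:k]]

print(noisy_top_k({4: 50, 5: 40, 3: 2}, k=2, epsilon=1.0, sensitivity=1, tau=10))
```

The threshold is what keeps a rare value, such as one held by a single participant, from being released even when fewer than k values survive.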

[0068] Table 6

[0069]
MPC protocol | Output / functionality
EQ(<a>, <b>) | <1> if a = b, otherwise <0>
LE(<a>, <b>) | <1> if a ≤ b, otherwise <0>
ADD(<a>, <b>) | <a + b>
AND(<a>, <b>) | <a · b>
NOT(<a>) | <1 - a>
CondSwap(<a>, <b>, <c>) | (<b>, <a>) if bit c = 1, otherwise (<a>, <b>)
Rec(<a>) | reconstructs the secret a
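To make the notation of Table 6 concrete, the sketch below simulates additive secret sharing over a prime field in a single Python process, where each list element stands for one compute node's share. ADD and NOT are local, AND is shown via a Beaver multiplication triple from a hypothetical trusted dealer, CondSwap is derived from one multiplication, and Rec sums the shares. The modulus, the dealer, and all function names are assumptions for illustration rather than the patent's construction; EQ and LE require dedicated comparison protocols and are not sketched here.

```python
import random

P = 2**61 - 1  # prime modulus for the secret-sharing field (an assumption)


def share(x, n, rng):
    """Split x into n additive shares modulo P (one share per compute node)."""
    parts = [rng.randrange(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]


def rec(a):
    """Rec(<a>): reconstruct the secret by summing all shares."""
    return sum(a) % P


def add(a, b):
    """ADD(<a>, <b>) = <a + b>: purely local, no communication."""
    return [(x + y) % P for x, y in zip(a, b)]


def sub(a, b):
    return [(x - y) % P for x, y in zip(a, b)]


def not_(a):
    """NOT(<a>) = <1 - a>: node 0 adds the public 1, every node negates its share."""
    return [((1 if i == 0 else 0) - x) % P for i, x in enumerate(a)]


def mul(a, b, rng):
    """AND(<a>, <b>) = <a * b> for bits, using a Beaver triple from a hypothetical dealer."""
    n = len(a)
    x, y = rng.randrange(P), rng.randrange(P)
    X, Y, Z = share(x, n, rng), share(y, n, rng), share(x * y % P, n, rng)
    d = rec(sub(a, X))  # opening a - x reveals nothing about a
    e = rec(sub(b, Y))  # opening b - y reveals nothing about b
    out = [(z + d * yi + e * xi) % P for xi, yi, z in zip(X, Y, Z)]
    out[0] = (out[0] + d * e) % P
    return out


def cond_swap(a, b, c, rng):
    """CondSwap(<a>, <b>, <c>): (<b>, <a>) if c = 1, else (<a>, <b>)."""
    t = mul(c, sub(b, a), rng)  # <c * (b - a)>
    return add(a, t), sub(b, t)
```

Only `rec` reveals a value; every other function returns shares, matching the convention that sub-protocol outputs stay secret-shared.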

[0070] For implementations that use MPC instead of a trusted third-party server, the F_HH1 algorithm described above can be implemented as the HH1 MPC algorithm described in Table 7 below. In other words, the HH1 MPC algorithm works like the trusted-server version described above, except that it uses MPC instead of a trusted server: each party provides encrypted input, including its value and a partial noise value (e.g., using secret sharing or other cryptographic techniques), so that the top k values are computed jointly.

[0071] For example, parties 202A-E can jointly compute the top k values using MPC (also known as secure MPC) by securely exchanging messages (e.g., secret shares), where the secure messages represent the inputs to the joint MPC computation of the top k values. For example, a secure message from party 202A might include a value (e.g., "4" in the example of 220A) and some noise. The parties compute on the secure input messages to obtain a secure final output, such as the top k values. This final output is encrypted (secret-shared) and can be decrypted by each party to reconstruct, or disclose, the result. Although the parties can jointly compute this top-k function themselves, they can also outsource the MPC processing to compute nodes (e.g., multiple cloud service providers). Outsourcing distributes trust: there is no single fully trusted party but multiple semi-honest parties, and the secret can be reconstructed only if a majority of them collude or are compromised (e.g., by an attack). To this end, the parties secret-share their inputs with the computing parties, and the computing parties perform the HH1 computation.

[0072] For the MPC computation of the top k values, parties 202A-E can provide each other with input messages that include values and partial noise, where the input messages are encrypted via secret sharing. For example, party 202A can provide the other parties 202B-E with an encrypted message containing "4" and a partial noise value. The partial noise is added to the count values, as described above and as shown in Figure 2F. Alternatively or additionally, the MPC computation may be outsourced to multiple compute nodes (e.g., the cloud services mentioned above), in which case the parties send their input messages to the cloud nodes for MPC computation. To improve the efficiency of the MPC computation, the operations performed may consist primarily of additions; Table 6 lists examples of the operations used.
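The text does not say how each party's partial noise is drawn. One known option, assumed here, uses the fact that Laplace noise is infinitely divisible: if each of n parties contributes the difference of two Gamma(1/n, b) draws, the sum of all n contributions is distributed as Laplace(0, b). The function names are hypothetical, and in the protocol each partial value would travel inside a secret-shared message rather than being summed in the clear as in this single-process check.

```python
import random
import statistics


def partial_laplace_noise(n_parties, scale, rng):
    """One party's partial noise: a difference of two Gamma(1/n, scale) draws."""
    shape = 1.0 / n_parties
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)


def combined_noise(n_parties, scale, rng):
    """Sum of all parties' partial noise values; distributed as Laplace(0, scale)."""
    return sum(partial_laplace_noise(n_parties, scale, rng) for _ in range(n_parties))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [combined_noise(5, 2.0, rng) for _ in range(10000)]
    # E|Laplace(0, b)| = b, so this should print a value close to 2.0
    print(round(statistics.fmean(abs(d) for d in draws), 2))
```

No single party's contribution is Laplace-distributed on its own, which is why privacy here relies on the aggregate of all parties' contributions.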

[0073] The HH1 MPC algorithm in Table 7 uses the MPC operations (sub-protocols) listed in Table 6. The outputs of these sub-protocols are encrypted (e.g., secret-shared), except for Rec(·), which reconstructs ("decrypts") a secret-shared value. Protected values are enclosed in angle brackets, such as <·>, which can be viewed as a form of encryption (e.g., via secret sharing). Uppercase letters in Table 7 denote arrays; for example, A[j] denotes the j-th element of array A. Array V holds values, array C holds counts, and array N holds the added noise. b_state denotes a Boolean value (a single bit); for example, b_match = 1 indicates a match.
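Table 7 itself is not reproduced here, so the following is only one plausible way, using the V/C array and b_match notation above, to express the table update without branching on protected values: every step runs the same fixed sequence of operations over all t slots, and decisions are encoded as bits multiplied into the updates. The `eq` and `le` helpers are plaintext stand-ins for the EQ and LE sub-protocols, products of bits stand for AND, and all names, including treating count 0 as an empty slot, are assumptions.

```python
def eq(a, b):
    return int(a == b)  # plaintext stand-in for EQ(<a>, <b>)


def le(a, b):
    return int(a <= b)  # plaintext stand-in for LE(<a>, <b>)


def oblivious_update(V, C, v):
    """Update value array V and count array C for input v with a fixed, data-independent
    sequence of operations (slots with count 0 are treated as empty)."""
    t = len(V)
    occupied = [1 - le(C[j], 0) for j in range(t)]        # NOT(LE(C[j], 0))
    match = [eq(V[j], v) * occupied[j] for j in range(t)]  # AND of two bits
    b_match = sum(match)  # at most one slot matches, since stored values are distinct
    # Mark the first empty slot with a running "still looking" bit.
    first_empty, looking = [], 1
    for j in range(t):
        first_empty.append(looking * (1 - occupied[j]))
        looking *= occupied[j]
    b_has_empty = 1 - looking
    b_dec = (1 - b_match) * (1 - b_has_empty)  # no match and no free slot
    for j in range(t):
        insert = (1 - b_match) * first_empty[j]
        V[j] += insert * (v - V[j])             # write v into the chosen empty slot
        C[j] += match[j] + insert - b_dec      # increment, insert with 1, or decrement
    return V, C
```

Feeding 4, 5, 4, 5, 4 into a two-slot table yields V = [4, 5] and C = [3, 2], the same counts the branching version produces; only the control flow differs.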

[0074] Table 7

[0075]

[0076]

[0077] Figure 3 illustrates an example of the functionality of a second protocol according to some example embodiments. Unlike the first protocol F_HH1, each party 302A-D encodes (e.g., in binary) its value in the dataset being evaluated for the top k values. For example, the first party might encode the value A as 01000001, and each encoded value has a prefix, such as 01 in this example. The parties can be divided into g groups, and the first group is asked whether their values begin with a prefix of a certain length, such as a predetermined length (e.g., γ + η, where …; η is the number of bits by which the prefix is extended in each round; see also Table 8 below). The most frequent prefixes from the first group are then used to query the next group, such as the second group in this example. For the next group's query, the most frequent prefixes from the first group are extended by η bits (e.g., to length γ + 2η), and this process repeats until the prefix length equals the bit length b of the domain (e.g., for ASCII encoding, b = 8 bits per symbol).
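The encoding and prefix-extension steps above can be sketched in a few lines of Python. The helper names are hypothetical; `encode` reproduces the ASCII example for A, and `extend` shows how each round's candidate list grows by all 2^η suffixes of every retained prefix, which yields the candidate lists used in Figure 3.

```python
def encode(symbol: str, bits: int = 8) -> str:
    """Binary-encode one symbol, e.g. ASCII "A" -> "01000001"."""
    return format(ord(symbol), f"0{bits}b")


def extend(prefixes, eta):
    """Extend every prefix by all 2**eta possible eta-bit suffixes."""
    suffixes = [format(i, f"0{eta}b") for i in range(2 ** eta)]
    return [p + s for p in prefixes for s in suffixes]
```

For example, `extend([""], 2)` gives the first-round candidates 00, 01, 10, 11 (γ = η = 1), and `extend(["00", "10"], 1)` gives 000, 001, 100, 101. Each round therefore queries at most (number of retained prefixes) × 2^η candidates instead of the whole domain.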

[0078] In the example of Figure 3, the set of parties includes participants P1 302A, P2 302B, P3 302C, and P4 302D. The first party P1 holds the binary-encoded value 001, the second party P2 holds 100, the third party P3 holds 001, and the fourth party P4 holds 001. Parties P1-P4 are divided into two groups: P1 and P2 are in group 1, and P3 and P4 are in group 2.

[0079] At step 310, the computation server queries the first group for counts of the initial set of prefixes of length 2 (e.g., γ = 1, η = 1), namely 00, 01, 10, and 11. In response, the first party 302A replies at step 312A with the count vector (1,0,0,0), representing the prefix 00 of its encoded data value 001. The second party 302B replies at step 312B with the count vector (0,0,1,0), representing the prefix 10 of its encoded data value 100.

[0080] At step 314, the computation server 210 sums the reported counts element-wise. For example, the count vectors (1,0,0,0) and (0,0,1,0) sum to (1,0,1,0), which indicates that the most frequent prefixes are "00" and "10". In some implementations, the computation server 210 may also add noise to these counts and apply thresholding, as described for 325 and 330 below; a value whose noisy count does not exceed the threshold is removed from the top k results (e.g., see step 2(d) in Table 8).
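A trusted-server view of steps 312A-314 (and likewise 318A-320) can be sketched as follows: each party returns a one-hot count vector over the queried candidates, the server sums the vectors element-wise, and the most frequent prefixes are carried into the next round. The function names are hypothetical, and the rule "keep up to k candidates with the highest non-zero counts" is an assumed reading of "most frequently occurring prefixes".

```python
def respond(value_bits, candidates):
    """A party's count vector: 1 for the candidate prefix its value starts with, else 0."""
    return [int(value_bits.startswith(c)) for c in candidates]


def aggregate(responses):
    """Element-wise sum of the parties' count vectors."""
    return [sum(column) for column in zip(*responses)]


def most_frequent(candidates, counts, k):
    """Up to k candidates with the highest non-zero counts (ties keep candidate order)."""
    ranked = sorted(zip(candidates, counts), key=lambda pair: pair[1], reverse=True)
    return [c for c, n in ranked[:k] if n > 0]
```

With the Figure 3 data, `respond("001", ["00", "01", "10", "11"])` gives (1,0,0,0), `respond("100", ...)` gives (0,0,1,0), their sum is (1,0,1,0), and `most_frequent(..., k=2)` keeps 00 and 10.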

[0081] Next, the computation server extends each of the most frequent 2-bit prefixes by a given number of bits, such as η = 1 bit.

[0082] At step 316, the computation server 210 queries the second group of parties 302C-D for counts of the prefix candidates 000, 001, 100, and 101. These candidates are the extensions of 00 (the 3-bit prefixes 000 and 001) and of 10 (the 3-bit prefixes 100 and 101). In response, the third party 302C replies at 318A with the count vector (0,1,0,0), representing the prefix 001 of its encoded data value 001. Likewise, the fourth party 302D replies at 318B with the count vector (0,1,0,0), representing the prefix 001 of its encoded data value 001.

[0083] At 320, the computation server 210 sums the reported counts element-wise. For example, the count vectors (0,1,0,0) and (0,1,0,0) sum to (0,2,0,0), which indicates that the most frequent prefix is 001.

[0084] At step 325, the computation server 210 adds noise, such as Laplace-distributed noise, to the counts. For example, the computation server can add noise to the aggregated counts, such as the counts in the aggregated vector (0,2,0,0). If a noisy count does not exceed a threshold at step 330, the corresponding value is removed from the top k results (e.g., see step 2(d) in Table 8). In the example of Figure 3, the noisy count for the prefix 001 exceeds the threshold, so 001 is released as the top value at 335. As noted above, noise addition and thresholding can also be performed at other times, such as at 314 and 320 above. Furthermore, although 300 depicts two iterations before the top k items are identified, there can be additional iterations querying additional groups, with correspondingly more rounds of noise addition, thresholding, and so on.
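Steps 325-335 can be sketched as below: Laplace noise is added to each aggregated count and only prefixes whose noisy count exceeds the threshold are released. The function names are hypothetical, and `scale` and `tau` are caller-supplied placeholders rather than the parameters used in Table 8.

```python
import random


def laplace(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` ~ Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_threshold(candidates, counts, scale, tau, rng):
    """Add Laplace noise to each aggregated count; release only those above tau."""
    released = {}
    for prefix, count in zip(candidates, counts):
        noisy = count + laplace(scale, rng)
        if noisy > tau:  # a noisy count that does not exceed tau is removed
            released[prefix] = noisy
    return released
```

Applied to the aggregated vector (0,2,0,0) over candidates 000, 001, 100, 101 with negligible noise and τ = 1, only 001 is released; with realistic noise, a count of 2 may fall below a larger τ, which is the price of the privacy guarantee.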

[0085] Like Figure 3, Table 8 below shows an example implementation according to some example embodiments. In Table 8, a table T is generated whose entries map each prefix to a count. Table 8 also shows the formation of groups that are non-overlapping, in the sense that each participant is a member of only one group.
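Since Table 8 is not reproduced here, the following is only a noise-free sketch of the round structure described for Figure 3: the parties are split into g contiguous, non-overlapping groups, each group answers one round, and the retained prefixes are extended by η bits until they reach the bit length b. The names are hypothetical, the noise and thresholding of step 2(d) are deliberately omitted, and the sketch assumes η divides b - γ so that the last extension lands exactly on b.

```python
from collections import Counter


def partition(parties, g):
    """Split parties into g contiguous, non-overlapping groups."""
    size = max(1, -(-len(parties) // g))  # ceiling division, at least 1
    return [parties[i:i + size] for i in range(0, len(parties), size)]


def hh2_noiseless(values, b, gamma, eta, g, k):
    """Noise-free sketch of the prefix rounds: one disjoint group per round."""
    length = gamma + eta
    candidates = [format(i, f"0{length}b") for i in range(2 ** length)]
    prefixes = []
    for group in partition(values, g):
        counts = Counter(v[:length] for v in group)  # each party counts once
        ranked = sorted(candidates, key=lambda c: counts[c], reverse=True)
        prefixes = [c for c in ranked[:k] if counts[c] > 0]
        if length >= b:  # prefixes now cover the full bit length b
            break
        length += eta
        candidates = [p + format(i, f"0{eta}b") for p in prefixes for i in range(2 ** eta)]
    return prefixes
```

On the Figure 3 inputs (001, 100, 001, 001 with b = 3, γ = η = 1, g = 2, k = 2), the first group keeps 00 and 10, the second group is asked about 000, 001, 100, 101, and the result is 001. Because each party answers in only one round, no participant contributes to more than one set of counts.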

[0086] For implementations that use MPC instead of a trusted third-party server, the algorithm described above in connection with Figure 3 and Table 8 can be implemented as the HH2 MPC algorithm described in Table 9 below. In other words, the HH2 MPC algorithm works like the trusted-server version, but uses MPC instead of a trusted server, and each party provides secure (e.g., encrypted) input that includes its values and some noise values. To jointly compute the top k values using MPC, each of parties 302A-D can provide a secure or encrypted message (e.g., using secret sharing). For example, the message may include response values (e.g., 312A, 312B, etc.) and some noise values. The parties can perform MPC among themselves by securely exchanging messages with each other, computing on the secure messages to obtain a final secure output (e.g., secret-shared) that each party can decrypt to reconstruct or disclose the result, such as the top k values. However, as noted, the parties can also outsource this MPC processing to so-called computing parties.

[0087] In MPC, each party can respond with its answer plus a partial noise value. For example, the first party's response at 312A is (1 + partial noise 1, 0 + partial noise 2, 0 + partial noise 3, 0 + partial noise 4). At 314, the element-wise sum of the first and second parties' responses across the first group then yields (1 + full noise 1, 0 + full noise 2, 1 + full noise 3, 0 + full noise 4), where each full noise term combines the parties' partial noise terms for that position. Moreover, these responses (as well as other information exchanged and the result output at 335) can be encrypted using secret sharing among the corresponding parties.

[0088] Table 8

[0089]

[0090] Although a variety of noise mechanisms can be used, Gumbel noise (the Gumbel mechanism) can be used instead of the Laplace noise at step 2(d)(i) in Table 8, in which case the mechanism is not limited by the sensitivity Δ.
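A minimal sketch of the Gumbel alternative, assuming the standard "Gumbel-max" construction: adding i.i.d. Gumbel(0, β) noise to each count and taking the largest selects each candidate with probability proportional to exp(count / β), i.e., it samples from an exponential mechanism, and taking the top k of the perturbed counts corresponds to sampling k candidates without replacement. The function names are hypothetical, and how β is derived from the privacy parameters is not stated in the text, so it is left to the caller.

```python
import math
import random


def gumbel(scale, rng):
    """Draw Gumbel(0, scale) noise by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -scale * math.log(-math.log(u))


def gumbel_top_k(counts, k, scale, rng):
    """Add i.i.d. Gumbel noise to each count and return the k largest keys."""
    noisy = {v: c + gumbel(scale, rng) for v, c in counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]
```

For two candidates with counts 1 and 0 and β = 1, the first should win in roughly e / (e + 1), about 73%, of repeated runs, which is the exponential-mechanism probability.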

[0091] For implementations that do not use a trusted third-party server, the HH2 MPC algorithm described above can be implemented as shown and described in Table 9 below. Table 9 uses the MPC operations (sub-protocols) listed in Table 6, which can be implemented as described above. The sorting in row 9 can be implemented as a sorting network based on conditional swaps, where the comparison bit computed while sorting C (indicating which of two values is smaller or larger) is reused to sort i in the same way. The input and computed values are scaled integers (also known as fixed-point representations), which allow a more efficient and secure implementation than floating-point numbers.
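Row 9 of Table 9 is not reproduced here, so this is a sketch of the idea rather than that row: an odd-even transposition sort is one valid sorting network, its comparison pattern depends only on the array length, and each comparison bit drives a CondSwap of the counts and, reused unchanged, of the companion array (called `I` here). Plaintext comparisons stand in for the LE sub-protocol, and `SCALE` and all function names are assumptions; other networks such as bitonic sort would also fit.

```python
SCALE = 2 ** 16  # fixed-point scaling factor (an assumption)


def to_fixed(x):
    """Represent a real-valued (e.g. noisy) count as a scaled integer."""
    return round(x * SCALE)


def cond_swap(a, b, c):
    """CondSwap in arithmetic form: swap a and b iff the bit c is 1."""
    t = c * (b - a)
    return a + t, b - t


def sort_desc(C, I):
    """Odd-even transposition sort of counts C (descending), moving I alongside.
    The comparison pattern is fixed by len(C) alone, as a sorting network requires."""
    C, I = list(C), list(I)
    n = len(C)
    for rnd in range(n):
        for j in range(rnd % 2, n - 1, 2):
            c = 1 - int(C[j + 1] <= C[j])  # NOT(LE(C[j+1], C[j])): 1 iff out of order
            C[j], C[j + 1] = cond_swap(C[j], C[j + 1], c)
            I[j], I[j + 1] = cond_swap(I[j], I[j + 1], c)  # reuse the same bit for I
    return C, I
```

For noisy counts 1.5, 3.25, 0.5, 2.0 at indices 0-3, `sort_desc` returns the indices in the order 1, 3, 0, 2. The arithmetic CondSwap only needs one multiplication per pair, which is why it maps directly onto the secret-shared operations of Table 6.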

[0092] Table 9

[0093]

[0094] Figure 4 shows a block diagram of a computing system 400 consistent with embodiments of the present subject matter. For example, system 400 may be used to implement client devices and / or servers, etc.

[0095] As shown in Figure 4, the computing system 500 may include a processor 510, a memory 520, a storage device 530, and an input / output device 540. According to embodiments of the present subject matter, a trusted execution environment may be a secure area confined within the processor 510, or it may be an additional hardware and / or software component. The trusted execution environment can operate as an enclave that provides confidentiality and integrity protection for the code and data it contains, even in untrusted environments.

[0096] Processor 510, memory 520, storage device 530, and input / output device 540 may be interconnected via system bus 550. Processor 510 can process instructions for execution within computing system 500. Such instructions may implement, for example, one or more components such as trusted servers and / or client devices (parties). In some embodiments of the present subject matter, processor 510 may be a single-threaded processor. Alternatively, processor 510 may be a multi-threaded processor. The processor may be a multi-core processor having multiple processors or a single-core processor. Processor 510 can process instructions stored in memory 520 and / or storage device 530 to display graphical information for a user interface via input / output device 540.

[0097] Memory 520 is a computer-readable medium that stores information within computing system 500; for example, it may be volatile or non-volatile. For example, memory 520 may store data structures representing a configuration object database. Storage device 530 can provide persistent storage for computing system 500. Storage device 530 may be a floppy disk device, hard disk device, optical disk device, magnetic tape device, or other suitable persistent storage means. Input / output device 540 provides input / output operations for computing system 500. In some embodiments of the present subject matter, input / output device 540 includes a keyboard and / or a pointing device. In various embodiments, input / output device 540 includes a display unit for displaying a graphical user interface.

[0098] According to some embodiments of the present subject matter, input / output device 540 is capable of providing input / output operations for network devices. For example, input / output device 540 may include an Ethernet port or other networking port to communicate with one or more wired and / or wireless networks (e.g., local area network (LAN), wide area network (WAN), Internet).

[0099] In some implementations of the present subject matter, the computing system 500 can be used to execute various interactive computer software applications (e.g., Microsoft). The software applications (and / or any other type of software) can be used to organize, analyze, and / or store data in various formats (e.g., lists). Alternatively, the computing system 500 can be used to execute any type of software application. These applications can be used to perform various functions, such as planning functions (e.g., generating, managing, and editing spreadsheet documents, word processing documents, and / or any other objects), computational functions, communication functions, etc. The applications may include various plug-in functions (e.g., the SAP Integrated Business Planning plug-in for Microsoft Excel, provided by SAP SE of Walldorf, Germany, as part of SAP Business Suite), or may be standalone computing products and / or functions. When an application is activated, its functions can be used to generate a user interface provided via the input / output device 540. The user interface may be generated by the computing system 500 and presented to the user (e.g., on a computer screen monitor, etc.).

[0100] One or more aspects or features of the subject matter described herein may be implemented by digital electronic circuits, integrated circuits, specially designed ASICs, field-programmable gate arrays (FPGAs), computer hardware, firmware, software, and / or combinations thereof. These various aspects or features may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and may be coupled to receive data and instructions from, and to send data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. Clients and servers are generally remote from each other and typically interact via a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.

[0101] These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor and can be implemented in a high-level procedural and / or object-oriented programming language and / or in assembly / machine language. As used herein, "machine-readable medium" refers to any computer program product, apparatus, and / or device, such as magnetic disks, optical disks, memory, and programmable logic devices (PLDs), used to provide machine instructions and / or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. "Machine-readable signal" refers to any signal used to provide machine instructions and / or data to a programmable processor. A machine-readable medium may store such machine instructions non-transitorily, as would, for example, a non-transient solid-state memory, a magnetic hard disk drive, or any equivalent storage medium. Alternatively or additionally, a machine-readable medium may store such machine instructions in a transient manner, as would, for example, a processor cache or other random access memory associated with one or more physical processor cores.

[0102] To provide interaction with the user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device for displaying information to the user, such as a cathode ray tube (CRT), liquid crystal display (LCD), or light-emitting diode (LED) monitor, and a keyboard and pointing devices such as a mouse and trackball through which the user can provide input. Other types of devices may also be used to provide interaction with the user. For example, feedback provided to the user can be any form of sensory feedback, such as visual, auditory, or tactile feedback; and input from the user can be received in any form, including acoustic, voice, or tactile input. Other possible input devices include touchscreens or other touch-sensitive devices, such as single-point or multi-point resistive or capacitive touchpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices, and associated interpretation software.

[0103] In the foregoing description and in the claims, phrases such as “at least one of” or “one or more of” may appear, followed by a conjunctive list of elements or features. The term “and / or” may also appear in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually, or any of the listed elements or features in combination with any of the other listed elements or features. For example, the phrases “at least one of A and B,” “one or more of A and B,” and “A and / or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, and / or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together.” Use of the term “based on” above and in the claims is intended to mean “based at least in part on,” such that unrecited features or elements are also permissible.

[0104] The subject matter described herein can be embodied in systems, apparatuses, methods, and / or articles, depending on the desired configuration. The embodiments described above do not represent all embodiments consistent with the subject matter described herein. Rather, these embodiments are merely examples of various aspects relating to the described subject matter. Although several variations have been described in detail above, other modifications or additions are possible. Specifically, additional features and / or variations may be provided beyond those described herein. For example, the embodiments described above may involve various combinations and sub-combinations of the disclosed features and / or combinations and sub-combinations of several other features disclosed above. Furthermore, the logical flows depicted in the figures and / or described herein do not necessarily require the specific order or sequence shown to achieve the desired result. Moreover, the logical flows may include different and / or additional operations than shown without departing from the scope of this disclosure. One or more operations of the logical flows may be repeated and / or omitted without departing from the scope of this disclosure. Other embodiments may also fall within the scope of the following claims.

Claims

1. A system comprising: at least one data processor; and at least one memory storing instructions that, when executed by the at least one data processor, result in operations comprising: generating, to determine the top k values across a plurality of clients, a table that maps candidate values to entries with corresponding counts; receiving candidate values from each of the plurality of clients; in response to a received candidate value matching an entry in the table, incrementing the corresponding count of the matched candidate value; in response to a received candidate value not matching any of the entries in the table and the table not exceeding a threshold size, adding an entry to the table by adding the received candidate value with a count value of 1; in response to a received candidate value not matching any of the entries in the table and the table exceeding the threshold size, decrementing all counts in the table by 1 and deleting from the table any entry with a count of zero; adding one of Laplace noise, exponential noise, or Gumbel noise to the corresponding counts in the entries of the table; in response to a noisy corresponding count being less than a threshold, deleting from the table the entry corresponding to that noisy corresponding count;
determining, based on multi-party computation using a data domain spanning the plurality of clients, a set of the top k values across the data domain, wherein the multi-party computation includes multiple computation nodes spanning one or more of the plurality of clients, the set of the top k values is divided into multiple shares distributed across the computation nodes, each computation node exchanges secure input messages with the other computation nodes, each secure input message containing one or more of the shares, and the exchanged secure input messages are used to jointly compute the set of the top k values by combining the shares contained in the secure input messages to reconstruct the set of the top k values, while keeping private data confidential from the other computation nodes; and outputting at least a portion of the table as the set of the top k values.

2. The system according to claim 1, wherein the table is sorted based on the noisy corresponding counts before the outputting.

3. The system according to claim 1, wherein the system includes or is contained within a trusted server.

4. The system according to claim 1, wherein receiving the candidate value from each of the plurality of clients includes receiving the candidate value within a secure message, the secure message further including a partial noise value.

5. The system according to claim 4, wherein adding noise to the corresponding counts in the entries of the table further includes adding the noise based on the partial noise value from each of the plurality of clients.

6. The system according to claim 1, wherein outputting at least a portion of the table as the set of the top k values further includes outputting the at least a portion of the table in a secure message.

7. The system according to claim 1, wherein the set of the top k values is output based on differential privacy.

8. A computer-implemented method, comprising: generating, to determine the top k values across a plurality of clients, a table that maps candidate values to entries with corresponding counts; receiving candidate values from each of the plurality of clients; in response to a received candidate value matching an entry in the table, incrementing the corresponding count of the matched candidate value; in response to a received candidate value not matching any of the entries in the table and the table not exceeding a threshold size, adding an entry to the table by adding the received candidate value with a count value of 1; in response to a received candidate value not matching any of the entries in the table and the table exceeding the threshold size, decrementing all counts in the table by 1 and deleting from the table any entry with a count of zero; adding one of Laplace noise, exponential noise, or Gumbel noise to the corresponding counts in the entries of the table; in response to a noisy corresponding count being less than a threshold, deleting from the table the entry corresponding to that noisy corresponding count;
determining, based on multi-party computation using a data domain spanning the plurality of clients, a set of the top k values across the data domain, wherein the multi-party computation includes multiple computation nodes spanning one or more of the plurality of clients, the set of the top k values is divided into multiple shares distributed across the computation nodes, each computation node exchanges secure input messages with the other computation nodes, each secure input message containing one or more of the shares, and the exchanged secure input messages are used to jointly compute the set of the top k values by combining the shares contained in the secure input messages to reconstruct the set of the top k values, while keeping private data confidential from the other computation nodes; and outputting at least a portion of the table as the set of the top k values.

9. The method according to claim 8, wherein the table is sorted based on the noisy corresponding counts before the outputting.

10. The method according to claim 8, wherein the method is implemented by a trusted server, the trusted server including at least one data processor and at least one memory.

11. The method according to claim 10, wherein the trusted server performs the multi-party computation.

12. The method according to claim 8, wherein receiving the candidate value from each of the plurality of clients includes receiving the candidate value within a secure message, the secure message further including a partial noise value.

13. The method according to claim 12, wherein adding noise to the corresponding counts in the entries of the table further includes adding the noise based on the partial noise value from each of the plurality of clients.

14. The method according to claim 8, wherein outputting at least a portion of the table as the set of the top k values further includes outputting the at least a portion of the table in a secure message.

15. The method according to claim 8, wherein the set of the top k values is output based on differential privacy.

16. A non-transitory computer-readable storage medium comprising instructions that, when executed by at least one processor, result in operations comprising: generating, to determine the top k values across a plurality of clients, a table that maps candidate values to entries with corresponding counts; receiving candidate values from each of the plurality of clients; in response to a received candidate value matching an entry in the table, incrementing the corresponding count of the matched candidate value; in response to a received candidate value not matching any of the entries in the table and the table not exceeding a threshold size, adding an entry to the table by adding the received candidate value with a count value of 1; in response to a received candidate value not matching any of the entries in the table and the table exceeding the threshold size, decrementing all counts in the table by 1 and deleting from the table any entry with a count of zero; adding one of Laplace noise, exponential noise, or Gumbel noise to the corresponding counts in the entries of the table; in response to a noisy corresponding count being less than a threshold, deleting from the table the entry corresponding to that noisy corresponding count;
determining, based on multi-party computation using a data domain spanning the plurality of clients, a set of the top k values across the data domain, wherein the multi-party computation includes multiple computation nodes spanning one or more of the plurality of clients, the set of the top k values is divided into multiple shares distributed across the computation nodes, each computation node exchanges secure input messages with the other computation nodes, each secure input message containing one or more of the shares, and the exchanged secure input messages are used to jointly compute the set of the top k values by combining the shares contained in the secure input messages to reconstruct the set of the top k values, while keeping private data confidential from the other computation nodes; and outputting at least a portion of the table as the set of the top k values.