Decision tree based homomorphic encryption data inference

CN115461761B · Active · Publication Date: 2026-06-26 · INTERNATIONAL BUSINESS MACHINES CORPORATION

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: INTERNATIONAL BUSINESS MACHINES CORPORATION
Filing Date: 2021-04-09
Publication Date: 2026-06-26

Abstract

A method, apparatus and computer program product for homomorphic inference on a decision tree (DT) model. Instead of performing HE-based inference on the decision tree, inference is performed on a neural network (NN) that acts as a proxy. To this end, the neural network is trained to learn the DT decision boundaries, preferably without using the original training data points of the DT model. During training, a random set of data is applied to the DT and the expected outputs are recorded. This random set of data and the expected outputs are then used to train the neural network such that the output of the neural network matches the output expected from applying the original set of data to the DT. Preferably, the neural network has a low depth, only a few layers. HE-based inference on the decision tree is thus accomplished using HE inference on the shallow neural network. The latter is computationally efficient and can be done without the need for bootstrapping.

Description

Technical Field

[0001] This invention relates to the use of fully homomorphic encryption operations to facilitate inference on encrypted data.

Background Technology

[0002] Decision trees are a common decision support model in many applications. A decision tree (DT) is a flowchart-like node structure where each internal node represents a test of an attribute, each branch represents the result of the test, and each leaf node represents the class label (the decision made after evaluating all attributes). The path from the root to the leaf represents the classification rule. Decision trees can be trained on datasets and produce corresponding regression outputs.
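The node structure described above can be sketched in a few lines. This is a minimal illustration only: the `Node` and `Leaf` classes, the `classify` helper, and the example features and thresholds are hypothetical names and values chosen for the sketch, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label returned when this leaf is reached


@dataclass
class Node:
    feature: int                 # index of the attribute tested at this node
    threshold: float             # the test is: x[feature] <= threshold
    left: Union["Node", Leaf]    # branch taken when the test is true
    right: Union["Node", Leaf]   # branch taken when the test is false


def classify(tree, x):
    """Walk from the root to a leaf; return the label and the tests taken."""
    path = []
    node = tree
    while isinstance(node, Node):
        passed = x[node.feature] <= node.threshold
        path.append((node.feature, node.threshold, passed))
        node = node.left if passed else node.right
    return node.label, path


# Hypothetical two-level tree over the features (amount, hour_of_day).
toy_tree = Node(0, 500.0,
                left=Leaf("approve"),
                right=Node(1, 6.0, left=Leaf("review"), right=Leaf("approve")))

label, path = classify(toy_tree, [900.0, 3.0])
print(label, path)
```

The returned `path` is the root-to-leaf sequence of tests, which corresponds to the classification rule mentioned in the paragraph.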

[0003] There are many scenarios in which training this type of decision tree involves sensitive data, such as when the model belongs to an organization but cannot be shared externally. For example, credit card transaction information is available to credit card companies but not to ordinary users. Similarly, patient-related healthcare data is available to hospitals, but researchers cannot use this data to look for patterns in cancer progression. Furthermore, privacy concerns (such as the European data privacy regulation, GDPR) can limit data availability. A similar situation arises when competitors want to pool their data to build accurate models (e.g., different banks possess transaction-related data and want to build fraud detection models). Restricting data availability can prevent otherwise useful models from being used, or degrade their performance.

[0004] To address these privacy concerns, Machine Learning as a Service (MLaaS) solutions are known in the prior art, in which a trained model of this type is hosted on a cloud server and offered as a service that allows users to run inference queries against the model. This concept, sometimes referred to as privacy-preserving inference, aims to give users a secure way to maintain the privacy of the scoring data returned by the model, while also enabling cloud providers to protect the model's privacy for proprietary, regulatory, or other reasons. Example use cases include: hospitals that train models (e.g., to predict the probability of disease) and wish to provide scoring services under strict privacy constraints due to data sensitivity; and financial credit scoring companies that train credit risk models and provide scoring services, also under strict privacy constraints due to legal or regulatory requirements. To this end, these types of solutions implement an advanced cryptographic technique called fully homomorphic encryption (FHE), which enables the model to compute securely on client data without decrypting that data, while maintaining the confidentiality of the model itself.
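The client/server flow of privacy-preserving inference can be illustrated with a toy example. Note the substitution: the sketch below uses a textbook Paillier scheme, which is only additively homomorphic, as a stand-in for FHE, and it scores a simple linear model rather than a decision tree or neural network. The primes, weights, and function names are assumptions made for illustration, and the key size is far too small for any real use.

```python
import math
import random

# Toy Paillier key pair (assumed values; real keys use primes of 1024+ bits).
P, Q = 1009, 1013
N = P * Q                      # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private: valid because the generator is N + 1


def encrypt(m, rng=random):
    """Client side: encrypt an integer 0 <= m < N under the public key."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Client side: recover the plaintext with the private key (LAM, MU)."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def score_encrypted(enc_x, weights, bias):
    """Server side: compute Enc(w . x + b) without decrypting any input."""
    acc = pow(N + 1, bias, N2)              # trivial encryption of the bias
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, N2)) % N2    # Enc(a) * Enc(x)^w = Enc(a + w*x)
    return acc


# The client encrypts its features; the server only ever sees ciphertexts.
features = [3, 5]
enc_features = [encrypt(v) for v in features]

# The server applies its private model (weights never leave the server).
enc_score = score_encrypted(enc_features, weights=[2, 7], bias=4)

# The client decrypts the returned score.
score = decrypt(enc_score)
print(score)
```

In this sketch the client never learns the weights and the server never learns the features or the score, which is the separation the paragraph describes. An FHE scheme additionally supports multiplying ciphertexts together, which is what a neural network evaluation requires.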

[0005] While homomorphic encryption offers significant advantages, decision tree-based models are not well suited to efficient processing with fully homomorphic encryption (FHE) techniques. The problem arises (in the FHE context) because the fundamental operation performed at the model's nodes is a comparison of two values, and this comparison (when performed using FHE) is a non-linear operation (e.g., using the sigmoid function f(x) = 1 / (1 + e^{-x})). The comparison is therefore fuzzy, and scaling is made more difficult because the scaling factor is non-uniform. Consequently, implementing these branching computations in fully homomorphic encryption schemes is impractical, and providing FHE-based inference on decision trees remains a challenge.

Summary of the Invention

[0006] Embodiments of the present invention address this problem. Instead of performing homomorphic inference on the DT model itself, the method described herein replaces the DT model with a specially trained low-depth neural network (NN), and then performs homomorphic inference on the neural network. In this way, the neural network acts as a representative or proxy of the DT model, and avoids the unreliable branch computations typically required by inference against the DT model.
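To see why the branch computations are called unreliable, it helps to compare an exact threshold test with a sigmoid-based relaxation of it. The sketch below runs in plaintext and is only illustrative: the `sharpness` parameter and the sample values are assumptions, and under an actual FHE scheme the sigmoid itself would typically have to be approximated further.

```python
import math


def hard_compare(x, threshold):
    """Exact branch test used at a plaintext decision-tree node."""
    return 1.0 if x > threshold else 0.0


def soft_compare(x, threshold, sharpness):
    """Sigmoid relaxation: close to 0 or 1 only far from the threshold."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))


threshold = 0.5
for x in (0.10, 0.45, 0.49, 0.51, 0.55, 0.90):
    print(f"x={x:.2f} hard={hard_compare(x, threshold):.0f} "
          f"soft(k=10)={soft_compare(x, threshold, 10.0):.3f} "
          f"soft(k=100)={soft_compare(x, threshold, 100.0):.3f}")
```

Near the threshold the relaxed comparison returns values close to 0.5 rather than a clean 0 or 1, so the branch decision is fuzzy. Raising the sharpness narrows the fuzzy band, but the appropriate value depends on the scale of each feature, which is one way to read the remark that the scaling factor is non-uniform.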

[0007] Therefore, and according to an embodiment of the invention, the neural network is trained to learn the decision boundaries of the DT. This operation is performed by the DT model owner in plaintext, and preferably the neural network training is performed without using the original training data points used for the DT model itself. In this training phase, a random dataset is applied to the DT, and the expected outputs (from the tree) are obtained. The distribution characteristics of the random dataset (e.g., minimum and maximum values, feature mean and variance, etc.) match the distribution characteristics of the original dataset. The neural network is then trained using the random dataset and its expected outputs, such that the output of the neural network matches the output expected when the original dataset is applied to the DT. Preferably, the neural network has a low depth (e.g., less than about three (3) layers) and is therefore sometimes referred to herein as “shallow”. Once trained, inference is performed on the shallow neural network instead of directly on the DT. In other words, HE-based inference on the decision tree is accomplished using HE inference on the shallow neural network. The latter is computationally efficient and can be performed without bootstrapping.
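The plaintext training phase can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the stand-in tree, its thresholds, the feature ranges, the one-hidden-layer network with tanh and sigmoid activations, and the training hyperparameters are all hypothetical. Only the minimum and maximum of each feature are matched here (by uniform sampling), and the homomorphic evaluation of the trained network is not shown.

```python
import math
import random

rng = random.Random(0)


def tree_predict(x):
    """Stand-in for the owner's trained decision tree (assumed thresholds)."""
    if x[0] <= 0.5:
        return 0
    return 1 if x[1] > 0.5 else 0


# Assumed summary statistics of the original data: (min, max) per feature.
FEATURE_RANGES = [(0.0, 1.0), (0.0, 1.0)]
HIDDEN = 4  # one small hidden layer keeps the network shallow


def sample_random_inputs(n):
    """Draw synthetic points inside the original feature ranges."""
    return [[rng.uniform(lo, hi) for lo, hi in FEATURE_RANGES] for _ in range(n)]


def init_network():
    w1 = [[rng.uniform(-0.5, 0.5) for _ in FEATURE_RANGES] for _ in range(HIDDEN)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
    return {"w1": w1, "b1": [0.0] * HIDDEN, "w2": w2, "b2": 0.0}


def forward(net, x):
    """Return hidden activations and the predicted probability of class 1."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    z = sum(w * a for w, a in zip(net["w2"], h)) + net["b2"]
    z = max(-30.0, min(30.0, z))          # guard against overflow in exp
    return h, 1.0 / (1.0 + math.exp(-z))


def train(net, xs, ys, epochs=150, lr=0.3):
    """Per-sample gradient descent on the cross-entropy loss."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h, p = forward(net, x)
            err = p - y                   # gradient of the loss w.r.t. z
            for j in range(HIDDEN):
                grad_h = err * net["w2"][j] * (1.0 - h[j] ** 2)
                net["w2"][j] -= lr * err * h[j]
                for i in range(len(x)):
                    net["w1"][j][i] -= lr * grad_h * x[i]
                net["b1"][j] -= lr * grad_h
            net["b2"] -= lr * err
    return net


def agreement(net, xs):
    """Fraction of points on which the network and the tree give the same class."""
    hits = sum((forward(net, x)[1] > 0.5) == bool(tree_predict(x)) for x in xs)
    return hits / len(xs)


train_x = sample_random_inputs(300)
train_y = [tree_predict(x) for x in train_x]   # labels come from the tree only
net = train(init_network(), train_x, train_y)
test_x = sample_random_inputs(500)
print(f"NN/DT agreement on fresh random points: {agreement(net, test_x):.3f}")
```

No original training record is touched at any point: the only inputs are the tree and the summary statistics. The tanh and sigmoid activations are a convenience for this plaintext sketch; a network intended for homomorphic evaluation would likely need activations that the chosen HE scheme can compute.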

[0008] The foregoing has outlined some of the more pertinent features of the subject matter. These features should be construed as merely illustrative. Many other beneficial results can be attained by applying the disclosed subject matter in a different manner or by modifying the subject matter as described below.

Attached Figure Description

[0009] To gain a more complete understanding of the invention and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:

[0010] Figure 1 depicts an exemplary block diagram of a distributed data processing environment in which exemplary aspects of the illustrative embodiments may be implemented;

[0011] Figure 2 is an exemplary block diagram of a data processing system in which exemplary aspects of the illustrative embodiments may be implemented;

[0012] Figure 3 illustrates an exemplary cloud computing architecture in which the disclosed subject matter may be implemented;

[0013] Figure 4 is a representative machine learning as a service (MLaaS) operating environment in which the techniques disclosed herein may be implemented;

[0014] Figure 5 depicts the technique disclosed herein, in which a low-depth neural network is trained to learn the decision boundaries of a decision tree of interest, thereby enabling the NN to be used as a proxy for HE-based inference against the decision tree;

[0015] Figure 6 is a block diagram depicting a set of high-level functions of a system according to this disclosure, including privacy-preserving HE-based inference with respect to a decision tree; and

[0016] Figure 7 depicts the technique disclosed herein extended to facilitate privacy-preserving homomorphic inference for an ensemble of decision trees.

Detailed Implementation

[0017] With reference now to the drawings, and in particular to Figures 1-2, exemplary diagrams of data processing environments are provided in which illustrative embodiments of this disclosure may be implemented. It should be understood that Figures 1-2 are merely illustrative examples and are not intended to assert or imply any limitation on the environments in which aspects or embodiments of the disclosed subject matter may be implemented. Many modifications may be made to the depicted environments without departing from the scope of the invention.

[0018] Client-server technology

[0019] With reference now to the drawings, Figure 1 depicts a pictorial representation of an exemplary distributed data processing system in which aspects of the illustrative embodiments may be implemented. The distributed data processing system 100 may include a network of computers in which aspects of the illustrative embodiments can be implemented. The distributed data processing system 100 includes at least one network 102, which is the medium used to provide communication links between the various devices and computers connected together within the distributed data processing system 100. The network 102 may include connections such as wired links, wireless communication links, or fiber optic cables.

[0020] In the depicted example, servers 104 and 106 are connected to network 102 along with storage unit 108. Clients 110, 112, and 114 are also connected to network 102. These clients 110, 112, and 114 can be, for example, personal computers, network computers, etc. In the depicted example, server 104 provides data such as boot files, operating system images, and applications to clients 110, 112, and 114. In the depicted example, clients 110, 112, and 114 are clients of server 104. The distributed data processing system 100 may include additional servers, clients, and other devices not shown.

[0021] In the depicted example, the distributed data processing system 100 is the Internet, with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol / Internet Protocol (TCP / IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational, and other computer systems that route data and messages. Of course, the distributed data processing system 100 may also be implemented to include a number of different types of networks, such as, for example, intranets, local area networks (LANs), wide area networks (WANs), etc. As stated above, Figure 1 is intended as an example, not as an architectural limitation on different embodiments of the disclosed subject matter, and therefore the particular elements shown in Figure 1 should not be considered limiting with regard to the environments in which the illustrative embodiments of the invention may be implemented.

[0022] With reference now to Figure 2, a block diagram of an exemplary data processing system in which aspects of the illustrative embodiments may be implemented is shown. Data processing system 200 is an example of a computer, such as client 110 in Figure 1, in which computer-usable code or instructions implementing the processes of the illustrative embodiments of the present disclosure may be located.

[0023] With reference now to Figure 2, a block diagram of a data processing system in which illustrative embodiments may be implemented is shown. Data processing system 200 is an example of a computer, such as server 104 or client 110 in Figure 1, in which computer-usable program code or instructions implementing the processes of the illustrative embodiments may be located. In this illustrative example, the data processing system 200 includes a communication structure 202 that provides communication between processor unit 204, memory 206, persistent storage 208, communication unit 210, input / output (I / O) unit 212, and display 214.

[0024] Processor unit 204 is used to execute instructions for software that can be loaded into memory 206. Processor unit 204 may be a collection of one or more processors, or it may be a multiprocessor core, depending on the specific implementation. Furthermore, processor unit 204 may be implemented using one or more heterogeneous processor systems, in which the main processor and secondary processor reside on a single chip. As another illustrative example, processor unit 204 may be a symmetric multiprocessor (SMP) system containing multiple processors of the same type.

[0025] Memory 206 and persistent storage 208 are examples of storage devices. A storage device is any piece of hardware capable of storing information on a temporary and / or permanent basis. In these examples, memory 206 may be, for example, random access memory or any other suitable volatile or non-volatile storage device. Persistent storage 208 may take various forms depending on the specific implementation. For example, persistent storage 208 may include one or more components or devices. For example, persistent storage 208 may be a hard disk drive, flash memory, rewritable optical disk, rewritable magnetic tape, or a combination of the above. The medium used by persistent storage 208 may also be removable. For example, a removable hard disk drive may be used for persistent storage 208.

[0026] In these examples, communication unit 210 provides communication with other data processing systems or devices. In these examples, communication unit 210 is a network interface card. Communication unit 210 can provide communication using either or both physical and wireless communication links.

[0027] Input / output unit 212 allows data input and output to other devices that can be connected to data processing system 200. For example, input / output unit 212 can provide connectivity for user input via keyboard and mouse. Furthermore, input / output unit 212 can send output to a printer. Display 214 provides a mechanism for displaying information to the user.

[0028] Instructions for the operating system and for applications or programs are located on persistent storage 208. These instructions may be loaded into memory 206 for execution by processor unit 204. The processes of the different embodiments may be performed by processor unit 204 using computer-implemented instructions, which may be located in a memory (e.g., memory 206). These instructions are referred to as program code, computer-usable program code, or computer-readable program code that can be read and executed by a processor in processor unit 204. The program code in the different embodiments may be embodied on different physical or tangible computer-readable media, such as memory 206 or persistent storage 208.

[0029] Program code 216 is located in a functional form on computer-readable medium 218, which is selectively removable, and can be loaded onto or transferred to the data processing system 200 for execution by the processor unit 204. In these examples, program code 216 and computer-readable medium 218 form a computer program product 220. In one example, computer-readable medium 218 may be in a tangible form, such as an optical or magnetic disk that is inserted or placed into a drive or other device that is part of persistent storage 208 for transfer onto a storage device, such as a hard disk drive that is part of persistent storage 208. In a tangible form, computer-readable medium 218 may also take the form of persistent storage, such as a hard disk drive, thumb drive, or flash memory connected to the data processing system 200. The tangible form of computer-readable medium 218 is also referred to as a computer-recordable storage medium. In some instances, computer-readable medium 218 may not be removable.

[0030] Alternatively, program code 216 can be transferred from computer-readable medium 218 to data processing system 200 via a communication link to communication unit 210 and / or via a connection to input / output unit 212. In the illustrative examples, the communication link and / or connection can be physical or wireless. The computer-readable medium can also take the form of non-tangible media, such as communication links or wireless transmissions containing the program code. The different components shown for data processing system 200 are not intended to impose architectural limitations on the ways in which different embodiments can be implemented. The different illustrative embodiments can be implemented in a data processing system that includes components in addition to, or in place of, those shown for data processing system 200. Other components shown in Figure 2 may vary from the illustrative examples shown. As one example, a storage device in data processing system 200 is any hardware device capable of storing data. Memory 206, persistent storage 208, and computer-readable medium 218 are examples of storage devices in a tangible form.

[0031] In another example, a bus system can be used to implement communication structure 202 and may include one or more buses, such as a system bus or an input / output bus. Of course, any suitable type of architecture that provides data transfer between different components or devices attached to the bus system can be used to implement the bus system. Furthermore, the communication unit may include one or more devices for sending and receiving data, such as a modem or network adapter. Further, the memory may be, for example, memory 206 or a cache such as that found in the interface and memory controller hub that may be present in communication structure 202.

[0032] Computer program code for performing the operations of this invention can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java™, Smalltalk, C++, C#, Objective-C, etc., as well as conventional procedural programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer can be connected to the user's computer via any type of network (including a local area network (LAN) or a wide area network (WAN)), or the connection can be made to an external computer (e.g., via the Internet using an Internet service provider).

[0033] Those skilled in the art will appreciate that the hardware in Figures 1-2 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives, may be used in addition to or in place of the hardware depicted in Figures 1-2. Furthermore, the processes of the illustrative embodiments can be applied to multiprocessor data processing systems other than the aforementioned SMP system, without departing from the scope of the disclosed subject matter.

[0034] As will be seen, the techniques described herein may operate in conjunction with the standard client-server paradigm such as that illustrated in Figure 1, in which client machines communicate with an Internet-accessible, web-based portal running on a set of one or more machines. End users operate Internet-connectable devices (e.g., desktop computers, laptops, Internet-enabled mobile devices, etc.) capable of accessing and interacting with the portal. Typically, each client or server machine is a data processing system such as that illustrated in Figure 2, comprising hardware and software, and these entities communicate with each other over a network, such as the Internet, an intranet, an extranet, a private network, or any other communication medium or link. A data processing system typically includes one or more processors, an operating system, one or more applications, and one or more utilities. Applications on the data processing system provide native support for Web services, including but not limited to support for HTTP, SOAP, XML, WSDL, UDDI, and WSFL. Information on SOAP, WSDL, UDDI, and WSFL is available from the World Wide Web Consortium (W3C), which is responsible for developing and maintaining these standards; further information on HTTP and XML is available from the Internet Engineering Task Force (IETF). Familiarity with these standards is assumed.

[0035] Cloud computing model

[0036] An emerging information technology (IT) delivery model is cloud computing, through which shared resources, software, and information are provided on demand to computers and other devices via the Internet. Cloud computing can significantly reduce IT costs and complexity while improving workload optimization and service delivery. In this approach, application instances can be hosted and made available from Internet-based resources accessible via a regular web browser over HTTP. An example application could be one that provides a set of common messaging functionalities such as email, calendar, contact management, and instant messaging. Users then access the service directly via the Internet. Using this service, businesses place their email, calendar, and / or collaboration infrastructure in the cloud, and end users use appropriate clients to access their emails or perform calendar operations.

[0037] Cloud computing resources are typically housed in large server clusters running one or more web applications, often using a virtualization architecture. In this architecture, applications run on virtual servers, or so-called "virtual machines" (VMs), which are mapped to physical servers within a data center facility. VMs typically run on top of a hypervisor, which is the program that controls the allocation of physical resources to the VMs.

[0038] Cloud computing is a service delivery model designed to enable convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing power, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with the service provider. This cloud model may include at least five features, at least three service models, and at least four deployment models, all of which are described and defined in more detail in Peter Mell and Tim Grance’s “Draft NIST Working Definition of Cloud Computing” dated October 7, 2009.

[0039] Specifically, the following are typical characteristics:

[0040] On-demand self-service: Cloud consumers can unilaterally and automatically provision computing capabilities, such as server time and network storage, as needed, without requiring human interaction with the service provider.

[0041] Broad network access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin client or thick client platforms (e.g., mobile phones, laptops, and PDAs).

[0042] Resource pooling: A provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, where different physical and virtual resources are dynamically assigned and reassigned as needed. There is a sense of location independence because consumers typically do not have control or knowledge of the exact location of the resources provided, but may be able to specify the location at a higher level of abstraction (e.g., country, state, or data center).

[0043] Rapid elasticity: Capabilities can be provisioned rapidly and elastically, in some cases automatically, to scale out quickly, and released rapidly to scale in quickly. To the consumer, the capabilities available for provisioning often appear unlimited and can be purchased in any quantity at any time.

[0044] Measured service: Cloud systems automatically control and optimize resource usage by leveraging a metering capability at a level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency to both service providers and consumers.

[0045] The service models are typically as follows:

[0046] Software as a Service (SaaS): This provides consumers with the ability to use the provider's applications running on cloud infrastructure. Applications can be accessed from different client devices via thin client interfaces such as web browsers (e.g., web-based email). Consumers do not manage or control the underlying cloud infrastructure, including the network, servers, operating system, storage, or even individual application capabilities, with possible exceptions such as limited user-specific application configuration settings.

[0047] Platform as a Service (PaaS): This provides consumers with the ability to deploy applications created or acquired by the consumer using programming languages and tools supported by the provider onto cloud infrastructure. Consumers do not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage, but they have control over the deployed applications and the configuration of any application hosting environment.

[0048] Infrastructure as a Service (IaaS): The capabilities offered to consumers are processing, storage, networking, and other basic computing resources that enable consumers to deploy and run arbitrary software, which may include operating systems and applications. Consumers do not manage or control the underlying cloud infrastructure, but rather have control over the operating system, storage, deployed applications, and potentially limited control over selected networking components (e.g., host firewalls).

[0049] The deployment models are typically as follows:

[0050] Private cloud: A cloud infrastructure that operates solely for an organization. It can be managed by the organization or a third party and can exist on-site or off-site.

[0051] Community cloud: A cloud infrastructure shared by several organizations and supporting a specific community with shared concerns (e.g., tasks, security requirements, policies, and compliance considerations). It can be managed by an organization or a third party and can exist on-site or off-site.

[0052] Public cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization that sells cloud services.

[0053] Hybrid cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

[0054] Cloud computing environments are service-oriented, focusing on statelessness, loose coupling, modularity, and semantic interoperability. The core of cloud computing is its infrastructure, which includes a network of interconnected nodes. A representative cloud computing node is as illustrated in Figure 2 above. Specifically, within a cloud computing node, there exists a computer system / server that can operate alongside many other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and / or configurations suitable for use with the computer system / server include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computers, and distributed cloud computing environments that include any of the above systems or devices. The computer system / server can be described in the general context of computer system executable instructions, such as program modules, executed by the computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform specific tasks or implement specific abstract data types. The computer system / server can be practiced in a distributed cloud computing environment where tasks are performed by remote processing devices linked via a communication network. In a distributed cloud computing environment, program modules can reside in local and remote computer system storage media, including memory storage devices.

[0055] Referring now to Figure 3, by way of additional background, a set of functional abstraction layers provided by a cloud computing environment is shown. It should be understood in advance that the components, layers, and functions shown in Figure 3 are intended to be illustrative only, and embodiments of the invention are not limited thereto.

[0056] As described, the following layers and corresponding functions are provided:

[0057] The hardware and software layer 300 includes hardware and software components. Examples of hardware components include mainframes, in one example IBM zSeries systems; servers based on a RISC (Reduced Instruction Set Computer) architecture, in one example IBM pSeries systems; IBM xSeries systems; IBM BladeCenter systems; storage devices; and networks and network components. Examples of software components include network application server software, in one example IBM WebSphere application server software; and database software, in one example IBM DB2 database software. (IBM, zSeries, pSeries, xSeries, BladeCenter, WebSphere, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide.)

[0058] The virtualization layer 302 provides an abstraction layer from which the following examples of virtual entities can be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients.

[0059] In one example, management layer 304 can provide the following functions. Resource provisioning provides dynamic procurement of computing resources and other resources used to perform tasks within the cloud computing environment. Metering and pricing provides cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for the consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. The user portal provides consumers and system administrators with access to the cloud computing environment. Service level management provides the allocation and management of cloud computing resources to ensure that required service levels are met. Service level agreement (SLA) planning and fulfillment provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.

[0060] Workload layer 306 provides examples of functionalities that can be utilized in a cloud computing environment. Examples of workloads and functionalities that can be provided from this layer include: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics and processing; transaction processing; enterprise-specific functionalities in a private cloud; and, in accordance with this disclosure, techniques for privacy-preserving homomorphic inference on homomorphically encrypted data 308.

[0061] It should be understood in advance that while this disclosure includes a detailed description of cloud computing, the implementation of the teachings cited herein is not limited to cloud computing environments. Rather, embodiments of the disclosed techniques can be implemented in conjunction with any other type of computing environment now known or developed hereafter. These include stand-alone computing environments (e.g., on-site desktop machines), client-server based architectures, and so on.

[0062] Therefore, a representative cloud computing environment has a set of high-level functional components, including a front-end identity manager, a Business Support Service (BSS) functional component, an Operations Support Service (OSS) functional component, and a compute cloud component. The identity manager is responsible for interacting with requesting clients to provide identity management, and this component can be implemented using one or more known systems, such as the Tivoli Federated Identity Manager (TFIM) available from IBM in Armonk, New York. Where appropriate, TFIM can be used to provide federated single sign-on (F-SSO) to other cloud components. The Business Support Service component provides certain management functions, such as billing support. The Operations Support Service component is used to provide provisioning and management of other cloud components, such as virtual machine (VM) instances. The compute cloud component represents the primary compute resources, typically multiple VM instances used to execute a target application that is made accessible via the cloud. One or more databases are used to store directories, logs, and other working data. All of these components (including the front-end identity manager) reside “within” the cloud, but this is not mandatory. In alternative embodiments, the identity manager can operate outside the cloud. Service providers can also operate outside the cloud.

[0063] Some clouds are based on non-traditional IP networks. Thus, for example, a cloud can be based on a two-tier CLOS-based network with special single-tier IP routing that uses hashes of MAC addresses. The techniques described herein can be used in such non-traditional clouds.

[0064] In general, cloud computing infrastructure provides a virtual machine hosting environment that includes hosts (e.g., servers or similar physical computing devices) connected via a network and one or more management servers. Typically, each physical server is adapted to dynamically provision one or more virtual machines using virtualization technologies such as VMware ESX / ESXi. Multiple VMs can be placed on a single host and share the host's CPU, memory, and other resources, thereby increasing the utilization of an organization's data center. Among other tasks, the management server monitors the infrastructure and automatically manipulates VM placement as needed, such as by moving VMs between hosts.

[0065] In non-limiting implementations, representative platform technologies include, but are not limited to, IBM Systems servers with VMware vSphere 4.1 Update 1 and 5.0.

[0066] The aforementioned commercial implementation is not intended to be limiting, but is merely a representative example of a client application supported in a cloud computing environment that interacts with cognitive services.

[0067] Homomorphic encryption

[0068] Homomorphic encryption (HE) is a form of encryption that allows computation to be performed on ciphertext to produce an encrypted result that, upon decryption, matches the result of the same operations performed on the plaintext. A homomorphic encryption scheme is thus a cryptosystem that allows computation to be performed on data without decrypting it. Homomorphic encryption supports building programs for any desired functionality that can run on encrypted input to produce an encrypted result. Because such a program never needs to decrypt its input, it can be run by an untrusted party without revealing its input and internal state. Homomorphic encryption can be partially homomorphic, somewhat homomorphic, or fully homomorphic. Partially homomorphic encryption (PHE) schemes are homomorphic with respect to only one type of operation (e.g., addition or multiplication). Somewhat homomorphic encryption (SWHE) schemes support more than one type of operation (e.g., addition and multiplication), but only for a limited number of operations. Fully homomorphic encryption (FHE) supports an unlimited number of homomorphic operations on ciphertext and is more powerful than PHE and SWHE. Toolkits for implementing homomorphic encryption are known. A well-known toolkit is HElib, an open-source project that implements neural network training based on stochastic gradient descent (SGD). The current version of HElib supports addition and multiplication of arbitrary numbers in binary representation, using encryption of the individual bits.

[0069] Representative HE protocol implementations may be based on one or more cryptographic protocols, including but not limited to unpadded RSA, ElGamal, Benaloh, Paillier, etc. As will be described, the techniques disclosed herein do not require any specific HE implementation.
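The homomorphic property described above can be illustrated with a minimal sketch based on unpadded (textbook) RSA, one of the schemes listed. The tiny primes, the exponent, and the function names are illustrative assumptions, not part of the disclosure; textbook RSA with such parameters is insecure and is shown only because its multiplicative homomorphism is easy to verify by hand.

```python
# Toy illustration only: textbook (unpadded) RSA with tiny primes is insecure.
p, q = 61, 53                  # assumed toy primes
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Enc(m) = m^e mod n."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Dec(c) = c^d mod n."""
    return pow(c, d, n)


a, b = 7, 3
# The product is computed on ciphertexts only; no plaintext is visible here.
c_prod = (encrypt(a) * encrypt(b)) % n
# Decrypting the ciphertext product yields the plaintext product (while a * b < n).
recovered = decrypt(c_prod)
```

Here `recovered` equals `a * b`, which is the partially homomorphic (multiplication-only) case; additions on ciphertexts are not supported by this scheme.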

[0070] Machine Learning as a Service with Homomorphic Encryption

[0071] Referring now to Figure 4, the basic operating environment used for the techniques described herein is described. As shown, in a typical ML-as-a-Service scenario, a trained model 400 is hosted on a cloud server 402 within the cloud computing infrastructure 404 described above. The trained model 400 can be exposed as an Application Programming Interface (API) on the cloud 404. In operation, as a service, the hosting cloud server 402 allows users to run inference queries on the model 400. Typically, the user is associated with a client machine 406, and the client and server are configured to operate according to the previously described client-server model. Homomorphic encryption (HE) is implemented across the client-server operating environment, enabling the cloud to protect the privacy of the model while maintaining the privacy of the user's (client's) data points that are scored by the model. In a typical request-response workflow, the client 406 sends an encrypted query 408 (e.g., a data point) to the cloud server 402, the cloud server 402 applies the model, and then returns a response 410. This response includes the encrypted inference result. In this way, privacy-preserving inference problems can be securely evaluated.
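The request-response workflow can be sketched with a toy Paillier cryptosystem (an additively homomorphic scheme named earlier in this description). This is a simplification under stated assumptions: the primes are tiny and insecure, the "model" is a hypothetical integer linear scorer, and its weights are held by the server in plaintext, whereas the disclosure also encrypts the model. All function names are illustrative.

```python
import math
import random
from typing import Sequence

# Toy Paillier keypair (tiny primes, insecure; illustration only).
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1), private
MU = pow(LAM, -1, N)            # valid because the generator is N + 1


def client_encrypt(m: int) -> int:
    """Client side: Enc(m) = (1 + N)^m * r^N mod N^2 with random r."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2


def client_decrypt(c: int) -> int:
    """Client side: recover the plaintext from a ciphertext."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def server_score(enc_x: Sequence[int], weights: Sequence[int], bias: int) -> int:
    """Server side: compute Enc(w . x + b) without decrypting the query."""
    acc = pow(1 + N, bias, N2)              # trivial encryption of the bias
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, N2)) % N2    # Enc(x)^w = Enc(w * x); product adds
    return acc


query = [3, 5, 2]                                        # client's data point
enc_query = [client_encrypt(v) for v in query]           # encrypted query 408
enc_response = server_score(enc_query, [4, 1, 7], 10)    # encrypted response 410
result = client_decrypt(enc_response)                    # 4*3 + 1*5 + 7*2 + 10
```

The server only ever handles ciphertexts of the query and of the result. An additive-only scheme cannot multiply two ciphertexts, so evaluating an encrypted neural network as described herein would need a scheme that supports both additions and multiplications.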

[0072] While the above approach effectively protects the privacy interests of both the requesting user and the cloud provider hosting the model, this type of inference is computationally inefficient for decision tree-based models.

[0073] Decision tree-based inference on homomorphically encrypted data without bootstrapping

[0074] With the above as background, the technique of this disclosure will now be described. As mentioned above, when the model in question is a decision tree (or an ensemble of decision trees), HE-based homomorphic inference is inefficient. To address this inefficiency, and instead of performing homomorphic inference on the DT model itself, the approach herein replaces the DT model with a specially trained neural network (NN) and then performs homomorphic inference on the neural network. In this way, the neural network is a representative or proxy of the DT model, and the approach avoids the computational inefficiency (i.e., unreliable branch computation) commonly encountered when performing inference on DT models.

[0075] To this end, and according to this disclosure, the neural network is trained to learn the decision boundaries of the decision tree (or of each tree in the ensemble of trees being modeled). Preferably, this training is done in plaintext by the DT model owner, and without using the original training data points used for the DT model itself. In this training phase, a random dataset (but one whose distribution characteristics match those of the original training data for the tree) is applied to the DT, and the expected outputs (from the applied tree) are obtained. The neural network is then trained using this random dataset and the expected outputs such that the output of the neural network matches the output expected when the original dataset is applied to the DT. Preferably, the neural network has a low depth (e.g., less than about three (3) layers), although a specific number of layers is not required. A neural network with low depth is sometimes referred to herein as “shallow”. Once such a network has been trained, inference is performed on the shallow neural network rather than directly against the DT. Thus, in the context of Figure 4, the “trained model” 400 is actually a shallow neural network standing in for the decision tree (or the ensemble of decision trees). HE-based inference on the NN is computationally more efficient and can be performed without bootstrapping.

[0076] Figure 5 depicts the basic technique disclosed herein for constructing and training a shallow neural network that is used as a representative or proxy for the decision tree (or ensemble of trees) of interest. As depicted, a trained decision tree 500 represents the model of interest. It is trained on a dataset, sometimes referred to herein as the original dataset. More formally, the model is a decision tree regression model DT trained on dataset D. Also shown is the output, a “shallow” neural network 502 that preferably learns the decision boundaries of DT in the following manner. The first step is to compute a random training dataset D′. The random training set has a domain similar to that of D; for example, the minimum and maximum feature values are similar for D and D′. More generally, the decision tree 500 is annotated (as part of pre-training) with relevant training data statistics of the original dataset D. The training data statistics can vary, but typically include the minimum and maximum values, mean, variance, etc., for each feature. The specific nature and type of the training data statistics can vary, and it is assumed that a data generator (described below) can randomly generate dataset D′ using the annotated training data statistics. The second step is to apply the dataset D′ to the decision tree. Preferably, this decision tree inference on D′ is performed in plaintext (i.e., in the clear), and the corresponding regression outputs are produced. Subsequently, in the third step, a shallow neural network N is trained on dataset D′ with the corresponding regression outputs as the target. After training, N is used to perform inference to answer encrypted inference queries for a test point x on DT, i.e., Enc(DT)(Enc(x)) ≈ Enc(N)(Enc(x)). The encrypted inference result is then returned to the requesting client to complete the evaluation.
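The three plaintext steps can be sketched as follows, using only the Python standard library. Everything concrete here is an assumption made for illustration: the hand-written two-feature tree, the (min, max) statistics, uniform sampling, the single hidden layer of eight tanh units, and the plain SGD loop. The disclosure does not prescribe an activation function or optimizer, and a network intended for HE evaluation would typically need an activation built from additions and multiplications.

```python
import math
import random

random.seed(0)

# Hypothetical pre-trained regression tree: (feature, threshold, left, right) or a leaf.
TREE = (0, 0.5, 1.0, (1, 0.3, 3.0, 5.0))
# Hypothetical per-feature (min, max) statistics annotated on the tree.
STATS = [(0.0, 1.0), (0.0, 1.0)]


def tree_predict(node, x):
    """Plaintext decision tree inference: follow comparisons down to a leaf."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node


# Step 1: random dataset D' drawn within the annotated feature ranges.
D_prime = [[random.uniform(lo, hi) for lo, hi in STATS] for _ in range(200)]
# Step 2: label D' by plaintext inference on the tree.
Y_hat = [tree_predict(TREE, x) for x in D_prime]

# Step 3: train a one-hidden-layer network N on (D', Y_hat) with plain SGD.
H = 8
W1 = [[random.gauss(0, 1) for _ in STATS] for _ in range(H)]
B1 = [0.0] * H
W2 = [random.gauss(0, 0.1) for _ in range(H)]
B2 = 0.0


def forward(x):
    """Return (hidden activations, scalar output) of the surrogate network."""
    hidden = [math.tanh(sum(w * v for w, v in zip(W1[j], x)) + B1[j]) for j in range(H)]
    return hidden, sum(W2[j] * hidden[j] for j in range(H)) + B2


LR = 0.01
for _ in range(50):                      # epochs
    for x, y in zip(D_prime, Y_hat):
        hidden, out = forward(x)
        err = out - y                    # gradient of 0.5 * (out - y)^2 w.r.t. out
        for j in range(H):
            grad_h = err * W2[j] * (1.0 - hidden[j] ** 2)   # uses W2 before its update
            W2[j] -= LR * err * hidden[j]
            B1[j] -= LR * grad_h
            for i in range(len(x)):
                W1[j][i] -= LR * grad_h * x[i]
        B2 -= LR * err

surrogate_output = forward([0.8, 0.9])[1]   # compare with tree_predict(TREE, [0.8, 0.9])
```

Note that the original training data never appears: the network sees only the synthetic points and the tree's answers on them.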

[0077] Therefore, according to the described technique, a shallow neural network (NN) is trained to learn the decision boundaries of the tree. Preferably, this training is performed in plaintext by the model owner and without using the original training data points. In this way, the homomorphic evaluation of the tree is subsequently approximated by performing homomorphic inference on the neural network. This evaluation is efficient because the network is preferably shallow (e.g., an input layer, an output layer, and two (2) hidden layers), and it can be performed without bootstrapping. In this way, the technique avoids the unreliable branch computation that would otherwise be required if HE inference had to handle the nonlinear comparisons in the decision tree itself. As a byproduct, and because no scaling is required, the scaling problem associated with performing inference on the decision tree itself is also solved.
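One way to see why a shallow network can avoid bootstrapping is to count its multiplicative depth, i.e., the longest chain of sequential multiplications, which is the quantity that leveled HE parameters are typically sized for. The sketch below is a plaintext stand-in, not an HE implementation: the `Tracked` class, the weights, and the square activation (x -> x * x) are assumptions for illustration, and the disclosure does not specify an activation.

```python
from dataclasses import dataclass


@dataclass
class Tracked:
    """Plaintext stand-in for a ciphertext that records multiplicative depth."""
    value: float
    depth: int = 0

    def __add__(self, other: "Tracked") -> "Tracked":
        # Additions do not increase multiplicative depth.
        return Tracked(self.value + other.value, max(self.depth, other.depth))

    def __mul__(self, other: "Tracked") -> "Tracked":
        # Each multiplication adds one level on top of the deeper operand.
        return Tracked(self.value * other.value, max(self.depth, other.depth) + 1)


def enc(values):
    """Wrap plaintext numbers as fresh (depth 0) tracked values."""
    return [Tracked(float(v)) for v in values]


def dense(inputs, weights, biases, square):
    """One layer: weighted sums, optionally followed by the activation x -> x * x."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias
        for w, x in zip(row, inputs):
            acc = acc + w * x
        outputs.append(acc * acc if square else acc)
    return outputs


x = enc([1.0, 2.0])
# Two hidden layers with square activation, then a linear output layer.
h1 = dense(x, [enc([0.5, -1.0]), enc([1.0, 1.0])], enc([0.0, 0.5]), square=True)
h2 = dense(h1, [enc([1.0, 0.5]), enc([-1.0, 2.0])], enc([0.0, 0.0]), square=True)
out = dense(h2, [enc([1.0, 1.0])], enc([0.0]), square=False)[0]
```

With both the weights and the input treated as encrypted, each hidden layer costs two levels (one weight multiplication, one squaring) and the output layer one more, so `out.depth` is 5 regardless of layer width. A depth that is fixed and small in advance is what lets parameters be chosen so that no bootstrapping step is needed.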

[0078] Figure 6 depicts a block diagram of a representative computing system implementing the above-described functions. In a typical implementation, these components are implemented within a cloud computing infrastructure, for example, as computer software executing on one or more processors (whether physical or virtual). As shown, system 600 includes a data generator 602, a non-private decision tree evaluator 604, a network designer 606, a network trainer 608, a network encryptor 610, and a private evaluator 612. One or more of these components can be combined with each other, and the above nomenclature is not intended to be limiting. The data generator 602 has the primary function of randomly generating dataset D′ using the annotated training data statistics of the original dataset used to pre-train the decision tree. The non-private decision tree evaluator 604 computes the predicted outputs of the decision tree (or of each decision tree in an ensemble of such trees) on the random dataset. The network designer 606 constructs the neural network N for the decision tree (or an NN for each decision tree in an ensemble model). The network trainer 608 trains the neural network N (or each such NN in an ensemble model) using the randomly generated dataset and the corresponding predicted outputs ^Y (from the decision tree). As described above, network trainer 608 trains a shallow neural network to learn the decision boundaries of the decision tree (or of each decision tree in an ensemble of decision trees). Network encryptor 610 performs homomorphic encryption on each N. In this process, network encryptor 610 encrypts the shallow network using the client's public key. Finally, private evaluator 612 performs homomorphic inference with network N on one or more user-provided HE-encrypted data points and returns the encrypted result (encrypted prediction) to the user.

[0079] Without intending to be limiting, the shallow neural network is preferably trained in plaintext on randomly generated data to learn the decision boundaries of the decision tree of interest. HE inference over a network N having two hidden layers is a representative but non-limiting embodiment, as such inference is performed efficiently using HElib, again without requiring bootstrapping. Experimental analysis shows that, on sample regression datasets, the HE inference error is within 2-3% of the non-private counterpart, and the amortized runtime is in the range of 50-300 milliseconds per point, depending on the number of decision trees in the ensemble and the complexity of the NN used.

[0080] Figure 7 depicts a representative embodiment in which the decision tree-based model is an ensemble of decision trees. Regressors based on ensembles of decision trees are of various known types, including but not limited to adaptive boosting regression, random forest regression, and gradient boosting regression. In this embodiment, there is an ensemble of trees 700, where each tree has an associated shallow neural network 702, as previously described. More formally, given a trained ensemble of trees $E_{dt} = (DT_1, DT_2, \ldots, DT_k)$, the neural network ensemble $E_{nn} = (N_1, N_2, \ldots, N_k)$ is obtained by applying the single-tree approach to each tree, and inference is then provided as an aggregation of the single-tree inferences. For example, for gradient boosting: $\mathrm{Enc}(E_{dt})(\mathrm{Enc}(x)) \approx \mathrm{Enc}(E_{nn})(\mathrm{Enc}(x)) = \sum_{i \in [k]} \mathrm{Enc}(N_i)(\mathrm{Enc}(x))$. The equation above is not universal, because different ensemble methods aggregate differently, e.g., adaptive boosting (median), random forest (mean), etc.
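The per-method aggregation can be sketched in plaintext as below. The mapping follows the text (sum for gradient boosting, mean for random forest, median for adaptive boosting); the dictionary keys, the function name, and the three lambda "networks" are hypothetical stand-ins for trained surrogate networks, and any learning-rate or weighting terms a real ensemble might carry are omitted.

```python
from statistics import mean, median
from typing import Callable, Dict, Sequence

# Aggregation rule per ensemble type, as listed in the description.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "gradient_boosting": sum,
    "random_forest": mean,
    "adaptive_boosting": median,
}


def ensemble_predict(networks, x, method):
    """Evaluate every per-tree surrogate network on x and aggregate the results."""
    per_tree = [net(x) for net in networks]
    return AGGREGATORS[method](per_tree)


# Hypothetical stand-ins for trained surrogate networks N_1, N_2, N_3.
networks = [lambda x: 0.5 * x, lambda x: x + 1.0, lambda x: 2.0 * x]

gb = ensemble_predict(networks, 2.0, "gradient_boosting")   # 1.0 + 3.0 + 4.0
rf = ensemble_predict(networks, 2.0, "random_forest")       # (1.0 + 3.0 + 4.0) / 3
ab = ensemble_predict(networks, 2.0, "adaptive_boosting")   # middle of 1.0, 3.0, 4.0
```

The sum and the mean need only additions and a multiplication by a public constant, which map directly onto homomorphic operations; the median involves comparisons, so its encrypted evaluation would need a different treatment.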

[0081] The technology disclosed herein offers significant advantages. As already described, the approach herein provides privacy-preserving inference on pre-trained decision trees (specifically, regression models based on ensembles of decision trees) in a computationally efficient manner. The approach leverages the concept of training a shallow neural network that learns the decision boundaries of the tree, and then approximates the homomorphic evaluation of the tree by performing homomorphic inference on the neural network. Because HE inference on shallow neural networks is efficient, the approach does not require bootstrapping.

[0082] While the above method preferably assumes the availability of complete training data, this is not required, as the technique can also be implemented in use cases where the decision tree model is available but the training data is unavailable or only partially available. The decision tree can be pre-existing and available, or it can be accessed from other sources. As mentioned above, the training data used to train the original decision tree may be unavailable (in whole or in part). The task of creating a dataset for training the neural network should be adapted to these different scenarios. Purely random data is unproductive and inefficient, because most outcome labels are likely to be negative (and therefore not useful). To limit this, and as described, the synthetic dataset used for NN training should mimic the distribution of the original data. Typically, a first-order approximation of the distribution is based on the mean and variance of the original data. As previously mentioned, other statistics (e.g., minimum, maximum) are another minimal descriptor of the original training data that can guide the generation of the synthetic data, at least within the range of feature values expected by the decision tree. Without limiting the foregoing, any other statistical technique that generates meaningful labels in the process can also be used.
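A data generator driven by the annotated statistics might look like the following minimal sketch. The choices here are assumptions rather than the disclosed generator: each feature is sampled independently from a Gaussian with the annotated mean and variance (the first-order approximation mentioned above) and then clipped to the annotated minimum and maximum; the class, field, and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FeatureStats:
    """Per-feature statistics annotated on the decision tree (assumed layout)."""
    minimum: float
    maximum: float
    mean: float
    variance: float


def generate_synthetic(stats: Sequence[FeatureStats], n_points: int,
                       seed: int = 0) -> List[List[float]]:
    """Sample n_points rows: Gaussian per feature, clipped to [minimum, maximum]."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_points):
        row = []
        for s in stats:
            value = rng.gauss(s.mean, s.variance ** 0.5)
            row.append(min(max(value, s.minimum), s.maximum))
        rows.append(row)
    return rows


# Hypothetical statistics for two features, e.g. age and income.
stats = [
    FeatureStats(minimum=18.0, maximum=90.0, mean=40.0, variance=144.0),
    FeatureStats(minimum=0.0, maximum=250000.0, mean=60000.0, variance=4.0e8),
]
D_prime = generate_synthetic(stats, n_points=1000)
```

Sampling features independently ignores any correlation present in the original data; where richer statistics are available, a generator that uses them should yield synthetic points that exercise more of the tree's leaves.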

[0083] As described, computing systems implementing this approach are typically implemented in software, for example, as a set of computer program instructions executed by one or more hardware processors. Specific tools or components within the system can include any number of programs, processes, execution threads, etc., along with appropriate interfaces and databases to support the data used or created by the tools or components. The tools or components can be configured or managed using a web-based front end, via command line, etc. The tools or components can include one or more functionalities implemented programmatically or interoperable with other computing entities or software systems through application programming interfaces (APIs) or any convenient request-response protocol.

[0084] Any references to one or more commercial products or services in this document are exemplary and should not be construed as limiting the disclosed technology, which can be implemented on any system, device, appliance (or, more generally, machine) having the general characteristics and operating functionality described.

[0085] As mentioned above, a preferred implementation of the subject matter is as a service, but this is not a limitation. HE-based inference can be performed entirely on-premises or in a standalone operating environment. As previously noted, and without limitation, the subject matter can be implemented within or in association with a cloud deployment platform system or appliance, or using any other type of deployment system, product, appliance, program, or process. As already described, model building or inference system functionality can be provided as a standalone function, or it can leverage functionality from other products and services.

[0086] Representative cloud application platforms that can implement this technology include, but are not limited to, any cloud-supported application framework, product, or service.

[0087] In general, the technologies described herein can be implemented as management solutions, services, products, appliances, devices, processes, programs, execution threads, etc. Typically, these technologies are implemented in software as one or more computer programs executing in hardware processing elements, in conjunction with data stored in one or more data sources (such as a problem database). Some or all of the described processing steps can be automated and operate autonomously in conjunction with other systems. Automation can be complete or partial, and operation (complete or partial) can be synchronous or asynchronous, demand-based, or otherwise.

[0088] These components are typically implemented individually as software, that is, as a set of computer program instructions that execute in one or more hardware processors. Components are shown as distinct, but this is not necessary, as components can also be integrated with each other, either wholly or partially. One or more of the components may execute in a dedicated location or remotely from each other. One or more of the components may have sub-components that execute together to provide functionality. Since the functionality (or any aspect thereof) described here can be implemented elsewhere or in a system, it is not required that the specific functionality served by the generator be performed by the specific components described above.

[0089] Tools and response capabilities can interact or interoperate with security analytics systems or services.

[0090] As already described, the functionality described above can be implemented as a standalone method, such as one or more software-based functions executed by one or more hardware processors, or it can be available as a hosted service (including as a web service via a SOAP / XML interface). The specific hardware and software implementation details described herein are for illustrative purposes only and are not intended to limit the scope of the described subject matter.

[0091] More generally, computing devices in the context of the disclosed subject matter are each a data processing system (such as shown in Figure 2) comprising hardware and software, and these entities communicate with each other via a network, such as the Internet, an intranet, an extranet, a private network, or any other communication medium or link. Applications on the data processing system provide native support for the Web and other known services and protocols, including but not limited to support for HTTP, FTP, SMTP, SOAP, XML, WSDL, UDDI, and WSFL. Information on SOAP, WSDL, UDDI, and WSFL is available from the World Wide Web Consortium (W3C), which is responsible for developing and maintaining these standards; further information on HTTP, FTP, SMTP, and XML is available from the Internet Engineering Task Force (IETF).

[0092] As noted, and in addition to cloud-based environments, the techniques described herein can be implemented in or in combination with different server-side architectures, including simple n-tier architectures, web portals, federated systems, and so on.

[0093] More generally, the subject matter described herein may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment that includes both hardware and software elements. In a preferred embodiment, the sensitive data detection service (or any component thereof) is implemented in software, including but not limited to firmware, resident software, microcode, etc. Furthermore, the download and deletion interface and functionality may take the form of a computer program product accessible from a computer-usable or computer-readable medium that provides program code for use by or in conjunction with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium may be any means that can contain or store programs for or in conjunction with an instruction execution system, apparatus, or device. This medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device). Examples of computer-readable media include semiconductor or solid-state memory, magnetic tape, removable computer disks, random access memory (RAM), read-only memory (ROM), rigid disks, and optical discs. Current examples of optical discs include compact disc-read-only memory (CD-ROM), compact disc-read / write (CD-R / W), and DVDs. Computer-readable media are tangible, non-transitory substances.

[0094] A computer program product may be a product having program instructions (or program code) for implementing one or more of the described functions. Those instructions or code may be stored in a computer-readable storage medium within a data processing system after being downloaded from a remote data processing system via a network. Alternatively, those instructions or code may be stored in a computer-readable storage medium within a server data processing system and are adapted to be downloaded via a network to a remote data processing system for use in a computer-readable storage medium within the remote system.

[0095] In a representative embodiment, the technology is implemented in a dedicated computing platform, preferably in software executed by one or more processors. The software is maintained in one or more data stores or memories associated with the one or more processors, and the software can be implemented as one or more computer programs. In general, the dedicated hardware and software include the functions described above.

[0096] While a specific order of operations performed by certain embodiments of the invention has been described above, it should be understood that such an order is exemplary, as alternative embodiments may perform operations in a different order, combine certain operations, overlap certain operations, etc. References to a given embodiment in the specification indicate that the described embodiment may include a particular feature, structure, or characteristic, but each embodiment may not necessarily include that particular feature, structure, or characteristic.

[0097] Finally, although the given components of the system have been described individually, those skilled in the art will understand that some functions can be combined or shared in a given instruction, program sequence, code section, etc.

[0098] Furthermore, FHE is merely a representative encryption protocol and is not intended to be restrictive.

[0099] Furthermore, while inference preferably occurs using the HE protocol, the approach described herein, which uses a substitute neural network instead of the actual decision tree itself (for model evaluation), can be used in conjunction with other secure multi-party computation techniques in which the privacy of the test points, the model, or both is to be preserved.

[0100] The technologies described herein provide improvements to another technology or technology field (i.e., HE-based recommendation tools and systems, and cloud-based systems that incorporate or expose such technologies), as well as improvements to the computational efficiency of HE systems and methods.

[0101] The specific use cases or applications in which decision trees are used do not limit this disclosure.

Claims

1. A computer-implemented method for performing inference queries against a decision tree-based model, comprising:
applying a random data set to a decision tree of the decision tree-based model, the random data set matching the distribution characteristics of the original data set used to train the decision tree, and obtaining a corresponding regression output by applying the random data set;
training a neural network model using the random data set and the regression output to learn one or more decision boundaries of the decision tree, wherein, after the training, the output from the trained neural network model substantially matches the output expected to be obtained by applying the random data set to the decision tree;
replacing the decision tree-based model with the trained neural network model; and
performing homomorphic inference on the trained neural network model in place of the decision tree to provide decision tree-based inference, wherein the homomorphic inference includes:
encrypting the neural network model using the client's public key;
hosting the encrypted neural network model in a cloud computing environment;
receiving an encrypted inference query from the client;
applying the encrypted neural network model to the encrypted inference query without decrypting it; and
returning the encrypted result to the client.

2. The method according to claim 1, wherein the distribution characteristics include training statistics.

3. The method according to claim 2, wherein the training statistics are one of the following: minimum feature value, maximum feature value, feature mean, and variance.

4. The method according to any one of claims 1-3, wherein the decision tree-based model includes a set of decision trees, the trained neural network model includes a set of trained neural network models, and providing the inference includes an aggregation of the inferences provided for each decision tree of the set.

5. The method according to claim 4, wherein the set is one of the following: adaptive boosting regression, random forest regression, and gradient boosting regression.

6. An apparatus for performing inference queries against a decision tree-based model, comprising:
a processor; and
a computer memory storing computer program instructions executed by the processor, the computer program instructions being configured to provide privacy-preserving homomorphic inference for a decision tree-based model, the computer program instructions being configured to:
apply a random data set to a decision tree of the decision tree-based model, the random data set matching the distribution characteristics of the original data set used to train the decision tree, and obtain a corresponding regression output by applying the random data set;
train a neural network model using the random data set and the regression output to learn one or more decision boundaries of the decision tree, wherein, after the training, the output from the trained neural network model substantially matches the output expected to be obtained by applying the random data set to the decision tree;
replace the decision tree-based model with the trained neural network model; and
perform homomorphic inference on the trained neural network model in place of the decision tree to provide decision tree-based inference, wherein the computer program instructions configured to perform homomorphic inference are further configured to:
encrypt the neural network model using the client's public key;
host the encrypted neural network model in a cloud computing environment;
receive an encrypted inference query from the client;
apply the encrypted neural network model to the encrypted inference query without decrypting it; and
return the encrypted result to the client.

7. The apparatus according to claim 6, wherein the distribution characteristics include training statistics.

8. The apparatus according to claim 7, wherein the training statistics are one of the following: minimum feature value, maximum feature value, feature mean, and variance.

9. The apparatus according to any one of claims 6 to 8, wherein the decision tree-based model includes a set of decision trees, and the computer program instructions are further configured to: train a set of neural network models, and provide an aggregation of the inferences for each decision tree.

10. The apparatus according to claim 9, wherein the set is one of the following: adaptive boosting regression, random forest regression, and gradient boosting regression.

11. A computer program product for performing inference queries against a decision tree-based model, the computer program product comprising instructions that, when executed by a computer, cause the computer to perform the steps of the method according to any one of claims 1 to 5.

12. A computer-readable storage medium having a computer program product according to claim 11 stored thereon.