A system and method for multiparty data clean room with differentially private insight sharing

The multiparty data clean room system addresses privacy compliance challenges by using differential privacy techniques and a privacy budget framework for secure, collaborative data sharing, ensuring compliance and privacy protection.

WO2026132992A1 · PCT designated stage · Publication Date: 2026-06-25 · PRIVASAPIEN TECH PTE LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
PRIVASAPIEN TECH PTE LTD
Filing Date
2025-12-10
Publication Date
2026-06-25


Abstract

A system (120) and method (400) for multiparty data clean room with differentially private insight sharing is disclosed. The system (120) comprises a data source connection module (315) to provide access to a plurality of data fields, a query receiving module (320) to capture textual inputs from a user (115), and a query generation module (325) to convert them into structured queries. A privacy budget calculation module (330) computes privacy expenditure based on aggregated analytical values (370) of data fields, while a pending budget tracking module (335) monitors consumption. A response module (340) provides perturbed data (375) outputs aligned with differential privacy principles, supported by a differential privacy parameter configuration module (355) for dynamic noise adjustment. A query-based risk assessment module (345) evaluates cumulative risk using privacy threat modelling integrated with budget calculations. A dashboard module (360) displays system activities, privacy budget (365), query history, and risk metrics.

Description

[0001] A SYSTEM AND METHOD FOR MULTIPARTY DATA CLEAN ROOM WITH DIFFERENTIALLY PRIVATE INSIGHT SHARING

[0002] EARLIEST PRIORITY DATE:

[0003] This Application claims priority from a provisional patent application filed in India having Patent Application No. 202441101478, filed on December 20, 2024, and titled “SYSTEM AND METHOD FOR MULTI PARTY DATA CLEAN ROOM WITH DIFFERENTIALLY PRIVATE INSIGHT SHARING”.

[0004] FIELD OF INVENTION

[0005] The present invention relates to the field of data sharing, particularly privacy-aware data sharing. More particularly, the present invention relates to a system and method for multiparty data clean room with differentially private insight sharing.

[0006] BACKGROUND

[0007] In today’s data-driven landscape, effective data sharing and collaboration among organizations are essential for driving innovation, improving services, and enhancing decision-making. Industries such as finance and healthcare rely heavily on the exchange of data to provide better products, streamline operations, and improve patient outcomes. For instance, in the finance sector, institutions need to share customer data to enhance risk assessments, improve fraud detection, and tailor financial products to individual needs. Similarly, in healthcare, collaboration among hospitals, clinics, and research institutions is vital for sharing patient records, conducting clinical trials, and advancing medical research.

[0008] However, the growing emphasis on data privacy presents significant challenges to such collaborative efforts. Privacy regulations, such as the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the Digital Personal Data Protection Act, impose stringent requirements on how sensitive data can be collected, stored, and shared. These regulations are designed to protect individuals’ personal information and prevent unauthorized access; however, they can create barriers to efficient data exchange. As organizations navigate these privacy requirements, the risk of non-compliance looms large, often resulting in reduced willingness to share valuable data.

[0009] Hence, there is a need for an improved system and method for multiparty data clean room with differentially private insight sharing to address the aforementioned issue(s).

[0010] OBJECTIVES OF THE INVENTION

[0011] The primary objective of the invention is to develop a multiparty data clean room system that enables secure and collaborative data analysis across multiple entities while preserving privacy through differential privacy techniques.

[0012] Another objective of the invention is to implement a privacy budget management framework that calculates, monitors, and enforces privacy expenditure for each structured query, thereby preventing unauthorized extraction of sensitive information.

[0013] Another objective of the invention is to provide dynamic privacy risk quantification and mitigation by associating analytical values to data fields, assessing query-based risks, and integrating privacy threat modelling for robust protection against re-identification.

[0014] Yet another objective of the invention is to offer transparency and control to users and administrators through configurable privacy parameters, noise adjustment mechanisms, and an interactive dashboard displaying system activities, privacy budgets, query history, and risk metrics.

[0015] SUMMARY

[0016] In accordance with an embodiment of the present disclosure, a system for multiparty data clean room with differentially private insight sharing is disclosed. The system includes a processor and a memory coupled to the processor, wherein the memory comprises instructions that when executed by the processor cause the processor to receive one or more textual inputs as a query to a session from a user via a user interface. The processor also executes instructions to convert the one or more textual inputs into a corresponding structured query. The processor also executes instructions to calculate a privacy budget associated with the structured query, wherein the privacy budget is calculated as an aggregated analytical value obtained based on a plurality of data fields of one or more data sources pertaining to the structured query. The processor also executes instructions to execute the one or more structured queries. Further the processor also executes instructions to continuously monitor the privacy budget to estimate consumption of the structured query. Furthermore, the processor also executes instructions to provide one or more responses with a corresponding perturbed data based on the privacy budget thereby preventing extraction of private data. Moreover, the processor also executes instructions to aggregate the one or more responses, quantify their risk based on the plurality of data fields of the one or more data sources using a privacy threat modelling output integrated into a differential private budget calculation per structured query for effective risk quantification and mitigation.

[0017] In accordance with an embodiment of the present disclosure, a method for multiparty data clean room with differentially private insight sharing is disclosed. The method includes receiving one or more textual inputs as a query to a session from a user via a user interface. The method also includes converting the one or more textual inputs into a corresponding structured query. The method also includes calculating a privacy budget associated with the structured query, wherein the privacy budget is calculated as an aggregated analytical value obtained based on a plurality of data fields of one or more data sources pertaining to the structured query. The method also includes executing the one or more structured queries. The method further includes continuously monitoring the privacy budget to estimate consumption of the structured query. Furthermore, the method also includes providing one or more responses with a corresponding perturbed data based on the privacy budget thereby preventing extraction of private data. Moreover, the method also includes aggregating the one or more responses, quantifying their risk based on the plurality of data fields of the one or more data sources using a privacy threat modelling output integrated into a differential private budget calculation per structured query for effective risk quantification and mitigation.

[0018] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.

[0019] BRIEF DESCRIPTION OF THE DRAWINGS

[0020] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:

[0021] FIG. 1 illustrates a network environment of a system for multiparty data clean room with differentially private insight sharing, in accordance with an embodiment of the present disclosure;

[0022] FIG. 2 illustrates a schematic diagram of a user device of FIG. 1, in accordance with an example implementation of the present subject matter;

[0023] FIG. 3 illustrates a schematic diagram of a system for multiparty data clean room with differentially private insight sharing of FIG. 1, in accordance with an embodiment of the present disclosure;

[0024] FIG. 4 is a flow chart representing the steps involved in a method for multiparty data clean room with differentially private insight sharing, in accordance with an embodiment of the present disclosure.

[0025] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.

[0026] DETAILED DESCRIPTION

[0027] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.

[0028] The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or subsystems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures, or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.

[0029] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting. In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.

[0030] FIG. 1 illustrates a network environment for implementing example techniques of a system for multiparty data clean room with differentially private insight sharing, in accordance with an embodiment of the present disclosure.

[0031] Referring to FIG. 1, a user device (105) corresponding to a user (115) may be communicatively coupled to a system (120). Further, the user (115) may access the system (120) over a network (110). Examples of the user device (105) include, but are not limited to, a mobile phone, desktop computer, portable digital assistant (PDA), smart phone, tablet, ultra-book, netbook, laptop, multi-processor system, microprocessor-based or programmable consumer electronic system, or any other communication device that a user may use. It will be appreciated that the system (120) may be presented to the user on a corresponding user device (105) as a web application accessed through a browser, through a software application on the user device (105), or, particularly for smartphones, through a mobile application installed on the smartphone. It will be appreciated that, within the context of the disclosure herein, a web application refers to a utility implemented on a networked computing system accessible by the user device (105) over the Internet (e.g. through browsers), wherein the bulk of the processing takes place at the networked computing system; mobile applications refer to applications installed on smartphones that may communicate with a networked computing system; and a “software” application refers generally to applications other than web browsers installed on other types of user device (105) that may communicate with a networked computing system over the network (110).

[0032] The network (110) may be a single communication network or a combination of multiple communication networks and may use a variety of different communication protocols. The network (110) may be a wireless network, a wired network, or a combination thereof. Examples of such individual networks include, but are not limited to, Global System for Mobile Communication (GSM) network, Universal Mobile Telecommunications System (UMTS) network, Personal Communications Service (PCS) network, Time Division Multiple Access (TDMA) network, Code Division Multiple Access (CDMA) network, Next Generation Network (NGN), and Public Switched Telephone Network (PSTN). Depending on the technology, the network (110) may include various network entities, such as gateways and routers; however, such details have been omitted for the sake of brevity of the present description.

[0033] The system (120) may have a homepage that is presented to the user (115) accessing a top-level web address for web applications presented to the user (115) in a browser or a welcome screen for software and mobile applications. The homepage may include links to a user log-in interface or general information about the system (120) and the option to register as user (115). It will be appreciated that the presentation of a homepage may not be necessary, for example, if a user (115) bypasses it by directly inputting a web address corresponding to a user log-in page, or if a separate mobile application is designed for users.

[0034] A new or unregistered user (115) can access the user log-in interface, fill out the log-in information corresponding to the user's account, and indicate that the user (115) wishes to sign in. It will be appreciated that any conventional registration and log-in techniques for web applications, software applications, and mobile applications may be used, whichever is appropriate for the user. While registering, the user (115) may be prompted to provide a username and corresponding user credentials including, but not limited to, a password, geographical location, and contact information. Upon receipt of the foregoing information, a corresponding user profile may be created and stored on a respective database (385) of the system (120).

[0035] In accordance with an embodiment of the present disclosure, a system (120) for multiparty data clean room with differentially private insight sharing is provided. The system (120) comprises a processor (305) and a machine-readable storage medium comprising instructions that, when executed by the processor (305), cause the processor (305) to receive one or more textual inputs as a query to a session from a user (115) via a user interface (390). The processor (305) also executes instructions to convert the one or more textual inputs into a corresponding structured query. The processor (305) also executes instructions to calculate a privacy budget (365) associated with the structured query, wherein the privacy budget (365) is calculated as an aggregated analytical value (370) obtained based on a plurality of data fields of one or more data sources pertaining to the structured query. The processor (305) also executes instructions to execute the one or more structured queries. Further the processor (305) also executes instructions to continuously monitor the privacy budget (365) to estimate consumption of the structured query. Furthermore, the processor (305) also executes instructions to provide one or more responses with a corresponding perturbed data (375) based on the privacy budget (365) thereby preventing extraction of private data. Moreover, the processor (305) also executes instructions to aggregate the one or more responses, quantify their risk based on the plurality of data fields of the one or more data sources using a privacy threat modelling output integrated into a differential private budget calculation per structured query for effective risk quantification and mitigation.

[0036] It may be noted that the foregoing system (120) is an exemplary system (120) and may be implemented as computer executable instructions in any computing or processing environment, including in digital electronic circuitry or in computer hardware, firmware, device driver, or software. As such, the system (120) is not limited to any specific hardware or software configuration.

[0037] FIG. 2 illustrates a schematic diagram of a user device (105), in accordance with an example implementation of the present subject matter. Referring to FIG. 2, the user device (105) may comprise a processor(s) (205), a memory(s) (210) coupled to and accessible by the processor(s) (205), and an interface (225) coupled to the memory(s) (210). The user device (105) disclosed herein may be the same as the user device (105) described in FIG. 1. The functions of various elements shown in the figures, including any functional blocks labelled as "processor(s)" (205), may be provided through the use of dedicated hardware as well as hardware capable of executing instructions. When provided by a processor (205), the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" (205) should not be construed to refer exclusively to hardware capable of executing instructions, and may implicitly comprise, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), and field programmable gate array (FPGA). Other hardware, standard and/or custom, may also be coupled to the processor(s) (205). The user device (105) may further include a display (215) in addition to other components such as, but not limited to, keyboard, sensors, logic circuits etc. Further, the user device (105) may include data (220) that may be stored, utilized, or generated during the operation of the user device (105).

[0038] The memory(s) (210) may be a computer-readable medium, examples of which comprise volatile memory (e.g., RAM), and / or non-volatile memory (e.g., Erasable Programmable read-only memory, i.e., EPROM, flash memory, etc.). The memory(s) (210) may be an external memory, or internal memory, such as a flash drive, a compact disk drive, an external hard disk drive, or the like. The user device (105) may further include an interface (225) that may allow the connection or coupling of the user device (105) with one or more other devices, through a wired (e.g., Local Area Network, i.e., LAN) connection or through a wireless connection (e.g., Bluetooth®, Wi-Fi), for example, for connecting to the system (120) shown in FIG. 1. The interface (225) may also enable intercommunication between different logical as well as hardware components of the user device (105).

[0039] FIG. 3 illustrates a schematic diagram of a system for multiparty data clean room with differentially private insight sharing of FIG. 1, in accordance with an embodiment of the present disclosure. Referring to FIG. 3, the system (120) includes a processor(s) (305), a memory(s) (310) coupled to and accessible by the processor(s) (305), a database (385), and a user interface (390) coupled to the memory(s) (310).

[0040] The system (120) disclosed herein is the same as the system (120) described in FIG. 1. The functions of various elements shown in the figures, including any functional blocks labelled as "processor(s)" (305), may be provided through the use of dedicated hardware as well as hardware capable of executing instructions. When provided by a processor (305), the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" (305) should not be construed to refer exclusively to hardware capable of executing instructions, and may implicitly comprise, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), and field programmable gate array (FPGA). Other hardware, standard and/or custom, may also be coupled to the processor(s) (305). The system (120) may further include other components such as, but not limited to, keyboard, sensors, logic circuits, input/output interfaces etc. Further, the system (120) may include data (not shown) that may be stored, utilized, or generated during the operation of the computer implemented system (120).

[0041] The memory(s) (310) may be a computer-readable medium, examples of which comprise volatile memory (e.g., RAM), and/or non-volatile memory (e.g., Erasable Programmable read-only memory, i.e., EPROM, flash memory, etc.). The memory(s) (310) may be an external memory, or internal memory, such as a flash drive, a compact disk drive, an external hard disk drive, or the like. The system (120) may further include the user interface (390) that may allow the connection or coupling of the system (120) with one or more other devices, through a wired (e.g., Local Area Network, i.e., LAN) connection or through a wireless connection (e.g., Bluetooth®, Wi-Fi), for example, for connecting to the user device (105) as shown in FIG. 1. The user interface (390) may also enable intercommunication between different logical as well as hardware components of the system (120). The system (120) may be provided with a database (385) to store a privacy budget (365), analytical value (370), perturbed data (375), and privacy risk metrics (380). In an example implementation of the system (120) including one or more servers, the databases (385) may be local to the server or may be remote to the server. It may be noted that the data in the databases (385) may be stored as a table or may be pre-stored as a mapping from one item to another. This application is not limited thereto.

[0042] The system (120) may include module(s). The module(s) may include a data source connection module (315), a query receiving module (320), a query generation module (325), a privacy budget calculation module (330), a pending budget tracking module (335), a response module (340), a query-based risk assessment module (345), a privacy risk quantification module (350), a differential privacy parameter configuration module (355) and a dashboard module (360). In one example, the module(s) may be implemented as a combination of hardware and firmware. In an example described herein, such combinations of hardware and firmware may be implemented in several different ways. For example, the firmware for the module(s) may be processor (305) executable instructions stored on a non-transitory machine-readable storage medium, and the hardware for the module(s) may include a processing resource (for example, implemented as either a single processor or a combination of multiple processors) to execute such instructions. Further, the hardware for the module(s) may include communication apparatuses, control circuitries involving electrical and electronics components, sensors, and interface devices, which may be in communication with each other for multidirectional communication therebetween.

[0043] Further, the system (120) includes data. The data may include data that is either stored or generated as a result of functions implemented by the system (120). In an example, data may include a privacy budget (365), analytical value (370), perturbed data (375), and privacy risk metrics (380). It may be noted that such examples of the various functions are only indicative. The present approaches may be applicable to other examples without deviating from the scope of the present subject matter. In the present examples, the non-transitory machine-readable storage medium may store instructions that, when executed by the processing resource, implement the functionalities of the module(s). In such examples, the system (120) may include the machine-readable storage medium storing the instructions and the processing resource to execute the instructions. In other examples of the present subject matter, the machine-readable storage medium may be located at a different location but accessible to the system (120) and the processor(s) (305).

[0044] In one embodiment of the operation, the data source connection module (315) is configured to validate a plurality of data fields stored in one or more databases hosted by multiple parties wherein the plurality of data fields is provided with access permissions. The data source connection module (315) establishes secure connectivity between the system (120) and distributed data repositories maintained by different entities participating in the clean room environment.

[0045] The data source connection module (315) is designed to authenticate and authorize access to the data fields based on predefined permissions, ensuring that only permitted data elements are available for processing. This validation process involves verifying the structural integrity of the data fields, confirming compliance with schema definitions, and enforcing access control policies defined by the respective data owners. By implementing these checks, the module prevents unauthorized or inadvertent exposure of sensitive information during query execution.
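By way of illustration only, the validation described above (schema conformance plus owner-defined access permissions) might be sketched as follows. The names `DataField`, `validate_fields`, `catalog` and `schema`, and the sample fields, are hypothetical and do not appear in the disclosure; the sketch assumes each data owner publishes a simple per-field permission flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataField:
    name: str        # field name as declared by the data owner
    dtype: str       # declared type, e.g. "int" or "str"
    owner: str       # party hosting the field
    permitted: bool  # access permission granted by the owner (assumed flag)


def validate_fields(requested, catalog, schema):
    """Return the requested fields if all pass; raise on the first failure."""
    validated = []
    for name in requested:
        field = catalog.get(name)
        if field is None:
            raise KeyError(f"unknown data field: {name}")
        if schema.get(name) != field.dtype:
            raise TypeError(f"schema mismatch for field: {name}")
        if not field.permitted:
            raise PermissionError(f"access not permitted for field: {name}")
        validated.append(field)
    return validated


# Hypothetical catalog spanning two parties.
catalog = {
    "age": DataField("age", "int", "hospital_a", True),
    "diagnosis": DataField("diagnosis", "str", "hospital_a", True),
    "ssn": DataField("ssn", "str", "bank_b", False),
}
schema = {"age": "int", "diagnosis": "str", "ssn": "str"}

ok = validate_fields(["age", "diagnosis"], catalog, schema)
```

Rejecting the whole request on the first failing field, rather than silently dropping it, is one possible design choice; it keeps a query from running on a different set of fields than the user asked for.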

[0046] Further, the data source connection module (315) maintains session-level mappings between the validated data fields and the corresponding structured queries generated by the system (120). This mapping ensures that the queries operate exclusively on authorized datasets, thereby preserving the confidentiality and integrity of multiparty data collaboration. Through its secure validation and permission enforcement capabilities, the data source connection module (315) forms a critical layer of trust within the system (120) architecture, enabling privacy-preserving data sharing across multiple stakeholders. In further operation, the query receiving module (320) is configured to receive one or more textual inputs as a query to a session from a user via a user interface (390). The query receiving module (320) serves as the primary interaction point between the user (115) and the system (120), enabling the initiation of data analysis requests in a natural language format. The module is designed to capture user-provided textual inputs accurately and associate them with an active session, ensuring that the context of the query is maintained throughout the processing lifecycle.

[0047] The user interface (390) operatively coupled to the query receiving module (320) may be implemented as a web-based application, a desktop application, or a mobile application, allowing seamless accessibility across different platforms. Upon receiving the textual input, the query receiving module (320) validates the format and structure of the input to ensure compliance with predefined syntactic and semantic rules. This validation step prevents erroneous or incomplete queries from proceeding further in the processing pipeline.

[0048] The query receiving module (320) further manages session-level metadata, including user identification, query timestamps, and session state, thereby enabling traceability and accountability for each query processed by the system (120). By providing a robust mechanism for capturing and validating user queries, the query receiving module (320) establishes the foundation for subsequent conversion of textual inputs into structured queries, ensuring accuracy and consistency in data retrieval operations.
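As a non-limiting sketch, the session-level metadata described above (user identification, query timestamps, and session state) could be held in a structure such as the following. The class names `Session` and `QueryRecord`, the `"active"` state label, and the non-empty-input check are assumptions made for illustration; the disclosure does not specify the validation rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QueryRecord:
    user_id: str
    text: str
    received_at: datetime  # timezone-aware UTC timestamp


@dataclass
class Session:
    session_id: str
    user_id: str
    state: str = "active"
    history: list = field(default_factory=list)

    def receive(self, text):
        """Validate a textual input and record it against this session."""
        if self.state != "active":
            raise RuntimeError("session is not active")
        cleaned = text.strip()
        if not cleaned:
            raise ValueError("empty query")
        record = QueryRecord(self.user_id, cleaned, datetime.now(timezone.utc))
        self.history.append(record)
        return record


session = Session(session_id="s1", user_id="u1")
record = session.receive("  average income by region ")
```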

[0049] In further operation the query generation module (325) is operatively coupled to the query receiving module (320) and is configured to convert the one or more textual inputs into a corresponding structured query. The query generation module (325) functions as an intermediary processing component that transforms user-provided natural language inputs into machine-readable query formats suitable for execution on underlying data sources.

[0050] Upon receiving validated textual inputs from the query receiving module (320), the query generation module (325) applies parsing and interpretation techniques to extract semantic meaning and identify relevant entities, attributes, and conditions embedded within the user query. This conversion process ensures that the structured query adheres to the syntax and operational constraints of the target data environment, thereby enabling accurate and efficient retrieval of information.

[0051] The query generation module (325) further maintains contextual integrity by preserving session-specific parameters and user-defined constraints during the conversion process. By automating the transformation of natural language queries into structured formats, the query generation module (325) eliminates manual intervention, reduces processing errors, and enhances the overall usability of the system (120) for multiparty data collaboration.

[0052] In one embodiment, the one or more textual inputs are converted into the corresponding structured query using a machine learning model. The machine learning model is trained to interpret natural language inputs and map them to structured query formats by leveraging semantic understanding and contextual analysis.

[0053] The model employs advanced techniques such as natural language processing (NLP) and entity recognition to identify relevant attributes, conditions, and logical operators embedded within the user’s textual input. By utilizing these capabilities, the system (120) ensures that even complex or ambiguous queries are accurately translated into executable structured queries without requiring manual intervention.
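The disclosure does not specify the machine learning model. Purely to illustrate the input and output of the conversion step, the following sketch substitutes a small rule-based parser for the model; it is a stand-in, not the described technique. The function name `to_structured_query`, the supported phrasings, and the SQL output format are all assumptions.

```python
import re

# Hypothetical mapping from natural-language keywords to SQL aggregates.
AGGREGATES = {"average": "AVG", "mean": "AVG", "total": "SUM", "count": "COUNT"}

PATTERN = re.compile(r"\s*(average|mean|total|count)\s+(\w+)(?:\s+by\s+(\w+))?\s*")


def to_structured_query(text, table, allowed_fields):
    """Convert e.g. 'average income by region' into an aggregate SQL query."""
    match = PATTERN.fullmatch(text.lower())
    if match is None:
        raise ValueError("unsupported query form")
    keyword, target, group = match.groups()
    # Only fields already validated for this session may appear in the query.
    for name in (target, group):
        if name is not None and name not in allowed_fields:
            raise ValueError(f"field not available: {name}")
    aggregate = f"{AGGREGATES[keyword]}({target})"
    if group is None:
        return f"SELECT {aggregate} FROM {table}"
    return f"SELECT {group}, {aggregate} FROM {table} GROUP BY {group}"


sql = to_structured_query(
    "Average income by region", "customers", {"income", "region"}
)
```

Restricting the output to aggregate queries over an allow-list of fields is a design choice of this sketch; a trained model would need an equivalent post-check before its output is executed.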

[0054] In another embodiment, the privacy risk quantification module (350) is configured to associate an analytical value (370) to the plurality of data fields based on a likelihood of identification of the user and an impact of exposure of the data fields, wherein the analytical value (370) is one of a numerical and categorical score. The analytical value (370) represents a quantifiable measure of privacy sensitivity and is utilized to assess the potential risk associated with accessing or processing specific data elements within the clean room environment. The module evaluates each data field against predefined privacy criteria, including the probability of re-identification and the severity of consequences in case of exposure. Based on this evaluation, the module assigns an analytical value (370) in the form of either a numerical score or a categorical classification, thereby enabling systematic comparison and prioritization of privacy risks across multiple datasets.
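The disclosure states that the analytical value (370) depends on likelihood of identification and impact of exposure but does not give a formula. One minimal sketch, assuming both inputs are normalized to the range 0 to 1 and combined as a product (a common risk-matrix convention, not the disclosed method), is shown below. The thresholds used for the categorical score are likewise hypothetical.

```python
def analytical_value(likelihood, impact):
    """Numerical score: likelihood of identification times impact of exposure.

    Both inputs are assumed to be normalized to [0, 1].
    """
    if not (0.0 <= likelihood <= 1.0 and 0.0 <= impact <= 1.0):
        raise ValueError("likelihood and impact must lie in [0, 1]")
    return likelihood * impact


def categorize(score, low=0.2, high=0.6):
    """Categorical score derived from the numerical one (assumed thresholds)."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


# Hypothetical fields: a coarse attribute and a direct identifier.
region_score = analytical_value(likelihood=0.5, impact=0.5)
id_score = analytical_value(likelihood=0.9, impact=0.9)
labels = (categorize(region_score), categorize(id_score))
```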

[0055] By integrating these analytical values (370) into subsequent privacy budget (365) calculations, the privacy risk quantification module (350) ensures that queries consuming highly sensitive data incur proportionally higher privacy expenditure. This approach provides a robust mechanism for enforcing privacy-preserving operations while maintaining analytical utility, thereby supporting compliance with regulatory standards and organizational privacy policies.
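
By way of a non-limiting illustration, the assignment of an analytical value (370) to a data field may be sketched as follows. The disclosure states only that the value depends on the likelihood of identification and the impact of exposure. The product rule, the 0 to 1 scales, the category cut-offs and all names below are assumptions made for the purpose of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRisk:
    name: str
    likelihood: float  # assumed scale: 0 (cannot identify) to 1 (identifies directly)
    impact: float      # assumed scale: 0 (harmless) to 1 (severe consequences)


def numerical_value(field: FieldRisk) -> float:
    """Numerical analytical value, assumed here to be likelihood times impact."""
    if not (0.0 <= field.likelihood <= 1.0 and 0.0 <= field.impact <= 1.0):
        raise ValueError("likelihood and impact must lie in [0, 1]")
    return field.likelihood * field.impact


def categorical_value(field: FieldRisk, low: float = 0.2, high: float = 0.5) -> str:
    """Categorical analytical value; the cut-offs are hypothetical."""
    score = numerical_value(field)
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"
```

Under these assumptions, a field with likelihood 0.5 and impact 0.5 receives the numerical value 0.25 and the category "medium", so that either form of score can feed the subsequent budget calculation.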

[0056] In further operation, the privacy budget calculation module (330) is operatively coupled to the query generation module (325) and is configured to calculate a privacy budget (365) associated with the structured query and to execute the one or more structured queries. The privacy budget (365) is calculated as an aggregated analytical value (370) obtained based on a plurality of data fields of one or more data sources pertaining to the structured query. The privacy budget (365) represents a quantifiable measure of privacy expenditure incurred during the execution of a query and serves as a control mechanism to prevent excessive exposure of sensitive data.

[0057] The calculation of the privacy budget (365) is performed by aggregating analytical values (370) corresponding to the plurality of data fields from one or more data sources that are relevant to the structured query. These analytical values (370), derived from prior risk assessments, reflect the sensitivity and potential impact of disclosure for each data field. By summing these values, the module determines the cumulative privacy cost associated with processing the query, ensuring that the overall privacy budget (365) allocated to the user or session is not exceeded.
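
By way of a non-limiting illustration, the summation described above may be sketched as follows. The function name, the field names and the choice to count a field only once when a query references it several times are assumptions of the example, not features recited in the disclosure.

```python
def query_privacy_cost(fields_used, analytical_values):
    """Sum the analytical values of the distinct data fields a query touches."""
    # Assumption: a field referenced more than once is charged once.
    unique_fields = list(dict.fromkeys(fields_used))
    unknown = [f for f in unique_fields if f not in analytical_values]
    if unknown:
        # Refuse to price a query that touches a field with no risk assessment.
        raise KeyError(f"no analytical value assigned to: {unknown}")
    return sum(analytical_values[f] for f in unique_fields)


# Hypothetical analytical values produced by a prior risk assessment.
values = {"age": 0.25, "income": 0.5, "zip_code": 0.125}
cost = query_privacy_cost(["age", "income", "age"], values)  # 0.25 + 0.5
```

Here `cost` equals 0.75. Rejecting unassessed fields, rather than treating them as free, is a conservative design choice for the sketch.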

[0058] Upon completion of the calculation, the privacy budget calculation module (330) initiates the execution of the structured query within the constraints of the available privacy budget (365). This controlled execution ensures that the system (120) adheres to differential privacy principles by limiting the amount of information disclosed and applying appropriate perturbation techniques where necessary.

[0059] In one embodiment, the privacy budget (365) associated with the user is incrementally consumed upon processing the structured query, resulting in a progressive reduction of the available privacy budget (365). This mechanism ensures that each query execution deducts a proportional amount of privacy expenditure based on the sensitivity of the data fields accessed and the complexity of the query.

[0060] The incremental consumption model provides a dynamic and transparent approach to privacy management by continuously updating the remaining budget after each query operation. This prevents excessive or uncontrolled data access and enforces compliance with predefined privacy constraints throughout the session.
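
By way of a non-limiting illustration, the incremental consumption model may be sketched as a ledger that deducts the cost of each query and records the remaining budget. The class names `BudgetLedger` and `BudgetExhausted` and the query identifiers are hypothetical.

```python
class BudgetExhausted(Exception):
    """Raised when a query would cost more than the remaining budget."""


class BudgetLedger:
    def __init__(self, total: float):
        if total < 0:
            raise ValueError("total budget must be non-negative")
        self.total = total
        self.remaining = total
        self.history = []  # entries of (query_id, cost, remaining_after)

    def charge(self, query_id: str, cost: float) -> float:
        """Deduct the cost of one query and return the remaining budget."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if cost > self.remaining:
            # The ledger is left unchanged when a query is refused.
            raise BudgetExhausted(f"{query_id} needs {cost}, only {self.remaining} left")
        self.remaining -= cost
        self.history.append((query_id, cost, self.remaining))
        return self.remaining
```

Starting from a budget of 2.0, charging 0.75 and then 0.5 leaves 0.75; a third query costing 1.0 is then refused without altering the ledger, and the `history` list supplies the per-query record for later display or audit.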

[0061] In another embodiment, the privacy budget (365) is assigned by an admin of the respective parties along with a plurality of privacy parameters, wherein the plurality of privacy parameters indicates a privacy level conserved in each of the plurality of data fields. The administrator defines the initial privacy budget (365) for the user or session based on organizational policies, regulatory requirements, and the sensitivity of the datasets involved in the multiparty clean room environment.

[0062] The plurality of privacy parameters serves as configurable indicators that specify the degree of privacy protection applied to individual data fields. These parameters may include, but are not limited to, permissible noise levels, aggregation thresholds, and exposure limits, which collectively determine the privacy level maintained during query execution.

[0063] By associating these parameters with the assigned privacy budget (365), the system (120) ensures that privacy enforcement is tailored to the characteristics of the underlying data and the operational context of each participating entity. The plurality of privacy parameters enhances governance and flexibility by allowing administrators to exercise granular control over privacy configurations while maintaining compliance with differential privacy principles.
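
By way of a non-limiting illustration, an administrator-defined configuration that pairs a privacy budget with per-field privacy parameters may be sketched as follows. The interpretation of the aggregation threshold as a minimum group size and of the exposure limit as a maximum number of queries, together with every name and value below, is an assumption of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldPrivacyParams:
    max_noise_scale: float  # permissible noise level (assumed meaning)
    min_group_size: int     # aggregation threshold (assumed meaning)
    max_queries: int        # exposure limit (assumed meaning)


@dataclass(frozen=True)
class PartyPolicy:
    party: str
    privacy_budget: float
    fields: dict  # field name -> FieldPrivacyParams


def may_release(policy: PartyPolicy, field_name: str,
                group_size: int, queries_so_far: int) -> bool:
    """Check one field against the aggregation threshold and exposure limit."""
    params = policy.fields.get(field_name)
    if params is None:
        return False  # fields without configured parameters are never released
    return (group_size >= params.min_group_size
            and queries_so_far < params.max_queries)


# Hypothetical policy assigned by the admin of one participating party.
policy = PartyPolicy(
    party="party_a",
    privacy_budget=2.0,
    fields={"age": FieldPrivacyParams(max_noise_scale=4.0, min_group_size=10, max_queries=3)},
)
```

With this policy, an aggregate over 25 records of `age` may be released on the first query, whereas an aggregate over 5 records, a fourth query on the same field, or any query on an unconfigured field is refused.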

[0064] In further operation, the pending budget tracking module (335) is operatively coupled to the privacy budget calculation module (330) and is configured to continuously monitor the privacy budget (365) to estimate consumption of the structured query. The pending budget tracking module (335) functions as a real-time oversight component that ensures the privacy budget (365) allocated to the user or session is not exceeded during query execution.

[0065] The pending budget tracking module (335) dynamically tracks the incremental consumption of the privacy budget (365) as each structured query is processed, updating the remaining budget after every operation. This continuous monitoring enables the system (120) to enforce privacy constraints proactively by preventing further queries when the available budget falls below a predefined threshold.

[0066] By maintaining an accurate and up-to-date record of privacy expenditure, the pending budget tracking module (335) supports transparency and accountability within the multiparty clean room environment. Its integration with the privacy budget calculation module (330) ensures seamless coordination between budget computation and enforcement, thereby safeguarding sensitive data while enabling controlled analytical operations.

[0067] In further operation, the response module (340) is operatively coupled to the pending budget tracking module (335) and is configured to provide one or more responses with corresponding perturbed data (375) based on the privacy budget (365), thereby preventing extraction of private data. The response module (340) serves as the final stage in the query processing pipeline, ensuring that the output delivered to the user adheres to differential privacy principles and complies with the allocated privacy budget (365). Upon receiving confirmation of the remaining privacy budget (365) from the pending budget tracking module (335), the response module (340) retrieves the results of the executed structured query and applies perturbation techniques to the data. These techniques involve introducing controlled noise to the query output, thereby reducing the risk of re-identification while preserving the analytical utility of the response.

[0068] The level of perturbation is dynamically adjusted based on the privacy parameters and the residual privacy budget (365), ensuring that sensitive information remains protected throughout the interaction. By delivering responses in a privacy-preserving manner, the response module (340) prevents unauthorized inference of individual-level data and mitigates potential privacy risks associated with multiparty data collaboration.

[0069] In one embodiment, the differential privacy parameter configuration module (355) is configured to adjust an amount of noise of the plurality of data fields to generate the perturbed data (375) thereby preserving a corresponding analytical value (370) of the plurality of data fields. The differential privacy parameter configuration module (355) operates as a critical component for enforcing differential privacy principles by introducing controlled randomness into query outputs without compromising the overall utility of the data.

[0070] The differential privacy parameter configuration module (355) dynamically determines the optimal noise level based on predefined privacy parameters and the sensitivity of the data fields involved in the query. By calibrating the noise addition process, the system (120) ensures that the analytical value (370) associated with the data fields remains intact, allowing meaningful insights to be derived while mitigating the risk of individual-level data exposure.
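
By way of a non-limiting illustration, calibrated noise addition may be sketched using the Laplace mechanism, a standard differential privacy technique in which the noise scale equals the query sensitivity divided by the privacy parameter epsilon. The disclosure does not name a specific noise distribution, so the choice of Laplace noise, the parameter names and the function names are assumptions of the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    # expovariate takes a rate; rate 1/scale gives an exponential with mean scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(true_value: float, sensitivity: float, epsilon: float,
            rng: random.Random = None) -> float:
    """Return the query result with Laplace noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# A counting query has sensitivity 1; a smaller epsilon means more noise.
noisy_count = perturb(true_value=100.0, sensitivity=1.0, epsilon=0.5,
                      rng=random.Random(7))
```

Because the noise has zero mean, repeated perturbed answers average towards the true value, which is one reason each answer has to be charged against the privacy budget (365) rather than released freely.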

[0071] The differential privacy parameter configuration module (355) provides a balance between privacy protection and analytical accuracy. It enables the system (120) to maintain compliance with privacy standards while supporting secure multiparty data collaboration in environments where sensitive information must remain confidential.

[0072] In further operation, the query-based risk assessment module (345) is operatively coupled to the response module (340) and is configured to aggregate the one or more responses. Further, the query-based risk assessment module (345) quantifies the risk based on the plurality of data fields of the one or more data sources using a privacy threat modelling output integrated into a differential private budget calculation per structured query for effective risk quantification and mitigation.

[0073] The query-based risk assessment module (345) performs a comprehensive evaluation of the privacy implications associated with the responses generated by the system (120). It aggregates the outputs of executed queries and analyzes the underlying data fields to determine the cumulative risk exposure. This assessment leverages a privacy threat modeling framework that identifies potential vulnerabilities, such as re-identification risks or inference attacks, and correlates these threats with the sensitivity of the accessed data fields.

[0074] By integrating the threat modeling output into the differential privacy budget (365) calculation, the query-based risk assessment module (345) ensures that risk quantification is aligned with the privacy expenditure incurred for each structured query. This approach enables dynamic adjustment of privacy controls and facilitates proactive mitigation strategies, such as increasing noise levels or restricting further queries when risk thresholds are exceeded.
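
By way of a non-limiting illustration, the integration of a threat modelling output with the per-query privacy expenditure may be sketched as follows. The disclosure does not specify how the two are combined. Representing the threat modelling output as a per-field multiplier, taking the largest multiplier among the accessed fields, and comparing the cumulative figure against a single threshold are all assumptions of the example, as are the names used.

```python
from dataclasses import dataclass

# Hypothetical threat modelling output: a risk multiplier per data field.
THREAT_WEIGHTS = {"age": 1.0, "zip_code": 1.5, "diagnosis": 2.0}


@dataclass(frozen=True)
class QueryRecord:
    fields: tuple        # data fields the structured query accessed
    privacy_cost: float  # privacy budget consumed by the query


def query_risk(record: QueryRecord, weights: dict) -> float:
    """Scale the privacy cost by the worst threat weight among accessed fields."""
    worst = max((weights.get(f, 1.0) for f in record.fields), default=1.0)
    return record.privacy_cost * worst


def assess(history, weights: dict, threshold: float):
    """Return the cumulative risk and the mitigation action it triggers."""
    cumulative = sum(query_risk(r, weights) for r in history)
    action = "restrict_queries" if cumulative > threshold else "allow"
    return cumulative, action
```

For example, a query over `age` and `zip_code` costing 0.5 contributes a risk of 0.75, a query over `diagnosis` costing 0.25 contributes 0.5, and the cumulative risk of 1.25 exceeds a threshold of 1.0, so that further queries would be restricted.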

[0075] In one embodiment, the dashboard module (360) is configured to display the system (120) activities, the privacy budgets, history of the structured query and privacy risk metrics (380) via the user interface (390), thereby enhancing transparency and control to the user. The dashboard module (360) serves as an interactive visualization layer that consolidates operational and analytical information into a unified view accessible to authorized users. The dashboard module (360) retrieves real-time data from various components of the system (120), including query execution logs, privacy budget (365) consumption records, and risk assessment outputs, and presents them in an intuitive format. This enables users to monitor ongoing activities, track remaining privacy budgets, and review historical queries for audit and compliance purposes. Additionally, the dashboard provides graphical representations of privacy risk metrics (380), allowing users to assess the sensitivity of executed queries and the effectiveness of applied privacy-preserving measures.

[0076] By offering comprehensive visibility and control, the dashboard module (360) empowers users and administrators to make informed decisions regarding query execution and privacy management. The integration of the dashboard module (360) within the user interface (390) ensures seamless accessibility, thereby reinforcing the system’s commitment to transparency, accountability, and secure multiparty data collaboration.

[0077] Consider a non-limiting example wherein a system (120) for multiparty data clean room with differentially private insight sharing begins with a user initiating a session through a user interface (390). The user provides one or more textual inputs representing a query, which are received by the query receiving module (320). The query generation module (325), operatively coupled to the query receiving module (320), converts these textual inputs into corresponding structured queries using parsing techniques or, in certain embodiments, a machine learning model for enhanced accuracy. Once the structured query is generated, the privacy budget calculation module (330) computes a privacy budget (365) by aggregating analytical values (370) associated with the plurality of data fields relevant to the query. These analytical values (370) are derived from prior risk quantification, ensuring that the privacy expenditure reflects the sensitivity of the accessed data. The pending budget tracking module (335) continuously monitors the consumption of the privacy budget (365) during query execution, preventing any breach of predefined privacy constraints. The response module (340) retrieves the query results and applies differential privacy techniques, introducing calibrated noise to generate perturbed data (375) while preserving analytical utility. The differential privacy parameter configuration module (355) dynamically adjusts noise levels based on privacy parameters and remaining budget. The query-based risk assessment module (345) aggregates responses and evaluates cumulative risk using privacy threat modeling integrated with budget calculations, enabling proactive mitigation strategies. Finally, the dashboard module (360) provides a comprehensive view of system (120) activities, privacy budgets (365), query history, and risk metrics, ensuring transparency and control for users and administrators. The system (120) ensures secure, privacy-preserving collaboration across multiple parties while maintaining compliance with differential privacy principles.

[0078] FIG. 4 is a flow chart representing the steps involved in a method for multiparty data clean room with differentially private insight sharing, in accordance with an embodiment of the present disclosure.

[0079] The method (400) includes receiving one or more textual inputs as a query to a session from a user via a user interface in step (405). The user provides a natural language textual input which represents the query intended to retrieve insights from multiparty data sources within a privacy-preserving environment.

[0080] Upon submission, the system associates the query with an active session, ensuring that contextual information such as user identity, session metadata, and timestamps are maintained for traceability. This establishes the foundation for subsequent processing by capturing user intent in natural language form, thereby simplifying query initiation and enhancing usability in collaborative data analysis scenarios.

[0081] The method (400) further includes converting the one or more textual inputs into a corresponding structured query in step (410). This conversion is performed by applying parsing and interpretation techniques to extract semantic meaning from the natural language input. A machine learning model trained on historical query patterns and domain-specific vocabulary may be employed to improve accuracy and handle complex queries. The structured query adheres to the syntax and operational constraints of the underlying data sources, ensuring compatibility and efficient execution. This eliminates manual intervention, reduces errors, and enables automated transformation of user-friendly inputs into machine-readable formats suitable for multiparty data environments.

[0082] The method (400) further includes calculating a privacy budget associated with the structured query, wherein the privacy budget is calculated as an aggregated analytical value obtained based on a plurality of data fields of one or more data sources pertaining to the structured query in step (415). The privacy budget is computed as the aggregated analytical value derived from the plurality of data fields relevant to the query across one or more data sources.

[0083] These analytical values reflect the sensitivity of each data field based on factors such as likelihood of identification and impact of exposure. By summing these values, the system determines the cumulative privacy expenditure for executing the query. This calculation ensures that privacy constraints are enforced and prevents excessive disclosure of sensitive information, thereby maintaining compliance with differential privacy principles.

[0084] The method (400) further includes executing the one or more structured queries in step (420). Once the privacy budget is calculated, the method proceeds to execute the structured query within the constraints of the allocated budget. The execution involves retrieving data from validated sources while adhering to access permissions and privacy parameters defined by participating entities.

[0085] During this process, the system ensures that the query operates exclusively on authorized datasets and applies necessary safeguards to prevent unauthorized exposure. This controlled execution enables accurate data retrieval while maintaining strict privacy compliance, forming a critical step in delivering meaningful insights without compromising confidentiality. The method (400) further includes continuously monitoring the privacy budget to estimate consumption of the structured query in step (425). The monitoring is performed dynamically to update the remaining budget after each operation. By tracking incremental expenditure, the system prevents budget overruns and enforces privacy constraints in real time.

[0086] If the available budget falls below a predefined threshold, the system may restrict further queries or apply additional perturbation measures. This ensures transparency, accountability, and proactive privacy management throughout the query lifecycle.
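
By way of a non-limiting illustration, the threshold behaviour described above may be sketched as a three-way decision taken before each query. The decision names, the comparison against the budget that would remain after the query, and the numeric values are assumptions of the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    EXTRA_PERTURBATION = "extra_perturbation"
    REJECT = "reject"


def decide(remaining: float, cost: float, threshold: float) -> Decision:
    """Choose how to handle a query given the remaining budget and its cost."""
    if cost > remaining:
        # The query cannot be paid for at all.
        return Decision.REJECT
    if remaining - cost < threshold:
        # The query fits, but would leave the budget below the threshold.
        return Decision.EXTRA_PERTURBATION
    return Decision.ALLOW
```

With a remaining budget of 1.0 and a threshold of 0.25, a query costing 0.5 is allowed, a query costing 0.875 is answered with additional perturbation, and a query costing 1.5 is rejected.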

[0087] The method (400) further includes providing one or more responses with corresponding perturbed data based on the privacy budget, thereby preventing extraction of private data, in step (430). Upon successful execution, the one or more responses, comprising the perturbed data generated based on the privacy budget, are provided to the user.

[0088] Perturbation techniques, such as calibrated noise addition, are applied to the query output to reduce the risk of re-identification while preserving analytical utility. The level of noise is dynamically adjusted according to privacy parameters and residual budget, ensuring compliance with differential privacy principles. This prevents unauthorized inference of individual-level data and enables secure insight sharing across multiple parties.
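
By way of a non-limiting illustration, adjusting the noise level according to the residual budget may be sketched as follows. The disclosure does not give an adjustment rule. The inverse-proportional rule, the cap on the multiplier and the function name are assumptions of the example, and the rule is a heuristic that by itself carries no formal differential privacy guarantee; any such guarantee depends on how the noise mechanism is calibrated.

```python
def noise_scale(base_scale: float, remaining: float, total: float,
                max_multiplier: float = 8.0) -> float:
    """Increase the noise scale as the residual privacy budget shrinks."""
    if base_scale <= 0 or total <= 0:
        raise ValueError("base_scale and total must be positive")
    fraction = max(remaining, 0.0) / total  # share of the budget still unspent
    if fraction == 0.0:
        return base_scale * max_multiplier
    # Inverse-proportional growth, capped so that answers stay usable.
    return base_scale * min(1.0 / fraction, max_multiplier)
```

Under this rule, a full budget leaves the base scale unchanged, a budget that is one quarter remaining multiplies the scale by four, and a nearly exhausted budget is capped at the maximum multiplier.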

[0089] The method (400) further includes aggregating the one or more responses, quantifying their risk based on the plurality of data fields of the one or more data sources using a privacy threat modelling output integrated into a differential private budget calculation per structured query for effective risk quantification and mitigation in step (435). Finally, the responses are aggregated and their cumulative risk is quantified using the privacy threat modelling framework integrated with differential privacy budget calculations. This assessment evaluates potential vulnerabilities such as inference attacks or re-identification risks and correlates them with the sensitivity of accessed data fields. Based on this analysis, the system may implement mitigation strategies, including adjusting privacy parameters or restricting further queries. This ensures robust risk management and reinforces the system’s ability to maintain compliance with privacy standards while supporting secure multiparty collaboration.

[0090] The present disclosure offers significant advantages in the domain of privacy-preserving data collaboration. One of the primary benefits is enhanced privacy protection, achieved through the integration of differential privacy principles and dynamic noise adjustment mechanisms. This ensures that sensitive information remains secure while allowing meaningful insights to be shared among multiple parties. Another advantage lies in granular risk quantification, where analytical values are assigned to individual data fields and combined with privacy threat modeling to deliver precise risk assessments and effective mitigation strategies. The invention also facilitates controlled multiparty collaboration by implementing privacy budget calculation and continuous tracking, thereby enforcing strict privacy expenditure limits and preventing unauthorized data extraction. Additionally, the system (120) provides automated query processing, converting natural language inputs into structured queries using advanced parsing techniques or machine learning models, which reduces manual intervention and improves operational efficiency. Further, the system (120) enhances transparency and governance through an interactive dashboard that displays real-time system activities, privacy budgets, query history, and risk metrics, empowering users and administrators with comprehensive control and audit capabilities. Finally, the modular and scalable architecture of the system (120) ensures adaptability to diverse data environments and regulatory requirements, making it suitable for large-scale deployments in complex multiparty ecosystems.

[0091] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof. While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.

[0092] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and is not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.

Claims

WE CLAIM:

1. A system for multiparty data clean room with differentially private insight sharing comprising: a processor; a memory coupled to the processor, wherein the memory comprises instructions that when executed by the processor cause the processor to: receive one or more textual inputs as a query to a session from a user via a user interface; convert the one or more textual inputs into a corresponding structured query; calculate a privacy budget associated with the structured query, wherein the privacy budget is calculated as an aggregated analytical value obtained based on a plurality of data fields of one or more data sources pertaining to the structured query; execute the one or more structured queries; continuously monitor the privacy budget to estimate consumption of the structured query; provide one or more responses with a corresponding perturbed data based on the privacy budget thereby preventing extraction of private data; and aggregate the one or more responses, quantify their risk based on the plurality of data fields of the one or more data sources using a privacy threat modelling output integrated into a differential private budget calculation per structured query for effective risk quantification and mitigation.

2. The system as claimed in claim 1, to cause the processor to validate a plurality of data fields stored in one or more databases hosted by multiple parties wherein the plurality of data fields is provided with access permissions.

3. The system as claimed in claim 1, wherein the one or more textual inputs are converted into the corresponding structured query using a machine learning model.

4. The system as claimed in claim 2, to cause the processor to associate an analytical value to the plurality of data fields based on a likelihood of identification of the user and an impact of exposure of the data fields, wherein the analytical value is one of a numerical and categorical score.

5. The system as claimed in claim 1, wherein the privacy budget associated with the user is incrementally consumed upon processing the structured query, resulting in a progressive reduction of the available privacy budget.

6. The system as claimed in claim 1, wherein the privacy budget is assigned by an admin of the respective parties along with a plurality of privacy parameters, wherein the plurality of privacy parameters indicates a privacy level conserved in each of the plurality of data fields.

7. The system as claimed in claim 1, to cause the processor to adjust an amount of noise of the plurality of data fields to generate the perturbed data thereby preserving a corresponding analytical value of the plurality of data fields.

8. The system as claimed in claim 1, to cause the processor to display the system activities, the privacy budgets, history of the structured query and a privacy risk metrics via a user interface, thereby enhancing transparency and control to the user.

9. A method for multiparty data clean room with differentially private insight sharing comprising: receiving one or more textual inputs as a query to a session from a user via a user interface; converting the one or more textual inputs into a corresponding structured query; calculating a privacy budget associated with the structured query, wherein the privacy budget is calculated as an aggregated analytical value obtained based on a plurality of data fields of one or more data sources pertaining to the structured query; executing the one or more structured queries; continuously monitoring the privacy budget to estimate consumption of the structured query; providing one or more responses with a corresponding perturbed data based on the privacy budget thereby preventing extraction of private data; and aggregating the one or more responses, quantifying their risk based on the plurality of data fields of the one or more data sources using a privacy threat modelling output integrated into a differential private budget calculation per structured query for effective risk quantification and mitigation.