Spark workflow scheduling method and system with privacy protection
A privacy-protecting scheduling technique, applied to Spark workflow scheduling methods and systems, that addresses problems such as increased computing overhead and the lack of guaranteed data privacy and security.
Examples
Embodiment 1
[0018] A Spark workflow scheduling method with privacy protection, including: judging and marking input data according to privacy rules, marking the input data that conforms to the privacy rules as private data and the rest of the data as common data; marking the data for privacy in units of partitions, where partitions containing private data are marked as private partitions and the rest as common partitions; scheduling the common partitions, and the Spark ready tasks that need common partitions as input, to nodes of the common data center in the Spark cluster for processing to obtain the first output data; scheduling the private partitions, and the Spark ready tasks that need private partitions as input, to nodes of the designated privacy data center in the Spark cluster for processing to obtain the second output data; and judging whether the first output data and the second output data are the final result or an intermediate result; if it is the final result, the corresponding wo...
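The marking steps above can be illustrated with a minimal sketch in plain Python, without Spark itself. It assumes that a privacy rule is a predicate over a record; the names `Partition`, `mark_partitions`, `target_data_center`, the `ssn` field, and the labels `privacy_dc` and `common_dc` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]
PrivacyRule = Callable[[Record], bool]  # assumption: a rule is a predicate over one record


@dataclass
class Partition:
    index: int
    records: List[Record]
    is_private: bool = False


def is_private_record(record: Record, rules: List[PrivacyRule]) -> bool:
    # A record counts as private data if it matches at least one privacy rule.
    return any(rule(record) for rule in rules)


def mark_partitions(partitions: List[Partition], rules: List[PrivacyRule]) -> None:
    # A partition is private as soon as it contains one private record;
    # every other partition stays a common partition.
    for partition in partitions:
        partition.is_private = any(
            is_private_record(record, rules) for record in partition.records
        )


def target_data_center(partition: Partition) -> str:
    # Hypothetical labels for the two kinds of data center in the cluster.
    return "privacy_dc" if partition.is_private else "common_dc"


# Hypothetical rule: any record with a non-empty "ssn" field is private.
rules: List[PrivacyRule] = [lambda record: bool(record.get("ssn"))]

partitions = [
    Partition(0, [{"name": "a"}, {"name": "b"}]),
    Partition(1, [{"name": "c", "ssn": "123"}, {"name": "d"}]),
]
mark_partitions(partitions, rules)
placement = {p.index: target_data_center(p) for p in partitions}
```

Here partition 1 is routed to the privacy data center because one of its records matches the rule, even though its other record is common data.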
Embodiment 2
[0035] Based on the privacy-protected Spark workflow scheduling method described in Embodiment 1, this embodiment provides a privacy-protected Spark workflow scheduling system, including:
[0036] The first module is used to judge and mark the input data according to the privacy rules, marking the input data that conforms to the privacy rules as private data and the rest of the data as common data;
[0037] The second module is used to mark the private data and common data in units of partitions, where partitions containing private data are marked as private partitions and the rest of the partitions as common partitions;
[0038] The third module is used to schedule the common partitions, and the Spark ready tasks that need common partitions as input, to nodes of the common data center in the Spark cluster for processing to obtain the first output data; and to schedule the private partitions, and the Spark ready tasks that need private partitions as input, to nodes of the designated privacy data center in the Spark cluster for processing, and the s...
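As an illustration of the scheduling module, the following sketch assigns ready tasks to concrete nodes. It assumes each ready task reads one partition whose privacy mark is already known. The round-robin choice of node within a data center is an assumption made for the example, not a policy stated in the patent, and `ReadyTask`, `DataCenterScheduler`, and the node names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List


@dataclass(frozen=True)
class ReadyTask:
    task_id: str
    input_is_private: bool  # True if the task's input partition is a private partition


class DataCenterScheduler:
    """Routes ready tasks to nodes of the common or the privacy data center."""

    def __init__(self, common_nodes: List[str], privacy_nodes: List[str]) -> None:
        if not common_nodes or not privacy_nodes:
            raise ValueError("each data center needs at least one node")
        # Assumption: nodes inside a data center are picked round-robin.
        self._common = cycle(common_nodes)
        self._privacy = cycle(privacy_nodes)

    def assign(self, task: ReadyTask) -> str:
        # Tasks with private input never leave the privacy data center.
        pool = self._privacy if task.input_is_private else self._common
        return next(pool)

    def schedule(self, tasks: List[ReadyTask]) -> Dict[str, str]:
        return {task.task_id: self.assign(task) for task in tasks}


scheduler = DataCenterScheduler(common_nodes=["c1", "c2"], privacy_nodes=["p1"])
tasks = [
    ReadyTask("t0", input_is_private=False),
    ReadyTask("t1", input_is_private=True),
    ReadyTask("t2", input_is_private=False),
    ReadyTask("t3", input_is_private=True),
]
assignment = scheduler.schedule(tasks)
```

Keeping a separate node pool per data center means the placement decision depends only on the partition mark, so a task with private input cannot be assigned to a common node by this scheduler.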
Embodiment 3
[0041] Based on the privacy-protected Spark workflow scheduling method described in Embodiment 1, this embodiment provides a non-transitory computer-readable storage medium on which a computer program is stored. When the program is executed by a computer, it implements the method described in Embodiment 1.
[0042] Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.


