The invention discloses a Spark-based method and system for large-scale data flow analysis. The method comprises constructing a distributed computing framework based on Spark and performing distributed data flow analysis of target code through that framework, wherein the distributed data flow analysis comprises an intra-procedural preprocessing stage and a distributed inter-procedural data integration stage. The system comprises a data acquisition module, a data processing module, a data analysis module, a data storage module and a data display module. The method and system adopt a two-stage parallel strategy and perform parallel computation from multiple entry points, which provides high scalability. They also take the load-balancing problem of distributed computing into account, so that the computing power of a distributed cluster can be fully utilized and static data flow analysis is accelerated.
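
The two-stage, multi-entry strategy described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes a toy program representation (`PROGRAM`), a simple "which variables may be defined" analysis, and a standard-library thread pool standing in for the Spark cluster. All names (`summarize`, `integrate`, `analyze`) are hypothetical, and load balancing is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy program: function name -> list of (op, arg) statements.
# ("def", x) defines variable x; ("call", f) calls function f.
PROGRAM = {
    "main": [("def", "a"), ("call", "parse"), ("call", "emit")],
    "parse": [("def", "b"), ("call", "emit")],
    "emit": [("def", "c")],
    "worker": [("def", "d"), ("call", "emit")],
}


def summarize(item):
    """Stage 1: build a summary of one function without looking at any other."""
    name, body = item
    defs = frozenset(arg for op, arg in body if op == "def")
    calls = frozenset(arg for op, arg in body if op == "call")
    return name, (defs, calls)


def integrate(summaries, entry):
    """Stage 2: merge per-function summaries along the call graph from one entry."""
    seen, stack, defs = set(), [entry], set()
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue  # already merged; also guards against recursive calls
        seen.add(fn)
        fn_defs, fn_calls = summaries[fn]
        defs |= fn_defs
        stack.extend(fn_calls)
    return entry, frozenset(defs)


def analyze(program, entries, workers=4):
    """Run both stages, each as an independent parallel map."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: one task per function.
        summaries = dict(pool.map(summarize, program.items()))
        # Stage 2: one task per entry point (the multi-entry mode).
        results = dict(pool.map(lambda e: integrate(summaries, e), entries))
    return results


RESULT = analyze(PROGRAM, ["main", "worker"])
```

Because each stage is a map over independent items (functions in the first stage, entry points in the second), the same shape could plausibly be expressed as transformations over a distributed collection in Spark; that mapping is an assumption of this sketch rather than something the abstract specifies.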