The invention relates to a method for memory estimation and configuration optimization in a distributed data processing system. The method comprises at least the following steps: analyzing the data flow of the conditional branches and/or loop bodies in the program code of an application jar package and matching the result against a data feature library; if the matching succeeds, estimating the memory upper limit of at least one stage; optimizing the configuration parameters of the application program based on that memory upper limit; and, while the optimized application program runs, collecting static and/or dynamic features of the program data and recording them persistently. This differs from black-box models that estimate memory with machine learning, whose predictions are not necessarily accurate and which make fine-grained, per-stage prediction difficult. Instead, the method uses program analysis together with existing data features to estimate the overall memory footprint accurately, estimates the memory usage of a job at each stage from the program analysis, and thereby enables finer-grained configuration optimization.
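The sketch below illustrates how the match, estimate, configure, and persist steps could fit together. It is not the patented method. Every name in it is a hypothetical stand-in: `DataFeature`, `Stage`, `estimate_stage_upper_bound`, `recommend_executor_memory_mb`, and `persist_features`. It also assumes that branches and loops in a stage's code can be summarized by a single `expansion` factor, and that only a `usable_fraction` of 0.6 of executor memory is available to a stage. If any input dataset of a stage is missing from the feature library, matching fails and that stage gets no estimate.

```python
import json
import math
from dataclasses import dataclass

# Hypothetical entry in the data feature library, e.g. statistics from earlier runs.
@dataclass
class DataFeature:
    record_count: int        # number of records in the dataset
    avg_record_bytes: float  # assumed average in-memory size of one record

# Hypothetical stage description produced by program analysis.
@dataclass
class Stage:
    name: str
    inputs: list             # ids of the datasets this stage reads
    expansion: float = 1.0   # assumed blow-up from branches/loops in the stage's code

def estimate_stage_upper_bound(stage, library):
    """Upper-bound estimate in bytes for one stage, or None if any input is unmatched."""
    total = 0.0
    for ds in stage.inputs:
        feat = library.get(ds)
        if feat is None:
            return None  # matching failed: no estimate for this stage
        total += feat.record_count * feat.avg_record_bytes
    return total * stage.expansion

def recommend_executor_memory_mb(stages, library, usable_fraction=0.6):
    """Size executor memory (MiB) so the largest estimated stage fits in the usable fraction."""
    bounds = [estimate_stage_upper_bound(s, library) for s in stages]
    known = [b for b in bounds if b is not None]
    if not known:
        return None
    peak = max(known)
    return math.ceil(peak / usable_fraction / (1024 * 1024))

def persist_features(path, run_id, features):
    """Append one run's collected static/dynamic features as a JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"run": run_id, **features}) + "\n")

if __name__ == "__main__":
    library = {
        "logs": DataFeature(1_000_000, 200.0),
        "users": DataFeature(10_000, 500.0),
    }
    stages = [
        Stage("scan", ["logs"]),
        Stage("join", ["logs", "users"], expansion=1.5),
    ]
    for s in stages:
        print(s.name, estimate_stage_upper_bound(s, library))
    print("executor memory (MiB):", recommend_executor_memory_mb(stages, library))
```

With these made-up numbers, the join stage dominates at 307,500,000 bytes, which gives a recommendation of 489 MiB. In a real deployment, such a value might feed an executor-memory setting such as Spark's `spark.executor.memory`. Per-stage estimates could instead drive finer-grained settings, which is the advantage the abstract claims over a single black-box prediction.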