The invention discloses a high-risk 
pollution source classification forecasting method based on 
principal component analysis and 
random forest. The method includes the steps of collecting environmental 
pollution source behavior data of enterprises and integrating the data into candidate indexes, and selecting from them the 
behavior indexes that influence whether a pollution source is illegal to serve as a high-risk pollution source 
index system; conducting data cleaning and data normalization 
processing on the environmental pollution source behavior data; establishing a functional relationship between the high-risk pollution source 
index system and whether each pollution source is illegal, and building a 
random forest model; training the model and evaluating the precision of the 
random forest model after training is finished; ranking the pollution source behavior indexes by importance; conducting the 
principal component analysis to obtain principal components, weighting the principal components to compute comprehensive scores; determining, from the comprehensive scores, the risk 
score coefficient of each enterprise, automatically 
ranking the risk score coefficients and generating a TOP enterprise 
list, wherein the risk 
score coefficients indicate the 
occurrence probability of illegal behaviors of the corresponding enterprises. The high-risk pollution source classification forecasting method based on the 
principal component analysis and the random forest can reduce complexity of operations and improve forecasting precision and the 
quality of results.
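
The normalization, principal component weighting, and ranking steps can be illustrated with a minimal, dependency-free Python sketch. The abstract does not give the weighting formula, so the sketch makes several assumptions: min-max scaling is used for normalization, each principal component is weighted by its share of the retained variance, and each component is oriented so that its loadings sum to a non-negative value. All function names and the sample data are hypothetical. The random forest training and index importance steps are not shown, and the resulting scores are relative rankings rather than calibrated probabilities.

```python
import math
import random


def min_max_normalize(rows):
    """Scale each indicator column to [0, 1]; a constant column maps to 0."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def center_and_covariance(rows):
    """Return the mean-centered rows and their sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def top_eigenpairs(matrix, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration
    with deflation. Adequate for small matrices; convergence is slow when
    neighboring eigenvalues are nearly equal."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to extract
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        lam = max(lam, 0.0)  # guard against tiny negative rounding error
        if sum(v) < 0:  # assumed sign convention: loadings sum to >= 0
            v = [-x for x in v]
        pairs.append((lam, v))
        # Deflate: remove the component just found from the matrix.
        for i in range(d):
            for j in range(d):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def comprehensive_scores(rows, n_components=2):
    """Variance-weighted sum of principal component scores per enterprise."""
    centered, cov = center_and_covariance(min_max_normalize(rows))
    pairs = top_eigenpairs(cov, n_components)
    total = sum(lam for lam, _ in pairs)
    scores = [0.0] * len(rows)
    for lam, vec in pairs:
        weight = lam / total if total else 0.0
        for i, row in enumerate(centered):
            scores[i] += weight * sum(x * v for x, v in zip(row, vec))
    return scores


def rank_top(names, scores, top_n):
    """Return the top_n (name, score) pairs, highest score first."""
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical enterprises and behavior index values, for illustration only.
    names = ["A", "B", "C", "D", "E"]
    rows = [[1, 10, 0.1], [2, 14, 0.2], [4, 15, 0.5], [6, 25, 0.6], [9, 30, 0.9]]
    for name, score in rank_top(names, comprehensive_scores(rows), top_n=3):
        print(f"{name}: {score:.3f}")
```

In practice the eigen-decomposition would normally come from a numerical library; the hand-written power iteration is used here only to keep the sketch self-contained.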