Provided is a text categorization method based on the Xgboost classification algorithm. In the method, tagged (label-related) words are extracted through Labeled-LDA and used to calculate feature values, after which the text is categorized with the Xgboost classification algorithm. Compared with methods that categorize text with a common classification algorithm and use the ordinary vector space model as the feature space, the method requires less memory. The Chinese vocabulary contains several million words, so using words as features produces a very high-dimensional space whose memory consumption is massive, to the point that a single machine may be unable to process it. By contrast, there are no more than ten thousand common Chinese characters, and only two to three thousand frequently used ones, so the dimensionality is greatly reduced. In addition, Xgboost accepts input in a sparse dictionary form rather than as dense arrays. The invention further provides a novel feature selection algorithm, Labeled-LDA, which combines latent semantics with supervision: using Labeled-LDA for feature selection allows LDA to mine the semantic information in large-scale corpora while also exploiting the class information contained in the texts. Finally, preprocessing is simple and no careful manual feature extraction is needed, and the strong ensemble learning algorithm Xgboost, which supports distributed operation, improves both the accuracy and the performance of categorization.
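The memory argument above depends on representing each text as a sparse mapping from a small character vocabulary to feature values. The following Python sketch is illustrative only and is not the patented method. It assumes raw character counts as a stand-in for the Labeled-LDA feature values, and the helper names (build_vocab, to_sparse, to_libsvm_line) are hypothetical. It stores only the characters that actually occur in each text and serializes them as LIBSVM-style "label index:value" lines, a sparse text format that XGBoost can read.

```python
from collections import Counter


def build_vocab(texts):
    """Map every distinct non-whitespace character in the corpus to a 1-based feature index."""
    vocab = {}
    for text in texts:
        for ch in text:
            if not ch.isspace() and ch not in vocab:
                vocab[ch] = len(vocab) + 1  # LIBSVM convention: indices start at 1
    return vocab


def to_sparse(text, vocab):
    """Return {feature_index: value} holding only characters present in the text.

    Assumption: the value is a raw character count, used here as a placeholder
    for the Labeled-LDA-derived feature values described in the abstract.
    Characters not in the vocabulary (including whitespace) are ignored.
    """
    counts = Counter(ch for ch in text if ch in vocab)
    return {vocab[ch]: n for ch, n in counts.items()}


def to_libsvm_line(label, features):
    """Serialize one sample as 'label idx:value ...' with ascending indices."""
    parts = [str(label)] + [f"{i}:{v}" for i, v in sorted(features.items())]
    return " ".join(parts)


if __name__ == "__main__":
    # Two toy documents with hypothetical class labels 0 and 1.
    docs = [("体育比赛今天开始", 0), ("股票市场今天上涨", 1)]
    vocab = build_vocab(t for t, _ in docs)
    for text, label in docs:
        print(to_libsvm_line(label, to_sparse(text, vocab)))
```

Each vector holds at most as many entries as its text has distinct characters, and the overall dimensionality is bounded by the character vocabulary (a few thousand frequent characters) rather than by a word vocabulary of millions.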