The invention provides a method for automatically obtaining short text of a knowledge domain from a community question-and-answer website. For each subject of the knowledge domain, the question-and-answer web pages and author web pages are crawled from the community question-and-answer website, so that a system with comprehensive data is obtained that is convenient for users to learn from and use. The method comprises the following steps: 1, crawling the web pages of the knowledge domain from the community question-and-answer website; 2, extracting short text of the knowledge domain from the web page data set; 3, constructing a domain subject tree; 4, storing the domain subject tree. By means of the method, short text of the knowledge domain can be automatically extracted from the semi-structured data of the community question-and-answer website: the question-and-answer web pages and author web pages of each subject of the knowledge domain are crawled to construct a web page data set of the knowledge domain, the short text of the knowledge domain is automatically extracted from that data set, and parent-child relationships are found so that the domain subject tree can be constructed and stored, which is convenient for users to learn from and use.
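
Steps 3 and 4 can be illustrated with a minimal Python sketch. It assumes the parent-child relationships have already been found and are available as (parent, child) pairs of subject names. The abstract does not specify how the relationships are found or in what format the tree is stored, so the function name `build_subject_tree`, the nested-dictionary layout, the JSON serialization, and the sample subjects are all hypothetical choices made for illustration only.

```python
import json
from collections import defaultdict


def build_subject_tree(pairs, root):
    """Build a nested-dict subject tree from (parent, child) pairs.

    Hypothetical helper: the layout {"subject": ..., "children": [...]}
    is an assumption, not the format used by the invention.
    """
    # Index the children of every parent subject, preserving input order.
    children = defaultdict(list)
    for parent, child in pairs:
        children[parent].append(child)

    def expand(subject, seen):
        # Guard against cycles in noisy relationship data: a subject that
        # already appears on the current path is emitted as a leaf.
        if subject in seen:
            return {"subject": subject, "children": []}
        seen = seen | {subject}
        return {
            "subject": subject,
            "children": [expand(c, seen) for c in children.get(subject, [])],
        }

    return expand(root, frozenset())


# Hypothetical sample relationships for a "Data structure" domain.
pairs = [
    ("Data structure", "Tree"),
    ("Data structure", "Graph"),
    ("Tree", "Binary tree"),
]

tree = build_subject_tree(pairs, "Data structure")

# Step 4 (storing): serialize the tree; JSON is an assumed storage format.
serialized = json.dumps(tree, indent=2)
print(serialized)
```

A nested dictionary keeps the example self-contained; a real system might equally store the tree in a relational table with a parent-identifier column or in a graph database.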