The invention discloses a method for extracting information from an academic home page. The method comprises the following steps: (1) finding an academic home page on the Internet; (2) crawling and analyzing the academic home page, wherein a heuristic strategy reduces the crawling of irrelevant pages so as to speed up the analysis; (3) parsing the page into a Document Object Model (DOM) tree and segmenting it according to the attributes and contents of its elements so as to obtain a list of cohesive text units;
list; (4) identifying the text unit by using an information recognizer, wherein each information recognizer only identifies one 
information type, and performing subfield extraction on the text information; (5) performing association analysis on the extraction result, eliminating different meanings by using the association of the information, and complementing the missing field; and (6) matching the extraction result and a 
database, and eliminating the redundant data, wherein the extraction result is stored in a semantic 
database in the form of semantic data. By combining heuristic rules, a machine learning method and a conditional probability model, the method extracts academic information from the academic home page efficiently and accurately.
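
As a purely illustrative sketch of steps (3) and (4), the following Python fragment segments a page into text units at block-level elements and passes each unit to recognizers that each handle a single information type. Everything in it is an assumption made for illustration and is not taken from the disclosed method: the `BLOCK_TAGS` set, the `TextUnitSplitter` class, the regular expressions, and the two recognizers are hypothetical, and simple pattern matching stands in for the machine learning and conditional probability models named above.

```python
import re
from html.parser import HTMLParser

# Hypothetical choice of block-level tags treated as text-unit boundaries.
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "ul", "ol", "table",
              "h1", "h2", "h3", "br"}


class TextUnitSplitter(HTMLParser):
    """Collects text between block-level tags into cohesive text units."""

    def __init__(self):
        super().__init__()
        self.units = []
        self._buffer = []

    def _flush(self):
        # Join buffered fragments and normalise whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.units.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


# Illustrative patterns only; real recognizers would be far more robust.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{6,}\d")


def recognize_email(unit):
    """Recognizes only e-mail addresses and splits them into subfields."""
    match = EMAIL_RE.search(unit)
    if match is None:
        return None
    user, domain = match.group(0).split("@", 1)
    return {"type": "email", "user": user, "domain": domain}


def recognize_phone(unit):
    """Recognizes only telephone numbers."""
    match = PHONE_RE.search(unit)
    if match is None:
        return None
    return {"type": "phone", "number": match.group(0)}


# One recognizer per information type.
RECOGNIZERS = [recognize_email, recognize_phone]


def extract(html):
    """Returns the text units of a page and the records recognized in them."""
    splitter = TextUnitSplitter()
    splitter.feed(html)
    splitter.close()
    records = []
    for unit in splitter.units:
        for recognizer in RECOGNIZERS:
            record = recognizer(unit)
            if record is not None:
                records.append(record)
    return splitter.units, records
```

Keeping each recognizer limited to one information type means that a new type (for example, a postal address) could be supported by appending one function to `RECOGNIZERS` without changing the segmentation step.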