The invention relates to a topic phrase extraction method. The method comprises
preprocessing documents; obtaining a document-topic set, a full-text lexical chain set, and a
noun phrase set; obtaining a central word set; obtaining a candidate topic phrase set; and
obtaining a topic phrase set. The topic
phrase extraction method has the following advantages. Topic phrases are extracted by combining
an LDA (latent Dirichlet allocation) model with lexical chains, so that WordNet, a knowledge base
that supplies complete semantic information from outside the corpus, can be used. Strong lexical
chains are obtained through semantic relevance calculation and strong-chain rule filtering, which
greatly reduces the ambiguity of topic words. The topic phrases are extracted according to a
central word extraction method, followed by N-P rule combination and deduplication steps, and
topics are expressed by topic phrases carrying rich semantic information. As a result, the
problems of low granularity and low recognizability of topic words are solved, topic extraction
accuracy and recall can be ensured, topic drift is reduced, and the needs of practical
applications can be well met.
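
The last three steps (central word set, candidate topic phrase set, topic phrase set) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not define the central word rule or the N-P rule, so the sketch assumes that a central word is a high-weight topic word that also occurs in a strong lexical chain, and that a candidate phrase is a noun phrase whose last token is a central word. The function names, the `top_k` parameter, and the sample data are all hypothetical; the LDA weights, strong chains, and noun phrases are taken as given inputs rather than computed.

```python
from collections import Counter


def select_central_words(topic_word_weights, strong_chains, top_k=3):
    """Assumed rule: keep the highest-weight topic words that also
    appear in at least one strong lexical chain."""
    chain_vocab = set()
    for chain in strong_chains:
        chain_vocab.update(chain)
    ranked = sorted(topic_word_weights, key=topic_word_weights.get, reverse=True)
    return [w for w in ranked if w in chain_vocab][:top_k]


def candidate_topic_phrases(central_words, noun_phrases):
    """Assumed stand-in for the N-P rule: keep noun phrases whose
    last token (the head noun) is a central word."""
    central = set(central_words)
    return [p for p in noun_phrases
            if p.split() and p.lower().split()[-1] in central]


def deduplicate(phrases):
    """Merge case variants and order phrases by how often they occur;
    ties keep first-seen order because the sort is stable."""
    counts = Counter(p.lower() for p in phrases)
    return [p for p, _ in sorted(counts.items(), key=lambda kv: -kv[1])]


# Hypothetical inputs: one topic's word weights (as an LDA model might
# give), strong lexical chains, and noun phrases found in the documents.
topic_word_weights = {"network": 0.12, "model": 0.09, "bank": 0.05, "learning": 0.04}
strong_chains = [{"network", "graph", "node"}, {"learning", "training"}]
noun_phrases = ["neural network", "Neural Network", "deep learning",
                "river bank", "language model", "neural network"]

central = select_central_words(topic_word_weights, strong_chains)
candidates = candidate_topic_phrases(central, noun_phrases)
print(deduplicate(candidates))
```

In this sketch, "model" and "bank" are dropped because they occur in no strong chain, which mirrors the stated goal of using strong lexical chains to filter ambiguous topic words before phrases are formed.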