
Topic model based document keyword extraction method and system

A topic-model-based keyword extraction technique, applied in the field of document keyword extraction methods and systems. It addresses problems such as the high cost of manual labeling, the difficulty of commercial use, and the lack of an effective way to encourage users to label content, and it improves the quality of the extracted keywords.

Active Publication Date: 2016-08-10
SOUTH CHINA UNIV OF TECH
3 Cites, 44 Cited by

AI Technical Summary

Problems solved by technology

Labeling by users on their own initiative applies to only a few scenarios. Users label only the specific objects that interest them, and there is currently no effective model for encouraging them to label other content.
Because information technology is developing rapidly, the amount of information on the Internet is growing explosively and new content is generated all the time. Asking experts to label the relevant documents manually is too expensive, and documents labeled this way can only be used for research and are difficult to use commercially.



Examples


Embodiment

[0076] Figure 1 shows the overall flowchart of the document keyword extraction method based on the topic model. The method includes the following steps:

[0077] Document information preprocessing: tag the words of the input document with their parts of speech, remove function words and stop words, reduce the remaining words to their stems, and store the result as semi-structured data.
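The preprocessing step can be illustrated with a minimal sketch. It is not the patent's implementation: the stop-word list, the suffix-stripping rule (a crude stand-in for a real stemmer such as Porter's), and the record layout are all assumptions made for illustration, and part-of-speech tagging is left out because it needs an external tagger.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Crude suffix stripping, used here only as a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return semi-structured records: one dict per retained word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    records = []
    for position, token in enumerate(tokens):
        if token in STOP_WORDS:
            continue  # drop function words / stop words
        records.append(
            {"position": position, "word": token, "stem": crude_stem(token)}
        )
    return records


records = preprocess("The topics of the documents are extracted")
print([r["stem"] for r in records])
```

Each record keeps the word's original position, which is the information a later graph-construction step would need.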

[0078] Document structure graph construction: the document structure graph describes the position information of each word in the document. Each node of the graph represents a word, and an edge connecting two nodes indicates that the two words appear close to each other in the document. The invention proposes a method for constructing this document structure graph.
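One common way to build such a graph is a sliding co-occurrence window. The sketch below assumes that approach; the excerpt does not give the patent's own construction method, and the function name `build_structure_graph`, the `window` parameter, and the use of co-occurrence counts as edge weights are hypothetical.

```python
from collections import defaultdict


def build_structure_graph(words, window=2):
    """Link every pair of distinct words that occur within `window` positions.

    Returns an adjacency map {word: {neighbor: co-occurrence count}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        # Look ahead at most `window` positions from the current word.
        for j in range(i + 1, min(i + window + 1, len(words))):
            other = words[j]
            if other == word:
                continue  # no self-loops
            graph[word][other] += 1
            graph[other][word] += 1  # undirected edge
    return {w: dict(neighbors) for w, neighbors in graph.items()}


words = ["topic", "model", "keyword", "extract", "topic", "model"]
graph = build_structure_graph(words, window=2)
print(graph["topic"])
```

A larger `window` links words that are farther apart, which makes the graph denser; the right value would depend on the documents being processed.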

[0079] Document topic distribution extraction: each document has topics that it emphasizes. This method uses topic model technology to extract the topic distribution in the document and the topic distribution o...
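To make "topic distribution in the document" concrete, here is a small sketch. It assumes an LDA-style model that has already assigned a topic label to each word of the document; the excerpt does not name the topic model it uses, and the function name, the example labels, and the smoothing value `alpha` are hypothetical.

```python
from collections import Counter


def topic_distribution(topic_assignments, num_topics, alpha=0.1):
    """Smoothed share of each topic among a document's word-level topic labels.

    theta[k] = (n_k + alpha) / (N + num_topics * alpha), where n_k is the
    number of words labeled with topic k and N is the document length.
    """
    counts = Counter(topic_assignments)
    total = len(topic_assignments) + num_topics * alpha
    return [(counts[k] + alpha) / total for k in range(num_topics)]


# Hypothetical per-word topic labels for a six-word document.
theta = topic_distribution([0, 0, 2, 0, 1, 0], num_topics=3, alpha=0.1)
print(theta)
```

The result is a probability vector over topics, so its entries sum to 1 and the largest entry marks the topic the document emphasizes most.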


Abstract

The invention discloses a topic model based document keyword extraction method and system. The document keyword extraction method comprises the following steps: document information preprocessing, document structure graph construction, document topic distribution extraction, word weight extraction, and keyword generation. The document keyword extraction system comprises the following modules: a document information preprocessing module, a document structure graph construction module, a document topic distribution extraction module, a word weight extraction module, and a keyword generation module. With the method and system, the extracted keywords are more reasonable and more closely related to the topic of the document; some current deficiencies in the keyword extraction field are overcome, a better document summarization effect is achieved, and a user can conveniently and quickly grasp the gist of the document.

Description

Technical field

[0001] The invention relates to data mining technology, and in particular to a method and system for extracting document keywords based on a topic model.

Background

[0002] Keywords summarize the main content of a document and are an important way to quickly grasp its subject. Keywords appear in many places. For example, news websites tag each news article, and scientific and technological papers list the keywords they discuss. Keywords make it easier for people to find what they need in massive amounts of information. Keywords are now applied in many fields. In information retrieval they are widely used: search engine companies such as Baidu and Google search based on keywords in webpage text, and results based on keywords in documents are often what users want. In the field of social networks, many current functio...


Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/35, G06F40/289
Inventor: 蔡毅, 杨楷, 闵华清
Owner: SOUTH CHINA UNIV OF TECH