Short text topic model mining method based on word network to extend characteristics

A short text topic model technique, applied to short text feature expansion, that addresses the sparsity of short text data and the dependence of model quality on the expansion strategy and its results.

Active Publication Date: 2016-10-26
NANJING UNIV
Cites: 9; Cited by: 23

AI Technical Summary

Problems solved by technology

Another way to expand short texts is to use search engines or wiki databases to find words or sentences related to the current text and add them to the original text. This method can, to a certain extent ...


Examples

Embodiment Construction

[0053] To make the technical content of the present invention easier to understand, specific embodiments are described below with reference to the accompanying drawings.

[0054] As shown in Figure 1, before the method is applied, the present invention builds the weighted word network from the training corpus as follows.

[0055] Step 0 is the initial state of the weighted word network construction.

[0056] Step 1 is to use an open source word segmentation tool to perform Chinese word segmentation on the documents in the corpus, and convert each document into a collection of words.

[0057] Step 2 removes stop words from the segmentation output. Because stop words carry no meaning for topic modeling, once segmentation is complete the stop words in each word set are removed by checking against a stop word list.

[0058] Step 3 establishes the nodes of the weighted word network: each word that remains after the stop word removal in step 2 is used...
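
Steps 1 to 3 can be illustrated with a minimal Python sketch. It rests on several assumptions: whitespace splitting stands in for the open source Chinese word segmentation tool (which the text does not name), the stop word list is a made-up placeholder, and the names `segment`, `remove_stop_words`, `build_nodes`, and `STOP_WORDS` are hypothetical rather than taken from the patent.

```python
from typing import Iterable, List, Set

# Hypothetical stop word list; the source refers to a stop word
# vocabulary but does not list its contents.
STOP_WORDS: Set[str] = {"的", "了", "是"}


def segment(document: str) -> List[str]:
    """Step 1 stand-in: split a pre-spaced document into words.

    A real pipeline would call a Chinese word segmentation tool here.
    """
    return document.split()


def remove_stop_words(words: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Step 2: drop every word that appears in the stop word list."""
    return [w for w in words if w not in stop_words]


def build_nodes(corpus: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> Set[str]:
    """Step 3 (as far as the text describes it): one node per remaining word."""
    nodes: Set[str] = set()
    for document in corpus:
        nodes.update(remove_stop_words(segment(document), stop_words))
    return nodes


# Toy corpus, already separated by spaces for the stand-in segmenter.
corpus = ["话题 模型 的 短 文本", "短 文本 是 稀疏 的"]
nodes = build_nodes(corpus)
```

Keeping segmentation behind its own function means a real segmentation tool could be swapped in without touching the stop word or node logic.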


Abstract

A short text topic model mining method that extends features using a word network comprises a weighted word network construction step, a short text feature extension step, and a topic mining step. The weighted word network construction step comprises: preprocessing the text by performing Chinese word segmentation on the texts in a short text corpus and deleting stop words; and building a weighted word network from the segmented documents, in which the nodes are words, an edge between two nodes represents the co-occurrence of the two words in the same document, and the weight of the edge is the number of times the two words co-occur in the whole corpus. The short text feature extension step treats the word nodes contained in each segmented short text as a community in the weighted word network. This method of addressing short text feature sparsity, based on the community modularity of the word network, solves the problem that the LDA topic model performs poorly on short texts and improves the accuracy of short text topic models.
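
The weighted word network described in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: documents are assumed to be already segmented and stripped of stop words, a word pair is counted once per document in which both words appear, and the function name `build_weighted_word_network` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def build_weighted_word_network(documents: Iterable[List[str]]) -> Dict[Edge, int]:
    """Return edge weights keyed by an alphabetically ordered word pair.

    Assumption: each document contributes at most 1 to the weight of a
    pair, however often the two words repeat inside that document.
    """
    weights: Counter = Counter()
    for words in documents:
        # Sorting the distinct words gives every undirected edge a single key.
        for u, v in combinations(sorted(set(words)), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Toy corpus of already-preprocessed documents.
docs = [
    ["short", "text", "topic"],
    ["short", "text"],
    ["topic", "model"],
]
weights = build_weighted_word_network(docs)
```

In this toy corpus the pair ("short", "text") appears in two documents and so gets weight 2, while every other edge gets weight 1.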

Description

Technical field

[0001] The invention relates to the fields of short text topic models and complex network analysis. It is a method for expanding short text features by using a weighted word network, in order to solve the feature sparsity problem that arises when LDA topic models are applied to short texts.

Background

[0002] The features of short texts are very sparse, so generative models such as LDA, which model topics from word co-occurrence relations, perform poorly on short texts. Short texts nevertheless have clear advantages over long texts: they express semantics concisely and transmit information quickly. People are increasingly inclined to use short texts to transmit information, and short texts are becoming one of the most important information carriers in today's society, for example online advertisements, short messages, and popular social media such as Weibo and Twitter. These ...

Application Information

IPC(8): G06F17/30
CPC: G06F16/335, G06F16/35
Inventors: 张雷, 戴恒宇, 蔡洋, 王陆霞, 陆恒杨, 徐鸣, 王崇骏
Owner: NANJING UNIV