A Topic Model Construction Method Based on Community Discovery

A topic model construction technique based on community discovery, applied to topic mining of social short-text data. It addresses problems such as data sparsity and reduces the effect of that sparsity on the resulting topic model.

Active Publication Date: 2020-06-26
NANJING UNIV
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The TMCD model starts from the community relationships inherent in the data and uses a community discovery algorithm as the basis for short-text self-expansion, which addresses the problem of data sparsity.



Embodiment Construction

[0025] To make the technical content of the present invention easier to understand, specific embodiments are described below with reference to the accompanying drawings.

[0026] Figure 1 shows the relationship between documents, topics, and vocabulary in the topic model. Once the concept of a "topic" is introduced into the data, the topic acts as a "bridge" connecting documents and words. By observing the probability distribution between documents and topics and the probability distribution between topics and words, the distribution of topics can be obtained through the relevant mathematical models. When estimating the relationship between topics and words, the strength of the word co-occurrence relationship affects the accuracy of the observations, and this accuracy in turn affects the quality of the final topic model. Long texts provide enough word co-occurrence relationships to support the observation, while short texts lack suf...
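The sparsity argument in [0026] can be made concrete by counting word pairs. The following is a minimal sketch, not part of the patent text: the function name `cooccurrence_counts` and the sample sentences are hypothetical, and "co-occurrence" is assumed to mean two distinct words appearing in the same document.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count unordered pairs of distinct words that share a document.

    Assumption: whitespace tokenization and lowercasing are enough
    for this illustration; real short texts need proper preprocessing.
    """
    pairs = Counter()
    for text in documents:
        words = sorted(set(text.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


# Hypothetical sample data: three short posts, and the same posts
# merged into a single longer document.
short_texts = ["rain delays match", "match postponed today", "rain again today"]
long_text = [" ".join(short_texts)]

short_pairs = cooccurrence_counts(short_texts)
long_pairs = cooccurrence_counts(long_text)

print(len(short_pairs), len(long_pairs))
```

Treated separately, the three posts never pair "postponed" with "rain"; once merged, that pair and several others become observable. Merging only helps when the merged texts are actually related, which is the role the community structure plays in the method.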


Abstract

The invention discloses a method for building a topic model based on community discovery. The method comprises the following steps in sequence: extracting the relational network contained in the short text data; dividing the relational network into a plurality of communities using a community discovery algorithm; expanding the short texts extracted from each community into long documents that carry word co-occurrence relations, and collecting these long documents into a long document set; and performing topic mining on the long document set to obtain the community discovery-based topic model, TMCD. The method performs self-expansion of the short texts with a community discovery algorithm, starting from the community relations contained in the data, and thereby addresses the problem of data sparsity.
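The steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the excerpt does not name the community discovery algorithm, so connected components over frequently interacting users are used here as a simple stand-in, and the function names, the `min_weight` threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_relation_network(interactions):
    """Turn (user, user) interaction records into weighted undirected edges."""
    weights = defaultdict(int)
    for a, b in interactions:
        if a != b:
            weights[frozenset((a, b))] += 1
    return weights


def discover_communities(users, weights, min_weight=2):
    """Stand-in for community discovery (an assumption, not the patent's
    algorithm): connected components over edges with weight >= min_weight."""
    neighbors = defaultdict(set)
    for edge, weight in weights.items():
        if weight >= min_weight:
            a, b = tuple(edge)
            neighbors[a].add(b)
            neighbors[b].add(a)
    communities, seen = [], set()
    for user in sorted(users):
        if user in seen:
            continue
        stack, members = [user], set()
        while stack:  # depth-first walk over strong edges
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbors[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def expand_short_texts(posts, communities):
    """Concatenate each community's short texts into one long document."""
    by_user = defaultdict(list)
    for user, text in posts:
        by_user[user].append(text)
    long_docs = []
    for members in communities:
        texts = [t for user in members for t in by_user[user]]
        if texts:
            long_docs.append(" ".join(texts))
    return long_docs


# Hypothetical sample data.
posts = [
    ("ann", "rain delays match"),
    ("bob", "match postponed today"),
    ("cat", "new phone launch"),
    ("dan", "phone camera review"),
]
interactions = [
    ("ann", "bob"), ("bob", "ann"),
    ("cat", "dan"), ("dan", "cat"),
    ("bob", "cat"),  # a single weak tie, below the threshold
]

weights = build_relation_network(interactions)
communities = discover_communities({user for user, _ in posts}, weights)
long_docs = expand_short_texts(posts, communities)
print(communities)
print(long_docs)
```

The resulting `long_docs` would then be passed to a conventional topic model for the final topic mining step; that step is omitted here because the excerpt does not specify which model is used.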

Description

Technical field

[0001] The invention relates to a method for constructing a topic model based on community discovery, and in particular to a technique for topic mining of social short-text data that contains social networks.

Background

[0002] In the current network environment, the growth of online platforms generates a large amount of social data, and social networks have become a data source for information mining. Most of the data generated in this setting takes the form of short text. Compared with long texts, short texts express concise semantics and transmit information quickly, which is a clear trend in information dissemination. Short texts are becoming one of the most important information carriers in today's society.

[0003] At present, among the methods for analyzing such data, mining the semantic information contained in the text through a topic model is a very effective approach. Classical topic model al...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/30, G06Q50/00
CPC: G06F16/35, G06F40/30, G06F2216/03, G06Q50/01
Inventor: 张雷, 赵鑫, 宋岳, 李宁
Owner: NANJING UNIV