Microblog topic clustering method based on word vector and single-pass fusion
A clustering method based on word vector technology, applied in the field of microblog topic clustering. It addresses the problems of high data dimensionality and high computing overhead, and aims to improve the clustering effect while reducing that overhead.
Examples
Embodiment 1
[0049] The LDA topic model is more sensitive to topics in longer texts such as news reports, where its topic discovery effect is better. Short microblog texts, however, contain few words, carry more irrelevant information such as noise, and yield few feature words, so LDA alone performs poorly on them. Therefore, this embodiment improves the single-pass algorithm, uses the improved algorithm to group the microblog texts into topic clusters, and finally applies the LDA topic model to the texts in each cluster to discover their topics. The implementation process is as follows: preprocess the acquired microblog data and construct a vocabulary database; map feature words to Word2vec word vectors; cluster the microblog texts using single-pass fused with the Word2vec word vectors; apply the LDA topic model to each resulting cluster for topic discovery.
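The clustering step in this process can be illustrated with a minimal sketch. It is not the patented procedure: it assumes each text is represented by the average of its feature-word vectors, that similarity is cosine similarity against a running cluster centroid, and that a fixed threshold decides whether a new cluster is opened. The names `text_vector`, `single_pass`, `toy_vectors`, and the threshold value are hypothetical, and the two-dimensional vectors stand in for real Word2vec output.

```python
import math

def text_vector(tokens, word_vectors):
    """Average the vectors of the tokens that have an embedding (assumed representation)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(docs, word_vectors, threshold=0.8):
    """Cluster tokenized texts in one pass; returns lists of document indices."""
    centroids = []  # running mean vector of each cluster
    members = []    # document indices assigned to each cluster
    for idx, tokens in enumerate(docs):
        vec = text_vector(tokens, word_vectors)
        if vec is None:
            continue  # no known feature words; skipped in this sketch
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            # Similar enough: join the cluster and update its running mean.
            n = len(members[best])
            centroids[best] = [(m * n + x) / (n + 1)
                               for m, x in zip(centroids[best], vec)]
            members[best].append(idx)
        else:
            # Otherwise the text seeds a new topic cluster.
            centroids.append(vec)
            members.append([idx])
    return members

# Toy 2-D vectors standing in for Word2vec embeddings (hypothetical values).
toy_vectors = {
    "match": [1.0, 0.1], "goal": [0.9, 0.2], "team": [0.95, 0.0],
    "stock": [0.1, 1.0], "market": [0.0, 0.9], "shares": [0.2, 0.95],
}
docs = [["match", "goal"], ["stock", "market"],
        ["team", "goal"], ["shares", "market"]]
clusters = single_pass(docs, toy_vectors, threshold=0.8)
print(clusters)
```

In a fuller pipeline, the texts of each resulting cluster would then be passed to an LDA topic model for topic discovery; that step is left out of this sketch.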
[0050] 1. Filter noise data
[0051] Data noise mainly includes advertisements, emoticons, special characters, pi...