Time window based LDA microblog topic trend detection method and apparatus
Time-window and detection-method technology, applied in special-purpose data processing applications, instruments, and electrical digital data processing. It addresses problems such as scattered information, a lack of hot events, and obstacles to hot-topic modeling and analysis, and achieves high practicality and improved topic accuracy.
Examples
Embodiment 1
[0052] A time-window-based LDA microblog topic trend detection method (see Figure 1) includes the following steps:
[0053] 101: Obtain a microblog data set using a web crawler;
[0054] For example, build a crawler for Sina Weibo, crawl the posts published on Sina Weibo during a given period, and retain fields such as publication time, author, title, and text content. This step is well known to those skilled in the art and is not described in detail in this embodiment of the present invention.
[0055] 102: Preprocess the microblog data set (word segmentation, stop-word removal, etc.) to obtain a word set;
[0056] Specifically, an existing Chinese lexical analysis system is used to segment the microblog data set. Stop words are then removed using the "HIT stop word list" (Harbin Institute of Technology), and only the nouns and verbs in the segmentation results are kept. The embodiment of the present i...
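The filtering described in [0056] can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes that segmentation has already produced (word, part-of-speech) pairs and that nouns and verbs carry ICTCLAS-style tags beginning with "n" and "v". It also uses a tiny hypothetical stop-word set in place of the full HIT list. The function name `filter_tokens` and the hand-tagged example are hypothetical.

```python
def filter_tokens(tagged_tokens, stop_words, keep_prefixes=("n", "v")):
    """Remove stop words, then keep only tokens whose POS tag starts with
    one of keep_prefixes (assumed ICTCLAS-style: n* = nouns, v* = verbs)."""
    kept = []
    for word, tag in tagged_tokens:
        if word in stop_words:
            continue
        if tag.startswith(keep_prefixes):  # str.startswith accepts a tuple
            kept.append(word)
    return kept


# Hand-tagged toy segmentation of one post (tags are illustrative only).
tagged = [
    ("天宫一号", "nz"), ("今天", "t"), ("成功", "a"), ("发射", "v"),
    ("了", "u"), ("我们", "r"), ("是", "v"), ("激动", "a"),
]
stop_words = {"了", "我们", "是"}  # hypothetical subset, not the HIT list
print(filter_tokens(tagged, stop_words))  # ['天宫一号', '发射']
```

The stop-word check runs before the tag check. As a result, a verb such as 是 that appears in the stop list is still removed.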
Embodiment 2
[0067] The scheme of Embodiment 1 is described in detail below with reference to specific calculation formulas, examples, and Figure 1:
[0068] 201: Build a crawler for Sina Weibo, crawl the posts published on Sina Weibo during a given period, and retain fields such as publication time, author, title, and text content;
[0069] 202: Segment the text with the Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), developed by the Institute of Computing Technology, Chinese Academy of Sciences, through its ICTCLAS5.0 API. Special words, such as sentiment words and Internet slang, are added to the segmenter as a user dictionary to improve segmentation quality.
[0070] 203: Filter stop words out of the segmentation results;
[0071] That is, remove words that carry no real meaning and high...
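Embodiment 2 is truncated here. Per the title and Embodiment 3, the preprocessed posts are next grouped into time windows, and an LDA topic model is fitted in each window.

The source does not state how window boundaries are chosen or how LDA inference is performed. The sketch below therefore assumes equal-width windows and collapsed Gibbs sampling. The following are all hypothetical:

- the function names `split_into_windows` and `lda_gibbs`;
- the hyperparameters `alpha`, `beta`, and `iterations`;
- the toy posts.

Embodiment 3 uses 4 windows and 150 topics; the toy run uses 2 of each.

```python
import random
from datetime import datetime


def split_into_windows(posts, start, end, num_windows):
    """Group (publish_time, tokens) posts into num_windows equal-width
    windows covering [start, end); posts outside the range are dropped."""
    if num_windows <= 0 or end <= start:
        raise ValueError("need num_windows > 0 and end > start")
    width = (end - start) / num_windows  # timedelta / int -> timedelta
    windows = [[] for _ in range(num_windows)]
    for publish_time, tokens in posts:
        if start <= publish_time < end:
            index = int((publish_time - start) / width)  # timedelta / timedelta -> float
            windows[min(index, num_windows - 1)].append(tokens)
    return windows


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists.
    Returns (vocab, phi, theta): topic-word and document-topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # tokens assigned to topic k
    z = []                              # topic assignment of every token

    for d, doc in enumerate(docs):      # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                n_dk[d][k] -= 1         # remove the token's current assignment
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # p(z_i = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k             # record the new topic and restore the counts
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


# Toy preprocessed posts (hypothetical data), split into 2 windows.
posts = [
    (datetime(2011, 9, 5), ["拐卖", "儿童", "解救", "儿童"]),
    (datetime(2011, 9, 12), ["天宫一号", "发射", "成功"]),
    (datetime(2011, 9, 20), ["儿童", "拐卖", "警方"]),
    (datetime(2011, 10, 8), ["天宫一号", "发射", "对接"]),
    (datetime(2011, 10, 15), ["拐卖", "儿童", "警方"]),
]
windows = split_into_windows(posts, datetime(2011, 9, 1), datetime(2011, 11, 1), 2)
window_topics = [lda_gibbs(docs, num_topics=2, iterations=100) for docs in windows]
for w, (vocab, phi, _) in enumerate(window_topics):
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
        print(f"window {w}, topic {k}:", [vocab[v] for v in top])
```

Each window is modeled independently, so topic indices are not aligned across windows. Relating topics from different windows is the purpose of the similarity and clustering step described in Embodiment 3.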
Embodiment 3
[0097] The feasibility of the schemes in Embodiments 1 and 2 is verified below with a concrete example, with reference to Figures 2 and 3:
[0098] A web crawler was used to collect 25,495 Weibo posts published on Sina Weibo from September to October 2011, retaining fields such as publication time, author, title, and text content. The posts were then preprocessed (Chinese word segmentation, stop-word removal, etc.). Next, the global time span was divided into 4 time windows, as shown in Table 1, and a total of 150 topics were extracted with the LDA topic model in each time window. After the similarity between the topic results was calculated, K-means clustering was performed with the number of clusters set to 2, yielding 2 hot topics. Mapping these back to the document data identified topic 1 as the "child trafficking" incident and topic 2 as the "Tiangong-1" incident. For example, Figure 2...
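The similarity and clustering step in [0098] can be sketched as below. The patent states only that topic similarity is calculated and that K-means is run with 2 clusters. This sketch makes two assumptions:

- similarity is cosine similarity between topic-word distributions;
- clustering uses spherical K-means (a K-means variant that assigns each point by cosine similarity), swapped in here, with a deterministic farthest-point initialization.

The function names are illustrative only. So are the toy topics, written as hypothetical word-to-probability dictionaries of the kind an LDA run might produce.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word -> weight dictionaries."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def mean_vector(vectors):
    """Element-wise mean of sparse word -> weight dictionaries."""
    total = {}
    for vec in vectors:
        for word, weight in vec.items():
            total[word] = total.get(word, 0.0) + weight
    return {word: weight / len(vectors) for word, weight in total.items()}


def spherical_kmeans(topics, k, max_iter=50):
    """Cluster topic-word vectors by cosine similarity; returns one label per topic."""
    if not 1 <= k <= len(topics):
        raise ValueError("need 1 <= k <= number of topics")
    # Deterministic farthest-point initialisation: start from the first topic,
    # then repeatedly add the topic least similar to its closest centroid.
    centroids = [topics[0]]
    while len(centroids) < k:
        centroids.append(min(topics, key=lambda t: max(cosine(t, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(t, centroids[j])) for t in topics]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [t for t, label in zip(topics, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_vector(members)
    return labels


# Hypothetical topic-word distributions: two topics from each of two windows.
topics = [
    {"拐卖": 0.5, "儿童": 0.4, "解救": 0.1},
    {"天宫一号": 0.6, "发射": 0.3, "成功": 0.1},
    {"儿童": 0.5, "拐卖": 0.3, "警方": 0.2},
    {"天宫一号": 0.5, "对接": 0.3, "发射": 0.2},
]
print(spherical_kmeans(topics, k=2))  # [0, 1, 0, 1]
```

Topics from different windows that land in the same cluster are treated as one hot topic. This is analogous to the two clusters interpreted in [0098].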