Method for detecting microblog topics

A microblog topic detection technology applicable to network data retrieval, other database retrieval, and unstructured text data retrieval. It addresses the reduced accuracy of microblog topic detection, improving accuracy and avoiding information sparseness.

Inactive Publication Date: 2014-05-21
GUANGXI UNIVERSITY OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a microblog topic detection method. It aims to solve the problem that the many sparse matrices produced when building microblog matrices, together with the symbols and network vocabulary commonly used in microblogs, greatly reduce the accuracy of microblog topic detection.

Examples

Embodiment Construction

[0020] The present invention is implemented as follows. With reference to Figure 1, the microblog topic detection method proceeds through the following steps:

[0021] Step 1: Select a collection of microblogs and preprocess it by scanning it against the online thesaurus of the "Internet Word Network". The preprocessing mainly maps symbolized and sloganized words to commonly used words;

[0022] Step 2: After preprocessing, use the ICTCLAS word segmentation system of the Chinese Academy of Sciences to perform word segmentation and part-of-speech tagging on the microblog collection. Then keep words of selected parts of speech, such as nouns, verbs, and adjectives, and remove quantifiers, function words, and other words to improve the efficiency and accuracy of processing;

[0023] Step 3: Use the HOWNET tool to acquire and expand the concepts of the microblog words;

[0024] Step...
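
Steps 1 to 3 above can be illustrated with a minimal Python sketch. It is not the patented implementation: the lookup tables `SLANG_MAP`, `TAGS`, and `CONCEPTS` are hypothetical stand-ins for the "Internet Word Network" thesaurus, the ICTCLAS tagger, and HOWNET, and the tag names and sample entries are assumptions made only for illustration.

```python
# Hypothetical stand-ins; no real thesaurus, ICTCLAS, or HOWNET data or API is used.
SLANG_MAP = {"luv": "love", "gr8": "great"}            # step 1: slang -> common word
TAGS = {"i": "r", "love": "v", "my": "r",              # step 2: toy part-of-speech lookup
        "new": "a", "phone": "n"}
KEPT_TAGS = {"n", "v", "a"}                            # noun, verb, adjective (assumed names)
CONCEPTS = {"love": ["like", "emotion"],               # step 3: toy concept dictionary
            "phone": ["mobile", "device"]}


def normalize(tokens):
    """Step 1: map symbolized or slang tokens to commonly used words."""
    return [SLANG_MAP.get(t, t) for t in tokens]


def tag(tokens):
    """Step 2a: attach a part-of-speech tag to each token ('x' if unknown)."""
    return [(t, TAGS.get(t, "x")) for t in tokens]


def filter_by_pos(tagged):
    """Step 2b: keep only nouns, verbs, and adjectives."""
    return [word for word, pos in tagged if pos in KEPT_TAGS]


def expand(words):
    """Step 3: add related concepts after each word, skipping duplicates."""
    out = []
    for word in words:
        for concept in [word] + CONCEPTS.get(word, []):
            if concept not in out:
                out.append(concept)
    return out


post = "i luv my new phone"
content_words = filter_by_pos(tag(normalize(post.split())))
concepts = expand(content_words)
print(concepts)
```

Expanding each kept word with related concepts is what adds information to a short post, which is how the method described here counters sparsity.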

Abstract

The invention discloses a method for detecting microblog topics. A microblog collection is selected and preprocessed by scanning it against the "Internet Word Network" thesaurus. After preprocessing, word segmentation, part-of-speech tagging, and related processing are performed on the collection with ICTCLAS. Microblog word concepts are acquired and expanded with the HOWNET tool. The importance of each concept is calculated with TFIDF, a concept vector space model is built for each post, and the posts are gathered into a post matrix model. The microblogs are then clustered with a clustering algorithm, and the resulting clusters are the topic sets. Because the collection is segmented and tagged with ICTCLAS and only selected parts of speech are kept, later topic detection is more efficient. Using HOWNET to extend each word with synonyms and related attributes increases the amount of information, largely avoids the information sparsity problem, and greatly improves the accuracy of later topic detection.
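
The weighting and clustering stage described in the abstract can be sketched as follows. This is a hedged illustration only: the source does not say which TFIDF variant or which clustering algorithm is used, so the sketch assumes a plain term-frequency times log inverse-document-frequency weight and a simple single-pass clustering with a cosine-similarity threshold. The function names, the threshold value, and the sample posts are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(posts):
    """Build one TF-IDF weighted concept vector (as a dict) per post."""
    n = len(posts)
    # Document frequency: in how many posts each concept appears.
    df = Counter(c for post in posts for c in set(post))
    vectors = []
    for post in posts:
        tf = Counter(post)
        vectors.append({c: (tf[c] / len(post)) * math.log(n / df[c]) for c in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def single_pass_cluster(vectors, threshold=0.2):
    """Assumed clustering step: join the first cluster whose seed post is
    similar enough, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of post indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical posts, already reduced to expanded concept lists.
posts = [
    ["phone", "mobile", "device", "new"],
    ["phone", "mobile", "battery"],
    ["football", "match", "goal"],
    ["football", "goal", "team"],
]
topics = single_pass_cluster(tfidf_vectors(posts))
print(topics)
```

Storing each vector as a dict keeps only non-zero weights, which suits short posts whose rows in the post matrix are mostly empty.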

Description

Technical field

[0001] The invention belongs to the field of topic detection, and in particular relates to a microblog topic detection method.

Background technique

[0002] Topic detection technology is by now relatively mature, but Weibo is a form of social media that only emerged around 2010. The biggest difference between Weibo and ordinary blogs is that a Weibo post is limited to 140 characters, so its text tends to be individualized, symbolized, sloganized, and non-standard.

[0003] Some existing methods have begun to detect microblog topics, but most are hampered by the 140-character limit: building the microblog matrix produces a large number of sparse matrices, and the symbols and network vocabulary commonly used in microblogs also greatly reduce the accuracy of microblog topic detection.

Contents of the invention

[0004] The purpose of the present invention is to provide...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35, G06F16/951
Inventors: 王萌, 黄镇谨, 欧阳浩
Owner: GUANGXI UNIVERSITY OF TECHNOLOGY