
A non-word-segmented burst topic detection method for microblog

A topic detection method based on non-word-segmentation technology, applicable to unstructured text data retrieval, text database clustering/classification, and special data processing applications. It addresses the difficulty of detecting burst (emergent) topics and improves overall detection performance.

Active Publication Date: 2018-03-13
HARBIN ENG UNIV
Cites: 2; Cited by: 0
AI Technical Summary

Problems solved by technology

Weibo posts contain a large number of social and colloquial words, so methods based on word segmentation have difficulty detecting burst topics triggered by new words or strings.



Embodiment Construction

[0041] Existing burst topic detection methods based on Chinese word segmentation all rely on the frequency of feature words. For Chinese microblogs, they first segment the text into words, build a feature trajectory for each feature word, identify burst feature words with a burst detection algorithm, and then represent each burst topic by a set of highly correlated feature words.
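The segmentation-based baseline described in [0041] can be sketched as follows. The paragraph does not name its burst detection algorithm, so this sketch assumes a simple z-score style test: a word bursts when its count in the latest window exceeds the mean of its earlier counts by more than `k` standard deviations. The function names, the parameters `k` and `min_count`, and the input layout (a list of time windows, each a list of already-segmented posts) are hypothetical, and at least two windows are required.

```python
from collections import Counter
from statistics import mean, pstdev


def feature_trajectories(windows):
    """Map each feature word to its frequency in every time window.

    `windows` is a list of windows; each window is a list of posts that
    have already been segmented into words (the baseline's assumption).
    """
    counts = [Counter(w for post in window for w in post) for window in windows]
    vocab = set().union(*counts)
    return {w: [c[w] for c in counts] for w in vocab}


def burst_words(trajectories, k=2.0, min_count=2):
    """Flag words whose count in the last window exceeds mean + k * std
    of the earlier windows (an assumed z-score style burst test)."""
    bursts = []
    for word, traj in trajectories.items():
        history, current = traj[:-1], traj[-1]
        if current < min_count:
            continue  # too rare in the current window to count as a burst
        mu, sigma = mean(history), pstdev(history)
        if current > mu + k * sigma:
            bursts.append(word)
    return sorted(bursts)
```

Note that everything downstream depends on the segmenter: a string the dictionary cannot produce never gets a trajectory, which is the weakness [0042] describes.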

[0042] For Chinese microblogs, this approach has a clear flaw. Because Weibo users are so diverse, Weibo language is flexible and non-standard, with words or strings such as diaosi, Bogu Kailai, China on the Tip of the Tongue, and Tangshan earthquake. Many burst topics on Weibo are triggered by such new words or strings, but because these new words or meaningful strings are not in the Chinese word segmentation dictionary, they cannot be segmented correctly, and burst topics on Weibo therefore cannot be detected accurately.
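To make the failure mode in [0042] concrete, the sketch below uses forward maximum matching, a common dictionary-based segmentation technique chosen here only for illustration; the document does not say which segmenter the existing methods use. The function name and the toy dictionaries are hypothetical. With a dictionary that lacks 舌尖上的中国 (China on the Tip of the Tongue) or 屌丝 (diaosi), the segmenter breaks these strings into smaller pieces, so they never appear as a single feature word.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word of at most `max_len` characters, else one character."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# The phrase is split into four unrelated tokens instead of one unit.
print(forward_max_match("舌尖上的中国", {"舌尖", "中国", "上", "的"}))
```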

...


Abstract

The invention relates to a non-word-segmented burst topic detection method for microblogs, which uses computer technology to assist intelligent analysis of network information or public opinion. The method comprises the following steps: preprocessing the corpus and establishing a dynamic microblog detection window; splitting microblog content into single Chinese characters and building a dictionary; computing the set of burst feature characters; computing burst topics composed of feature characters; and generating meaningful words or strings from those characters to form burst topics expressed as words or strings. The advantage of the method is that burst topics in microblogs are detected without Chinese word segmentation: the content of Chinese microblog posts does not need to be segmented in advance, and Chinese characters, English words, pictures, videos, external links and the like are each treated as single entities. Finally, the Chinese characters among the burst feature entities are combined into words, which improves overall performance and increases the recall of new words and colloquial strings.
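A minimal sketch of the character-level pipeline summarized in the abstract follows, under loudly stated assumptions. The abstract does not give the entity rules, the burst criterion, or the string-generation rule, so everything below is hypothetical: `ENTITY_RE`, the `[img]`/`[video]` media placeholders, the add-one-smoothed frequency-ratio test with parameters `ratio` and `min_count`, and the rule that joins adjacent burst characters in the raw text into candidate strings. The step that groups burst characters into topics is omitted. Each window is a list of raw post strings, and the last window is treated as the current detection window.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical entity pattern (an assumption, not the patent's rule):
# external links, assumed [img]/[video] media placeholders, English words,
# and single CJK characters. Spaces and punctuation are dropped.
ENTITY_RE = re.compile(
    r"https?://\S+"        # external link as one entity
    r"|\[(?:img|video)\]"  # assumed placeholder for a picture or video
    r"|[A-Za-z]+"          # English word as one entity
    r"|[\u4e00-\u9fff]"    # one Chinese character per entity
)


def entities(post):
    """Split a post into entities without any word segmentation."""
    return ENTITY_RE.findall(post)


def burst_entities(windows, ratio=3.0, min_count=2):
    """Return entities whose post count in the last window is at least
    `ratio` times their add-one-smoothed average over earlier windows."""
    # Count each entity at most once per post (post frequency).
    counts = [Counter(e for post in w for e in set(entities(post))) for w in windows]
    history, current = counts[:-1], counts[-1]
    bursts = set()
    for entity, n in current.items():
        baseline = mean(c[entity] for c in history) if history else 0.0
        if n >= min_count and (n + 1) / (baseline + 1) >= ratio:
            bursts.add(entity)
    return bursts


def is_cjk(ch):
    return "\u4e00" <= ch <= "\u9fff"


def burst_strings(window, bursts, min_count=2):
    """Join runs of adjacent burst Chinese characters in the raw text into
    candidate strings and keep those seen at least `min_count` times."""
    found = Counter()
    for post in window:
        run = ""
        for ch in post + "\n":  # the non-CJK sentinel flushes the last run
            if is_cjk(ch) and ch in bursts:
                run += ch
                continue
            if len(run) >= 2:
                found[run] += 1
            run = ""
    return {s for s, n in found.items() if n >= min_count}
```

On a toy three-window corpus whose last window repeatedly mentions 唐山地震 (Tangshan earthquake), the characters 唐, 山, 地 and 震 are flagged as burst entities and re-joined into the string 唐山地震, even though no segmentation dictionary was consulted. Working on raw text rather than the entity list keeps punctuation and spaces as run boundaries, so characters from different phrases are not glued together.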

Description

Technical field

[0001] The invention relates to a microblog-oriented non-word-segmented burst topic detection method that uses computer technology to assist intelligent analysis of network information or public opinion.

Background

[0002] With the rise of the mobile Internet, microblogs such as Sina and Tencent have grown rapidly in China. Sina Weibo has more than 200 million registered users and Tencent Weibo more than 160 million. More than 300 million microblog messages are generated on China's microblog networks every day, and microblog platforms have become one of the main channels through which people obtain news and information in daily life. Because of Weibo's push mechanism, news spreads rapidly through the network and has wide influence.

[0003] While Weibo provides people with information, it also makes social management more difficult. Events in th...


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F17/30
CPC: G06F16/335, G06F16/35, G06F16/90344, G06F16/9535
Inventors: 杨武, 申国伟, 王巍, 苘大鹏, 玄世昌
Owner: HARBIN ENG UNIV