
Topic-based online clustering method for short text in social media

A clustering technique for social media text, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as limited applicability and the lack of a strict proof for the auxiliary distribution, with the aim of improving prediction accuracy, speeding up convergence, and making up for performance shortfalls.

Status: Inactive; Publication Date: 2018-12-25
UNIV OF ELECTRONIC SCI & TECH OF CHINA
6 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

The auxiliary distribution constructed by the DEC method lacks a strict proof and has limited applicability
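
For context, and assuming "DEC" here refers to Deep Embedded Clustering (Xie et al., 2016), the auxiliary distribution in question is the target distribution P that DEC builds from its soft assignments Q and then fits by minimizing KL(P || Q). The symbols below (embedded point z_i, cluster centre mu_j, Student's t degrees of freedom alpha, soft cluster frequency f_j) follow that paper and do not appear in this patent text:

```latex
q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}
              {\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}},
\qquad
p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}},
\qquad
f_j = \sum_i q_{ij}
```

Squaring q_ij and dividing by f_j is a heuristic that sharpens confident assignments and normalizes for cluster size, which is consistent with the complaint above that the construction lacks a strict proof.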



Detailed Description of the Embodiments

[0029] To help those skilled in the art understand the technical content of the present invention, the invention is further explained below with reference to the accompanying drawings.

[0030] Figure 1 shows the flowchart of the scheme of the present invention. The technical solution is a topic-based online clustering method for social media short text, comprising the following steps:

[0031] S1: Preprocess the input training short texts, including word segmentation, stop-word removal, part-of-speech tagging, and named entity recognition. The input of the topic-based social media short text online clustering method is essentially the same as that of prior-art text clustering methods: the original text in string form and a unique identification ID for the text. The unique identification ID serves only to save storage space in the subsequent steps, and...
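
The first two parts of step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `ShortText`, `preprocess`, and `STOP_WORDS` are hypothetical, the regex tokenizer is a stand-in for a real word segmenter, and part-of-speech tagging and named entity recognition are left out because they depend on a trained model.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on", "and", "at", "for"}


@dataclass
class ShortText:
    text_id: int       # unique identification ID of the text
    tokens: List[str]  # tokens left after segmentation and stop-word removal


def preprocess(text_id: int, raw: str) -> ShortText:
    """Segment a raw short text into tokens and drop stop words."""
    # Word segmentation (stand-in): lowercase, then take runs of word
    # characters, keeping a leading '#' or '@' so hashtags and mentions survive.
    words = re.findall(r"[#@]?\w+", raw.lower())
    # Stop-word removal.
    tokens = [w for w in words if w not in STOP_WORDS]
    return ShortText(text_id, tokens)
```

Carrying `text_id` alongside the tokens mirrors the input format described above, where later steps can refer to a text by its ID instead of copying the original string.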


Abstract

The invention discloses a topic-based online clustering method for social media short text. It uses a conservative pre-clustering method to aggregate short texts into long texts, which strengthens the co-occurrence relationships between words so that the extracted topics are clearer and more discriminative. In addition, the Bayesian inference method based on smoothing and normalization can identify new topics. The online incremental clustering method built on it is more efficient than non-incremental clustering methods and, compared with traditional online incremental clustering methods, achieves higher accuracy and a number of topics closer to the true value.
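
The idea of incremental assignment with new-topic detection can be illustrated with a generic one-pass Bayesian mixture. The sketch below is not the patented procedure, whose details are not given in this excerpt: it swaps in additive (Laplace-style) smoothing of per-cluster word counts, a size-proportional prior, and an explicit "new topic" candidate, then normalizes the scores into posteriors. The class name `OnlineTopicClusterer` and the parameters `vocab_size`, `beta`, and `alpha` are assumptions, and the conservative pre-clustering step is omitted.

```python
import math
from collections import Counter


class OnlineTopicClusterer:
    """One-pass clustering: each text joins its most probable topic or opens a new one."""

    def __init__(self, vocab_size=1000, beta=0.01, alpha=1.0):
        self.vocab_size = vocab_size  # assumed vocabulary size used for smoothing
        self.beta = beta              # additive smoothing constant (assumption)
        self.alpha = alpha            # prior weight of opening a new topic (assumption)
        self.word_counts = []         # one Counter of word frequencies per cluster
        self.doc_counts = []          # number of texts assigned to each cluster

    def _log_likelihood(self, tokens, counts):
        # Smoothed multinomial log-likelihood of the tokens under one cluster.
        denom = sum(counts.values()) + self.beta * self.vocab_size
        return sum(math.log((counts[t] + self.beta) / denom) for t in tokens)

    def posteriors(self, tokens):
        """Normalized probabilities over existing clusters plus a final 'new topic' entry."""
        n = sum(self.doc_counts)
        scores = [
            math.log(size / (n + self.alpha)) + self._log_likelihood(tokens, counts)
            for counts, size in zip(self.word_counts, self.doc_counts)
        ]
        # New-topic candidate: empty counts, so smoothing yields a uniform word distribution.
        scores.append(
            math.log(self.alpha / (n + self.alpha))
            + self._log_likelihood(tokens, Counter())
        )
        # Normalize in log space to avoid underflow.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return [w / total for w in weights]

    def add(self, tokens):
        """Assign one tokenized text, update the chosen cluster, and return its index."""
        probs = self.posteriors(tokens)
        k = max(range(len(probs)), key=probs.__getitem__)
        if k == len(self.word_counts):  # the 'new topic' candidate won
            self.word_counts.append(Counter())
            self.doc_counts.append(0)
        self.word_counts[k].update(tokens)
        self.doc_counts[k] += 1
        return k
```

Calling `add` once per incoming tokenized text processes the stream in a single pass, so the number of topics grows only when a text is poorly explained by every existing cluster. How readily new topics open depends on the assumed `alpha`, `beta`, and `vocab_size` values.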

Description

Technical field

[0001] The invention belongs to the field of semantic analysis of social media, and in particular relates to a text clustering technique.

Background

[0002] With the rapid development of network technology and the mobile Internet, the volume of global data has grown explosively, and message exchange has become more efficient than ever. Search engines are no longer the largest source of traffic on the Internet; social media has taken their place. For users, social media is merely a tool for exploring the world and sharing their lives, but the huge number of users and the spontaneous dissemination of information give social media a potential value far beyond the product's original positioning.

[0003] As one of the research directions of big data analysis, semantic analysis of social media is a discipline that has emerged in recent years. It involves social network analysis, machine learning, data mining, information retrieval and natural la...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F40/30
Inventors: 费高雷, 蒋勇, 许舟军, 胡光岷
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA