
A method of filtering garbage users and extracting short text topics

A spam-user filtering and short-text topic extraction technique, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses time-consuming computation, meaningless spam topics, and the lack of a rigorous mathematical and statistical foundation, and it achieves efficient topic extraction with a clear, easy-to-understand model.

Inactive Publication Date: 2019-01-29
SUN YAT SEN UNIV
1 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0003] Some users in microblog data post a large number of similar microblogs within a short period of time; these users are called spam users. When their posts are present, the topics detected by the model are meaningless spam topics. In addition, the model extracts emergent topics with the LSA (Latent Semantic Analysis) method based on SVD (Singular Value Decomposition), but this method lacks a rigorous mathematical and statistical foundation, and the SVD computation is particularly time-consuming.



Examples


Embodiment Construction

[0052] The accompanying drawings are for illustrative purposes only and cannot be construed as limiting the patent;

[0053] In order to better illustrate this embodiment, some parts in the drawings will be omitted, enlarged or reduced, and do not represent the size of the actual product;

[0054] For those skilled in the art, it is understandable that some well-known structures and descriptions thereof may be omitted in the drawings.

[0055] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0056] As shown in Figure 1, a method for filtering spam users and extracting short text topics includes the following steps:

[0057] S1: Filter spam users from the microblog data stream, removing users who post a large number of similar microblogs in a short period of time, together with their microblog data;

[0058] S2: On the data processed in step S1, calculate the burst value of microbl...
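Step S1 can be illustrated with a minimal sketch. The source does not specify how similarity or "a short period of time" is measured, so everything here is an assumption made for illustration: the function names, Jaccard similarity over whitespace tokens, the one-hour window, the 0.8 similarity threshold, and the minimum of three near-duplicate posts. Real Weibo text would also need a Chinese word segmenter instead of `str.split`.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_spam_users(posts, window_seconds=3600, sim_threshold=0.8, min_similar=3):
    """Return the set of users flagged as spam (hypothetical criterion).

    posts: iterable of (user_id, timestamp_seconds, text).
    A user is flagged when at least `min_similar` of their posts fall inside
    one time window and are near-duplicates of the post that opens the window.
    All thresholds are illustrative assumptions, not values from the patent.
    """
    by_user = defaultdict(list)
    for user, ts, text in posts:
        by_user[user].append((ts, set(text.lower().split())))

    spam = set()
    for user, items in by_user.items():
        items.sort(key=lambda item: item[0])  # order each user's posts by time
        for i, (ts_i, tokens_i) in enumerate(items):
            similar = 1  # the post that opens the window counts itself
            for ts_j, tokens_j in items[i + 1:]:
                if ts_j - ts_i > window_seconds:
                    break  # later posts are outside the window
                if jaccard(tokens_i, tokens_j) >= sim_threshold:
                    similar += 1
            if similar >= min_similar:
                spam.add(user)
                break
    return spam


def filter_posts(posts, **kwargs):
    """Drop every post written by a flagged user, as step S1 describes."""
    posts = list(posts)
    spam = find_spam_users(posts, **kwargs)
    return [post for post in posts if post[0] not in spam]
```

The filtered list returned by `filter_posts` would then be the input to step S2. Anchoring each window at a single post keeps the sketch short; a production filter might instead cluster posts or use hashing to avoid the pairwise comparison.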


Abstract

The invention provides a method for filtering spam users and extracting short text topics. The method first filters spam users out of the original data, which largely avoids the problem that a detected emergent topic has no practical significance. Topics are then extracted with the BTM (Biterm Topic Model) topic model; the model is clear and easy to understand, and topic extraction is efficient.
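For readers unfamiliar with BTM, the sketch below shows only its input representation: a biterm is an unordered pair of words that co-occur in the same short text, and BTM models the corpus-wide collection of such pairs rather than per-document word counts. The function names are hypothetical, the texts are assumed to be tokenized already, and the topic inference itself (for example by Gibbs sampling) is not shown; this is not the patent's implementation.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one tokenized short text.

    Each pair is sorted so that ("a", "b") and ("b", "a") count as the
    same biterm. A repeated token yields a pair of identical words; whether
    to keep such pairs is a design choice left open here.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Aggregate biterm counts over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts
```

Because a microblog is short, taking all pairs within one post is a common choice; for longer texts a co-occurrence window is usually applied instead.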

Description

Technical field

[0001] The invention relates to the field of text processing algorithms, and more specifically to a method for filtering spam users and extracting short text topics.

Background technique

[0002] Emergent topic detection has been a popular research direction in network information processing and natural language processing in recent years. Emergent topics, also called emerging topics or trending topics, generally refer to topics that are about to explode or spread widely, and they usually accompany major news and hot events. The Weibo social networking platform is an open platform on which information spreads quickly and widely, so emergent topics may have a significant social impact. Therefore, taking Weibo text as the research object, it is of great theoretical and practical significance to detect topics or events in the data before they evolve into hot events or in the early sta...


Application Information

IPC(8): G06F17/27, G06F17/22
CPC: G06F40/194, G06F40/289
Inventor: 戴小款
Owner: SUN YAT SEN UNIV