
Social media text denoising method based on space-time burst features

A social media text-processing technique in the field of text denoising. It addresses two difficulties: accurately extracting text-classification features from irregularly expressed social media posts, and obtaining large labeled training sets. Its stated effects are reduced sensitivity to the threshold setting, improved accuracy, and improved ease of use.

Active Publication Date: 2021-12-21
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 6 | Cited by: 0

AI Technical Summary

Problems solved by technology

Existing classification-based denoising methods depend heavily on the quantity and quality of their training sets, and they scale poorly: a classifier built for one task is difficult to extend to other tasks.
In practice, training sets are generally labeled by hand, so a large, high-quality training set is very difficult to obtain. At the same time, social media text is highly sparse and irregularly expressed, which makes it difficult to extract classification features accurately from many texts. As a result, existing methods classify poorly.




Embodiment Construction

[0040] The present invention proposes a social media text denoising method based on spatio-temporal burst features. Valuable information in social media (texts related to events, topics, and so on) tends to aggregate in time and space. The method exploits this property by modeling the spatio-temporal distribution of each word in the text from the perspective of spatio-temporal burstiness, and thereby identifies words related to events and topics in massive social media data. Distinguishing valuable text from noise text according to whether its words show spatio-temporal aggregation improves the ease of use and effectiveness of social media text denoising.
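As a point of reference, the classic (unimproved) space-time form of Ripley's K function can be estimated as below. The patent's improved K function is not specified in this excerpt, so this is only a minimal sketch under stated assumptions: each occurrence of a word is a point (x, y, time) inside a region of known area and a time window of known duration, no edge correction is applied, and a word is called clustered when its estimate exceeds the expectation under complete spatio-temporal randomness (pi * s^2 * 2t) by a hypothetical factor `ratio_threshold`. How that factor relates to the patent's threshold l is not stated here.

```python
import math
from itertools import combinations


def space_time_k(points, s, t, area, duration):
    """Naive space-time Ripley's K estimate (no edge correction).

    points: list of (x, y, time) tuples, one per occurrence of a word.
    s, t: spatial and temporal distance scales.
    area, duration: size of the study region and of the time window.
    """
    n = len(points)
    if n < 2:
        return 0.0
    close_pairs = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(points, 2):
        if math.hypot(x1 - x2, y1 - y2) <= s and abs(t1 - t2) <= t:
            close_pairs += 2  # count both ordered pairs (i, j) and (j, i)
    return area * duration * close_pairs / (n * (n - 1))


def csr_expectation(s, t):
    # Expected K under complete spatio-temporal randomness, ignoring edges:
    # a disc of radius s times a time interval of length 2t.
    return math.pi * s * s * 2 * t


def is_clustered(points, s, t, area, duration, ratio_threshold=1.5):
    # Hypothetical decision rule, not the patent's improved K function.
    observed = space_time_k(points, s, t, area, duration)
    return observed > ratio_threshold * csr_expectation(s, t)
```

With four occurrences packed within 0.2 spatial units and 0.3 time units in a 100 x 100 region over 100 time units, `is_clustered(points, 1, 1, 10000, 100)` returns True; four widely spread occurrences give K = 0 and are not clustered.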

[0041] The method focuses on judging whether the words appearing in texts within the current time window are clustered in time and space; words that show clustering are identified as value words. If a text contains no value words, it is classified as noise text and removed directly. Theref...
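The decision rule in [0041] can be sketched as follows. This assumes the texts in the current time window are already tokenized and that the set of value words for that window has already been computed (for example, by a clustering test on each word); the function name and data layout are hypothetical.

```python
def filter_noise(texts, value_words):
    """Split texts into kept (valuable) and removed (noise).

    texts: list of token lists for one time window (tokenization assumed).
    value_words: set of words judged spatio-temporally clustered.
    A text is noise if none of its tokens is a value word.
    """
    kept, removed = [], []
    for tokens in texts:
        if any(tok in value_words for tok in tokens):
            kept.append(tokens)
        else:
            removed.append(tokens)
    return kept, removed
```

For example, with value words {"earthquake", "flood"}, the texts ["earthquake", "now"] and ["flood", "road"] are kept while ["lunch", "yum"] is removed as noise.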



Abstract

The invention discloses a social media text denoising method based on spatio-temporal burst features. It belongs to the field of data processing and aims to solve the poor text classification performance of the prior art. An improved Ripley's K function measures whether the spatio-temporal points at which a word appears are aggregated, which distinguishes noise words from event words. To reduce the influence of the words' Ripley's K function threshold l on the result and to reduce word misjudgment, a graph regularization algorithm is introduced; by fusing relatedness information between words, it improves the accuracy of word validity judgments.
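The abstract does not give the graph regularization algorithm itself. One standard formulation, used here purely as an illustrative assumption, smooths each word's initial validity score y_i (for example, 1 if its K-function test fires and 0 otherwise) over a symmetric, non-negative word-relatedness matrix W (for example, co-occurrence weights, also an assumption) by minimizing sum_i (f_i - y_i)^2 + lam * sum_{i<j} W_ij (f_i - f_j)^2. The minimizer solves (I + lam * L) f = y with graph Laplacian L = D - W, which the sketch below solves by Jacobi iteration.

```python
def graph_regularize(y, W, lam=1.0, iters=200, tol=1e-10):
    """Smooth per-word scores y over a symmetric relatedness matrix W.

    Minimizes sum_i (f_i - y_i)^2 + lam * sum_{i<j} W[i][j] * (f_i - f_j)^2.
    The optimum solves (I + lam * L) f = y with L = D - W. For lam >= 0 and
    non-negative W this matrix is strictly diagonally dominant, so plain
    Jacobi iteration converges (slowly if lam * degree is very large).
    """
    n = len(y)
    if n == 0:
        return []
    # Off-diagonal degree; any self-loop W[i][i] cancels out of L = D - W.
    deg = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    f = [float(v) for v in y]
    for _ in range(iters):
        new = [
            (y[i] + lam * sum(W[i][j] * f[j] for j in range(n) if j != i))
            / (1.0 + lam * deg[i])
            for i in range(n)
        ]
        done = max(abs(a - b) for a, b in zip(new, f)) < tol
        f = new
        if done:
            break
    return f
```

For three words where word 0 scores 1, word 1 scores 0 but is strongly related to word 0 (W = 1), and word 2 is isolated, lam = 1 gives f close to [0.667, 0.333, 0.0]: the related word's score is pulled up while the isolated word is unchanged. A final cut-off on f, not specified in the source, would then decide which words count as value words.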

Description

Technical field

[0001] The invention belongs to the field of data processing, and in particular to text denoising technology.

Background art

[0002] With the widespread use of social media, tens of billions of text messages are published every day on Twitter, Facebook, Instagram, Weibo, and other social media, and these messages contain the most complete and most time-sensitive information of many kinds. By extracting and analyzing this information, many valuable tasks can be accomplished.

[0003] Text is one of the main forms in which users express content on social media, so social media text data contains a great deal of valuable information and serves as the input to many social media data mining tasks. However, because social media is open, most of its text describes personal life and personal emotions, and such texts usually contain no valuable information.

[0004] Social media text denoising aims to identify a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/216; G06F40/284; G06F40/295
CPC: G06F40/216; G06F40/284; G06F40/295
Inventors: 费高雷 (Fei Gaolei), 程勇 (Cheng Yong), 胡光岷 (Hu Guangmin)
Owner UNIV OF ELECTRONICS SCI & TECH OF CHINA