Social-media short text filtering method based on structure and text information

A text information and social media technology, applied in special data processing applications, instruments, and electrical digital data processing. It addresses problems such as the lack of grammatical structure in short texts, overcomes strong sparsity and the curse of dimensionality, reduces computational complexity, and enables efficient processing.

Inactive Publication Date: 2018-01-09
UNIV OF ELECTRONIC SCI & TECH OF CHINA
5 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

For example, the messages posted by individual users on social networks to record personal status not only contain a large number of abbreviations, slang and even misspelle



Embodiment Construction

[0038] The technical solution of the present invention will be further described below in conjunction with the accompanying drawings.

[0039] The present invention regards information describing news or emergencies (such as politics, economy, military affairs, natural disasters, terrorist attacks, etc.) as useful information and treats other content (personal views, etc.) as spam. This is taken as an example to illustrate the detailed technical solution for social network spam filtering. As shown in Figure 1, the social media short text filtering method based on structure and text information includes the following steps:

[0040] S1. Obtain the structural features of the short text by scanning its word segmentation set, judge those structural features, and delete spam. Because short text messages in social networks are highly irregular in format, many of them cannot accurately describe useful information; spam can therefore be identified from the perspective of text structure. Specifically, it includes the fo...


Abstract

The invention discloses a social-media short text filtering method based on structure and text information. The method includes the following steps: 1, the structural features of a short text are judged, and spam is deleted; 2, the core of the text is extracted: the retained segmented text is checked to determine whether it contains the core information of a described event; if no core information exists, the message is determined to be spam, and if core information exists, the core components are extracted; 3, textual features are extracted, and the core components of the text obtained in step 2 are mapped to a feature space. By scanning the word segmentation set of the text, structural features can be used to judge whether a message is spam, so that massive data in the social network is processed simply and efficiently. By identifying features of words, sentence patterns and the like, feature selection is achieved. A sentence vector is constructed by summing the word2vec word vectors and taking their average, which reduces the computation required by the classifier model during training while still representing the semantic information of the text well.
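The sentence-vector construction mentioned in the abstract (summing word vectors and taking the average) can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function name `sentence_vector`, the table `TOY_VECTORS`, and its 3-dimensional values are hypothetical stand-ins for vectors that would normally come from a trained word2vec model, and skipping out-of-vocabulary tokens is an assumption the source does not state.

```python
from typing import Dict, List, Optional

# Hypothetical toy embeddings (made-up values, 3 dimensions).
# A real system would look these up in a trained word2vec model.
TOY_VECTORS: Dict[str, List[float]] = {
    "earthquake": [0.9, 0.1, 0.0],
    "hits": [0.3, 0.5, 0.2],
    "city": [0.6, 0.0, 0.4],
}


def sentence_vector(
    tokens: List[str], vectors: Dict[str, List[float]]
) -> Optional[List[float]]:
    """Average the word vectors of the tokens that have an embedding.

    Tokens without an embedding are skipped (an assumption of this sketch).
    Returns None when no token has an embedding.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    # Sum each dimension over all known word vectors, then divide by the count.
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Core components of a short text, e.g. as extracted in step 2.
core_tokens = ["earthquake", "hits", "city", "lol"]
vec = sentence_vector(core_tokens, TOY_VECTORS)
```

The result has the same fixed dimension as the word vectors regardless of text length, which is why an averaged vector keeps the classifier input small compared with a sparse bag-of-words representation.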

Description

Technical field

[0001] The invention relates to a method for filtering short texts in social media based on structure and text information.

Background technique

[0002] With the development of technologies such as Web 2.0, social media, and the mobile Internet, every netizen has become a creator and disseminator of Internet information, which has driven the explosive growth of Internet text information. At the same time, the form of text content on the Internet is constantly changing: from blogs to light blogs and microblogs, from emails to forums and instant messaging, from news to comments, and so on. A notable feature is that these text messages are becoming shorter and shorter. This is because short text messages are easy and casual to write and more convenient to publish. Short text messages are also more concise and compact than long text messages, which saves other users time and energy when reading them. Short text information has much wid...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 费高雷, 文永, 于富财, 胡光岷
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA