Microblog information filtering method based on multi-information fusion

A microblog information filtering technology, applied in special data processing applications, instruments, website content management, etc. It addresses the problem that existing methods cannot meet the requirements of microblog information filtering, and achieves high accuracy.

Active Publication Date: 2014-12-24
中科嘉速(北京)信息技术有限公司
4 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

Microblogs are shaped by how they propagate, by the users who post them, and by when they are posted. Existing information filtering methods that examine only the text content are therefore insufficient to meet the requirements of microblog information filtering.



Examples


Embodiment Construction

[0032] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0033] The invention relates to a microblog information filtering method based on the fusion of multiple kinds of information, in particular to a method for automatically identifying spam information on a microblog.

[0034] The evaluation data set is defined manually according to the scope of the information to be filtered. The definition of spam differs across task frameworks. The present invention is not aimed at a particular task, such as classification, clustering, or an upper-layer application such as sentiment analysis; it is general, and judges whether a piece of information is logically useless.

[0035] Definition of spam:

[0036] 1) Strong repeatability: there are many similar microblogs;

[0037] 2) Few forwards and replies: the microblog is of no value to others;

[0038] 3) Reposting not out of interest: such as re...
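The first two criteria above can be illustrated with a minimal sketch. It is not the patent's method: the `Post` type, the Jaccard token similarity, and every threshold (`sim_threshold`, `min_similar`, `min_interactions`) are assumptions chosen only to make the two signals concrete. The third criterion is truncated in the source and is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical microblog record; field names are illustrative only."""
    text: str
    forwards: int
    replies: int


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def spam_signals(post, corpus, sim_threshold=0.8, min_similar=2,
                 min_interactions=1):
    """Evaluate criteria 1) and 2) for one post against a corpus.

    All thresholds are assumed values, not taken from the patent.
    """
    # Count other posts whose token overlap reaches the similarity threshold.
    similar = sum(
        1 for other in corpus
        if other is not post and jaccard(post.text, other.text) >= sim_threshold
    )
    return {
        "strong_repeatability": similar >= min_similar,
        "low_interaction": post.forwards + post.replies < min_interactions,
    }


posts = [
    Post("buy cheap shoes now", 0, 0),
    Post("buy cheap shoes now", 0, 0),
    Post("buy cheap shoes now today", 0, 0),
    Post("earthquake report from the city", 12, 3),
]
signals_first = spam_signals(posts[0], posts)
signals_last = spam_signals(posts[3], posts)
```

Here the first post has two near-duplicates and no interactions, so both signals fire; the last post has neither property. How the signals are combined into a final decision is left open, since the source assigns that to a classification model.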


Abstract

The invention provides a microblog information filtering method based on multi-information fusion, belonging to the technical field of intelligent information processing. The method comprises the following steps: step 1, building a distributed crawler and crawling microblog data; step 2, preprocessing the microblog data; step 3, performing Chinese word segmentation on the microblog data, deleting stop words, and obtaining the word segmentation result and a word set VOC; step 4, extracting features from the perspective of microblog content; step 5, extracting microblog features from the perspective of the client; step 6, extracting features from the transmission path; step 7, building a classification model and screening for non-junk microblogs. The method combines two processes, microblog duplicate removal and a classification learning algorithm, to delete junk microblog information, so that both duplicated microblog information and junk microblog information are filtered out.
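Steps 3 and 4, the duplicate-removal pass, and the fusion of the three feature groups can be sketched as follows. This is only an illustration under stated assumptions: a lowercase whitespace split stands in for Chinese word segmentation, `STOP_WORDS` is a made-up list, bag-of-words counts are one possible content feature, and the client and transmission-path values are placeholder numbers. The classification model of step 7 is not shown because the abstract does not say which model is used.

```python
# Hypothetical stop-word list; a real system would use a Chinese list.
STOP_WORDS = {"the", "a", "of", "and", "to"}


def segment(text):
    """Stand-in for Chinese word segmentation: split, then drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_voc(token_lists):
    """Word set VOC: every distinct token, sorted for a stable index."""
    return sorted({w for tokens in token_lists for w in tokens})


def content_features(tokens, voc):
    """Bag-of-words count vector over VOC (an assumed content feature)."""
    index = {w: i for i, w in enumerate(voc)}
    vec = [0] * len(voc)
    for w in tokens:
        if w in index:
            vec[index[w]] += 1
    return vec


def deduplicate(token_lists):
    """Return indices of posts kept after dropping exact token duplicates."""
    seen, keep = set(), []
    for i, tokens in enumerate(token_lists):
        key = tuple(tokens)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep


def fuse(content_vec, client_vec, path_vec):
    """Concatenate the three feature groups into one classifier input."""
    return list(content_vec) + list(client_vec) + list(path_vec)


texts = [
    "The launch of the new phone",
    "the launch of the new phone",
    "Win a free prize",
]
tokens = [segment(t) for t in texts]
voc = build_voc(tokens)
kept = deduplicate(tokens)
# 120 and 5 are placeholder client and transmission-path feature values.
fused = fuse(content_features(tokens[0], voc), [120], [5])
```

Plain concatenation is the simplest way to fuse heterogeneous features; a real system would likely scale each group before training so that large raw counts do not dominate the content vector.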

Description

Technical field

[0001] The invention belongs to the technical field of intelligent information processing, and in particular relates to a microblog information filtering method based on the fusion of multiple kinds of information.

Background technique

[0002] Microblogs, as a new communication carrier, contain a large amount of user-posted information on people, events, etc. They therefore play an important role in the initiation and dissemination of Internet public opinion, and have become one of the important data sources for browsing and analyzing Internet public opinion. However, the convenient "forwarding" operation and the rapidly growing "network army" cause large amounts of identical or similar data to spread rapidly in the microblog space. At the same time, noise microblogs, used as a means of publicity, have quickly spread to every corner of the microblog space. For the analysis of network public opinion, noisy microblogs are usually meaningless, and the...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/9535, G06F16/958
Inventor: 闫碧莹, 余雷, 袁伟, 邓攀, 赵鑫
Owner: 中科嘉速(北京)信息技术有限公司