Spam blog detection method

A spam blog detection method in the field of blog technology. It addresses two problems of existing approaches, insufficient feature selection for spam blogs and a low accuracy rate in distinguishing spam blogs from normal blogs, and achieves high accuracy.

Inactive Publication Date: 2009-03-25
ZHEJIANG UNIV
0 Cites, 29 Cited by

AI Technical Summary

Problems solved by technology

[0005] However, existing processing methods do not select enough features of spam blogs, and their accuracy in distinguishing spam blogs from normal blogs is not high.

Examples

Embodiment Construction

[0036] The implementation of the present invention has three key points: feature extraction from blog text content, feature extraction from blog page links, and feature extraction from the time distribution of blog posts. After obtaining the blog page data, the invention builds a feature vector through text content analysis, page link analysis and analysis of the time attributes of blog posts, and then applies an automatic text classification algorithm to classify spam blogs accurately.
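
The assembly of one feature vector from the three feature groups can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `build_feature_vector` is hypothetical, and the specific link and timing values are placeholders, since the source does not list the individual features here.

```python
from typing import List, Sequence


def build_feature_vector(
    content: Sequence[float],
    links: Sequence[float],
    timing: Sequence[float],
) -> List[float]:
    """Concatenate the three feature groups into one flat vector."""
    return [float(x) for x in (*content, *links, *timing)]


# Hypothetical values for a single blog page (assumptions, not from the source).
content_features = [1, 0, 1]   # binary keyword indicators
link_features = [0.8, 42]      # placeholder link-based features
timing_features = [3.5]        # placeholder time-distribution feature

vector = build_feature_vector(content_features, link_features, timing_features)
print(vector)
```

The resulting vector can then be passed to any binary text classifier; the source does not say which algorithm is used.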

[0037] 1. Feature extraction of blog text content:

[0038] Each blog article (including its title) is treated as one object, and its feature items are represented in binary form: each feature takes a value from {0, 1}, with 1 for a keyword that appears in the article and 0 for a keyword that does not. In the standardized word frequency representation method, it is necessary to make appropriate improvements ...
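
The binary representation described above can be illustrated with a short sketch. The keyword vocabulary, the function name `binary_features`, and the simple lowercase alphanumeric tokenizer are all assumptions for illustration; real blog text (especially Chinese text) would need a proper word segmenter.

```python
import re
from typing import List


def binary_features(title: str, body: str, vocabulary: List[str]) -> List[int]:
    """Return 1 for each vocabulary keyword present in title or body, else 0."""
    # Naive tokenizer (assumption): lowercase runs of letters and digits.
    tokens = set(re.findall(r"[a-z0-9]+", (title + " " + body).lower()))
    return [1 if keyword in tokens else 0 for keyword in vocabulary]


# Hypothetical vocabulary and article, chosen only to show the encoding.
vocab = ["cheap", "casino", "recipe", "travel"]
vec = binary_features("Cheap casino bonus", "Win big at our casino today", vocab)
print(vec)
```

Each position in the output corresponds to one keyword in the vocabulary, so repeated occurrences of a keyword do not change the value.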

Abstract

The invention discloses a method for detecting spam blogs. By analyzing the cheating techniques used in web spam, the method targets the essential attributes of spam blogs, applies binary text classification, and draws on three aspects: the content features of the blog text, the link features of the blog page, and the time distribution features of the blog text. The method is built on a comprehensive analysis of blog page content and optimizes the blog feature extraction step, thereby ensuring a higher accuracy rate in classifying spam blogs.

Description

Technical field

[0001] The invention relates to blogs and text classification technology, in particular to a method for detecting spam blogs.

Background

[0002] In recent years, blogs have grown rapidly as a new medium, producing a large amount of blog information. Spam blogs have emerged as a by-product. They waste network bandwidth and storage resources, make it harder for people to obtain high-quality information, and reduce users' satisfaction with blog search.

[0003] A normal blog has two characteristics: it consists of short, frequently updated articles, and its articles are arranged in reverse chronological order. In addition to these characteristics, spam blogs also show the traits of link farms and advertising blogs. The link farm trait means that the spam blog pages pile up a large number of popular...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 陈纯, 卜佳俊, 张峰, 仇光, 郑淼
Owner: ZHEJIANG UNIV