Anti-spam method based on micro-content similarity

A similarity-based anti-spam technique for Internet micro-content. It addresses the problems of existing approaches, which increase server burden, cannot guarantee service quality, and cannot fully meet the needs of identifying spam comments. It improves efficiency by reducing the number of similarity comparisons.

Inactive Publication Date: 2008-04-09
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

This method is the most scientific and reasonable, but it requires the server to do a lot of processing, which increases the server's burden. If the check relies on a remote server, quality of service cannot be guaranteed because of network conditions.
[0009] Therefore, none of the above methods can fully meet the needs of online real-time identification of spam comments.




Embodiment Construction

[0031] The present invention defines the following concepts for comment similarity:

[0032] Word: an indivisible semantic unit;

[0033] High-frequency words: words such as "的" and "ah" that carry no semantic meaning and need to be filtered out;

[0034] Comment: a finite set of words, obtained by segmenting the original comment into words and then filtering out the high-frequency words;

[0035] Number of words in a comment: the cardinality of the comment's word set, that is, the number of elements the set contains;

[0036] "Intersection" of comments: the intersection operation of word sets;

[0037] "Union" of comments: union operation of word sets;
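
The definitions above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the names `HIGH_FREQUENCY_WORDS` and `to_word_set` are hypothetical, the high-frequency word list contains only the two examples given in the text, and whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated one).

```python
# Hypothetical high-frequency word list; the text names only these two examples.
HIGH_FREQUENCY_WORDS = {"的", "ah"}


def to_word_set(raw_comment, segment=str.split):
    """Turn a raw comment into a "comment" in the sense defined above.

    `segment` is an assumed pluggable word segmenter. Whitespace splitting
    is used here only to keep the sketch self-contained.
    """
    words = segment(raw_comment)
    # Filter out high-frequency words and keep the rest as a finite set.
    return frozenset(w for w in words if w not in HIGH_FREQUENCY_WORDS)


a = to_word_set("buy cheap watches ah")
b = to_word_set("cheap watches online")

word_count_a = len(a)   # number of words in comment a
intersection = a & b    # "intersection" of comments
union = a | b           # "union" of comments
```

Representing a comment as a `frozenset` makes the later set operations direct and lets comments be stored in other sets or used as dictionary keys.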

[0038] Define the similarity sim(a, b) between comment a and comment b:

[0039] The number of words in the intersection of a and b / the number of words in the union of a and b, that is

[0040] The number of words in the intersection of a and b / (the number of words in a + the number of words in b - the number of words in the intersection of a and b)
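
This ratio of intersection size to union size is commonly known as the Jaccard coefficient. A minimal Python sketch of sim(a, b) follows, assuming comments are already represented as sets of words. Returning 0.0 for two empty comments is an assumption of this sketch; the text does not cover that case.

```python
def sim(a, b):
    """Similarity of two comments given as word sets.

    Computes |a ∩ b| / (|a| + |b| - |a ∩ b|), which equals |a ∩ b| / |a ∪ b|.
    """
    common = len(a & b)
    total = len(a) + len(b) - common
    # Assumption: two empty comments have similarity 0.0 (avoids division by zero).
    return common / total if total else 0.0


comment_a = {"buy", "cheap", "watches"}
comment_b = {"cheap", "watches", "online"}
score = sim(comment_a, comment_b)  # 2 shared words / 4 distinct words = 0.5
```

Identical comments score 1.0 and comments with no words in common score 0.0.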

[0041] Combining the above co...


Abstract

The invention discloses an anti-spam method based on micro-content similarity. The method comprises clustering comments that have been manually identified as spam to generate a clustered spam file, and then classifying unknown comments with a spam discriminator according to that file. The comment to be processed is first scored for similarity against a random sample from each spam comment class, and is then scored against the class whose random sample has the highest similarity. This avoids comparing the comment with every clustered spam comment and effectively reduces the number of comment similarity comparisons, thereby improving the efficiency of spam discrimination and of clustered spam file maintenance, and satisfying the performance requirements of large-scale spam discrimination on the Internet.
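
The two-stage comparison described in the abstract can be sketched in Python as below. This is only an illustration under stated assumptions, not the patented discriminator: the function name `is_spam`, the `threshold` value, and the choice to score a class by its most similar member are all hypothetical, since the abstract does not specify them. Each spam class is assumed to be a non-empty list of word sets.

```python
import random


def sim(a, b):
    """Intersection size over union size of two word sets."""
    common = len(a & b)
    total = len(a) + len(b) - common
    return common / total if total else 0.0


def is_spam(comment, spam_classes, threshold=0.5, rng=random):
    """Two-stage check of a comment against clustered spam.

    `spam_classes` is a list of non-empty lists of word sets.
    `threshold` and the max-based class score are assumptions of this sketch.
    """
    if not spam_classes:
        return False
    # Stage 1: compare with one random sample from every spam class.
    sampled = [(sim(comment, rng.choice(members)), members)
               for members in spam_classes]
    _, best_class = max(sampled, key=lambda pair: pair[0])
    # Stage 2: score only against the class whose sample was most similar.
    class_score = max(sim(comment, member) for member in best_class)
    return class_score >= threshold


watch_spam = [frozenset({"buy", "cheap", "watches"}),
              frozenset({"cheap", "watches", "online"})]
casino_spam = [frozenset({"free", "casino", "bonus"})]
classes = [watch_spam, casino_spam]

flagged = is_spam(frozenset({"buy", "cheap", "watches"}), classes)
passed = is_spam(frozenset({"nice", "post", "thanks"}), classes)
```

With k classes, the sketch performs k comparisons in the first stage plus one per member of the selected class, rather than one per stored spam comment.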

Description

Technical field

[0001] The invention relates to an anti-spam method for Internet micro-content, and in particular to an anti-spam method based on micro-content similarity.

Background

[0002] The blog is the fourth form of network communication after Email, BBS, and ICQ. It is a personal "reader's digest" of the Internet age, a network diary that uses hyperlinks as its weapon, and it represents a new way of living, working, and learning. However, as anti-spam technology becomes more and more mature, blog comments are increasingly used by businesses and ordinary netizens as a means of disseminating advertisements and publicity. This leads to more and more spam comments on blogs, which greatly wastes network bandwidth, the time of blog owners and readers, and system resources.

[0003] Commonly used anti-spam techniques and methods currently include:

[0004] 1) Set up word-group filtering to filter or block certain sensitive words...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/58, G06F17/30
Inventor: 胡天磊, 陈珂, 陈刚, 寿黎但, 汪源
Owner: ZHEJIANG UNIV