Rubbish text recognition method and system

A junk text recognition technology, applied in the field of Internet information processing and pattern recognition, addresses the high probability of non-junk text being identified as junk text and improves recognition accuracy.

Active Publication Date: 2009-07-08
SHENZHEN SHI JI GUANG SU INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0011] As can be seen from Figure 1, the existing method identifies a text as junk text as soon as it contains any sensitive word. In fact, the probability that a text is junk varies with which sensitive words it contains and with how many of them it contains. The method shown in Figure 1 therefore has a relatively high probability of identifying non-junk text as junk text.
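To make the weakness concrete, the existing approach described above can be sketched as a plain keyword check. This is only an illustration: the word list, the whitespace tokenization, and the function name `is_junk_baseline` are assumptions and do not come from the patent.

```python
# Hypothetical sensitive-word list; the source does not provide one.
SENSITIVE_WORDS = {"lottery", "prize"}


def is_junk_baseline(text: str) -> bool:
    """Flag a text as junk if it contains at least one sensitive word."""
    words = set(text.lower().split())
    return not words.isdisjoint(SENSITIVE_WORDS)


# A legitimate question is flagged just because it mentions "lottery".
print(is_junk_baseline("How is a lottery taxed in my country"))
print(is_junk_baseline("win a prize in our lottery today"))
```

Both calls are flagged, although only the second text looks like junk; a single matched word carries the whole decision regardless of which word it is or how many matched.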



Embodiment Construction

[0027] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples.

[0028] The method for identifying junk text in the present invention is mainly divided into two steps; for details, see Figure 2.

[0029] Figure 2 is a flow chart of the method for identifying junk text in the present invention. As shown in Figure 2, the method includes:

[0030] Step 201: establish a junk feature database.

[0031] In this step, features are extracted from junk samples. Junk features are selected from all the features of the junk samples according to the probability that a text containing the feature is junk text, and each junk feature is assigned a junk weight. The weighted junk features make up the junk feature database.
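Step 201 might be sketched as follows. The source text does not specify how features are extracted, how the probability is estimated, or how weights are assigned, so this sketch makes several assumptions: features are lowercase whitespace-separated words, the probability is estimated from a hypothetical set of labelled normal samples alongside the junk samples, the cut-off `min_prob` is arbitrary, and the weight is simply set equal to the estimated probability.

```python
from collections import Counter


def build_junk_feature_db(junk_samples, normal_samples, min_prob=0.8):
    """Return {feature: weight} for features that mostly occur in junk texts.

    Assumptions (not from the patent): word-level features, probability
    estimated as junk document count / total document count, and
    weight = estimated probability.
    """
    junk_counts = Counter()  # number of junk texts containing each feature
    all_counts = Counter()   # number of texts of any kind containing it

    for text in junk_samples:
        feats = set(text.lower().split())
        junk_counts.update(feats)
        all_counts.update(feats)
    for text in normal_samples:
        all_counts.update(set(text.lower().split()))

    feature_db = {}
    for feat, n_junk in junk_counts.items():
        prob = n_junk / all_counts[feat]  # estimate of P(junk | feature)
        if prob >= min_prob:
            feature_db[feat] = prob
    return feature_db


junk = ["win free prize", "free prize now"]
normal = ["free shipping on books", "weather now"]
print(build_junk_feature_db(junk, normal))
```

With these toy samples, "free" and "now" also appear in normal texts, so their estimated probabilities fall below the cut-off and they are not kept as junk features.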

[0032] The junk samples are generally m...


Abstract

The invention discloses a method and a system for recognizing spam text. The method comprises the following steps: extracting features from spam samples; determining spam features from all the features of the spam samples according to the probability that a text containing the feature is spam text; assigning a spam weight to each spam feature, the weighted spam features forming a spam feature database; matching a pending text against the spam features in the spam feature database; and judging whether the pending text is spam text according to the spam weights of all matched spam features. The system comprises the spam feature database and a spam text recognizing device, wherein the spam feature database stores the spam features with their assigned spam weights, and the spam text recognizing device receives the pending text, matches it against the spam features in the spam feature database, and judges whether it is spam text according to the spam weights of all matched spam features. The invention can improve the accuracy of spam text recognition.
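The matching and judging step described in the abstract could look roughly like the sketch below. The abstract only says that the decision depends on the weights of all matched features; summing the weights and comparing the total with a threshold is an assumption made here for illustration, as are the example weights, the `threshold` value, and the function names.

```python
def junk_score(text, feature_db):
    """Sum the weights of all junk features found in the text."""
    feats = set(text.lower().split())
    return sum(weight for feat, weight in feature_db.items() if feat in feats)


def is_junk(text, feature_db, threshold=1.5):
    """Judge a pending text as junk when its combined weight reaches the threshold."""
    return junk_score(text, feature_db) >= threshold


# Hypothetical weighted feature database.
feature_db = {"prize": 0.9, "win": 0.8, "lottery": 0.4}

print(is_junk("How is a lottery taxed in my country", feature_db))
print(is_junk("win a prize today", feature_db))
```

Under these assumed weights, a text that matches only one weakly weighted feature stays below the threshold, while a text matching several strongly weighted features is judged as junk. This is the behaviour the invention contrasts with a filter that reacts to any single sensitive word.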

Description

Technical field

[0001] The invention relates to the technical field of Internet information processing and pattern recognition, in particular to a method and system for recognizing junk text.

Background

[0002] In the Internet field, information filtering is required in order to provide Internet users with the information they need. Information filtering means that the computer, using template information that reflects the user's needs, identifies the information that meets those needs in a dynamically changing information flow and eliminates information that is irrelevant or harmful to the user.

[0003] A typical application of information filtering is to filter junk text out of news texts on the Internet and out of question or answer texts on Q&A interactive platforms, so that the news, question, and answer texts provided to users meet their needs.

[0004] In the process of filtering junk te...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: H04L12/585, G06F17/30707, G06F16/353, H04L51/212
Inventor: 刘怀军, 方高林
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH