Offline model improvement and selection method for junk short message classification

Offline model technology for spam SMS, applied in text database clustering/classification and special data processing applications. It addresses problems such as wrong prediction results, large classification errors, and sensitivity to noisy data, so as to avoid information loss and improve accuracy and effectiveness.

Active Publication Date: 2017-10-17
HOHAI UNIV
3 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

No training is required, but the classification error is also large. If the k value is selected too small, it is easily affected by the noise data. If the k value is se
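The effect of a small k can be illustrated with a minimal sketch, assuming the passage refers to k-nearest-neighbour (kNN) classification. The one-dimensional toy data, the `knn_predict` helper, and the labels are hypothetical and are not taken from the patent.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_value, label) pairs; distance is the
    absolute difference on a single toy feature.
    """
    neighbours = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training set: the point at 1.1 is a noisy "spam" label inside the ham region.
train = [
    (0.8, "ham"), (1.0, "ham"), (1.2, "ham"),
    (1.1, "spam"),  # noise
    (4.8, "spam"), (5.0, "spam"), (5.2, "spam"),
]

print(knn_predict(train, 1.12, k=1))  # the single nearest neighbour is the noisy point
print(knn_predict(train, 1.12, k=3))  # two ham neighbours outvote the noisy point
```

With k=1 the query inherits the label of the noisy point, while k=3 lets the surrounding ham points outvote it.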

Example Embodiment

[0043] The present invention is further described below with reference to specific examples. These examples only illustrate the present invention and do not limit its scope. After reading the present invention, modifications in various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

[0044] The offline model improvement and selection method for spam classification includes the following four aspects:

[0045] (1) Short message text preprocessing. The main preprocessing steps are: word segmentation, uniform conversion of the short message text to short description, conversion of desensitized strings such as numbers to single characters, and removal of stop words;

[0046] (1.1) Use Ansj to segment the SMS text and retain the part-of-speech tags;

[0047] (1.2) Unified conversion of...
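The preprocessing steps listed in (1) can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent uses Ansj (a Java segmenter) with part-of-speech tags, whereas this sketch substitutes plain whitespace splitting; the `preprocess` function, the placeholder character `N`, and the stop-word list are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would load a full list.
STOP_WORDS = {"the", "a", "to", "is"}


def preprocess(text, tokenize=str.split):
    """Mask digit runs, tokenize, and drop stop words.

    `tokenize` stands in for a real segmenter such as Ansj; whitespace
    splitting is used here only to keep the sketch self-contained.
    """
    # Collapse every run of digits (e.g. a phone number) into one placeholder.
    masked = re.sub(r"\d+", "N", text.lower())
    tokens = tokenize(masked)
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Call 13800001111 to claim the prize"))
```

Collapsing each number into one character keeps the signal that a number is present while preventing every distinct phone number from becoming its own feature.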

Abstract

The present invention discloses an offline model improvement and selection method for junk short message classification. The method comprises the following steps: (1) feature selection and expansion: features are selected using a feature selection method, a feature word vector is constructed, and the original short message text is represented using the feature word vector model; (2) optimization training and testing of the offline classification algorithms and their improvements: the offline classification algorithms are improved for junk short message classification, the training set and testing set obtained in step (1) are prepared as each offline classification algorithm and its improvement requires, and the training set is used to carry out optimization training and testing of each algorithm and its improvement; and (3) offline classification algorithm selection based on evaluation criteria: an evaluation index for junk short message classification is put forward, the test results obtained in step (2) are analyzed using the evaluation index, and an optimal offline classification algorithm is selected.
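Step (3) can be illustrated with a small sketch. The abstract does not say which evaluation index is used, so the F1 score on the spam class is an assumption here, and the names `f1_score`, `select_best`, `model_a`, and `model_b` are hypothetical.

```python
def f1_score(y_true, y_pred, positive="spam"):
    """F1 score of the `positive` class (assumed evaluation index)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(predictions, y_true):
    """Score each model's test-set predictions and return the best model name."""
    scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical test-set labels and predictions from two candidate models.
y_true = ["spam", "spam", "ham", "ham"]
predictions = {
    "model_a": ["spam", "ham", "ham", "ham"],
    "model_b": ["spam", "spam", "spam", "ham"],
}
best, scores = select_best(predictions, y_true)
print(best, scores)
```

Keeping the index as a separate function makes it easy to swap in whatever criterion the full description actually defines.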

Description

Technical field

[0001] The invention relates to an offline text classification algorithm, in particular to an offline model improvement and selection method for classification of junk short messages, and belongs to the technical field of text content-based identification of junk short messages.

Background technique

[0002] The most important thing in the text classification problem is to select and train the text classification model, and the performance of text classification depends on the text classification model to a large extent. Recently, researchers have proposed various text classification models based on machine learning, combined with multidisciplinary theories such as statistics and informatics.

[0003] Naive Bayesian classification algorithm is a machine learning method based on statistics, which is widely used in text classification problems. The algorithm is based on the assumption of feature independence. Although there is often correlation between feature...
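The feature independence assumption mentioned for Naive Bayes can be made concrete with a minimal sketch. It assumes a multinomial model with add-one (Laplace) smoothing, which the text does not specify; the function names and the toy messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count classes and per-class words from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Pick the class with the highest log-probability.

    Independence assumption: the score is the log prior plus a sum of
    per-word log-likelihoods, each word treated separately.
    """
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one smoothing so unseen words do not zero out the product.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    (["win", "prize", "now"], "spam"),
    (["free", "prize"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train_nb(docs)
print(predict_nb(model, ["free", "prize"]))
print(predict_nb(model, ["meeting", "noon"]))
```

Because each word contributes its own term to the sum, correlated words are counted as if they were independent evidence.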

Application Information

IPC(8): G06F17/30
CPC: G06F16/3344, G06F16/35
Inventor: 毛莺池, 齐海, 贾必聪, 李晓芳, 平萍, 徐淑芳
Owner HOHAI UNIV