Method and system for mail classification based on clustering

A mail classification method and system, applied in the Internet field, that addresses the inability of existing mail systems to meet users' needs for classifying e-mail into multiple categories.

Active Publication Date: 2017-09-12
新浪技术(中国)有限公司 (Sina Technology (China) Co., Ltd.)
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] However, the inventors of the present invention have found that prior-art mail classification methods can no longer meet users' classification requirements. To make received mail easier to search, users usually want the mail system to sort mail into multiple categories, for example business news, social, training, recruitment, and investment and financial mail. It is therefore necessary to provide a method for classifying mail into multiple categories.


Examples

Embodiment 1

[0089] Embodiment 1 of the present invention provides a clustering-based mail classification method. Its flow is shown in Figure 1 and may include the following steps:

[0090] S101: For each mail in the mail set to be classified, obtain the word set of that mail, and determine the word set of the whole mail set from the word sets obtained for the individual mails.

[0091] Specifically, for each mail in the mail set to be classified, a statistical model (such as a hidden Markov model) is applied to segment the mail content, giving the word segmentation result of the mail; stop words and uncommon words are then removed from that result to obtain the word set of the mail. The word sets of all mails in the mail set are merged into a single word set, and words repeated in that set are removed to obtain the word set of the mail set to be classified. The collection of emails to be classified contains emails that meet the set co...
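The word-set step in [0091] can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: lowercase whitespace splitting stands in for the statistical (e.g. HMM) segmenter, and the stop-word list `STOP_WORDS` and the frequency cutoff `min_count` that defines an "uncommon" word are hypothetical, since the text does not specify them.

```python
from collections import Counter

# Hypothetical stop-word list; the source does not give one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "for"}


def segment(content: str) -> list[str]:
    """Stand-in for the statistical segmenter (e.g. an HMM) named in [0091].

    Assumption: lowercase whitespace splitting, which only suits
    space-delimited text such as English.
    """
    return content.lower().split()


def mail_word_sets(mails: list[str], min_count: int = 2) -> list[set[str]]:
    """Word set per mail, with stop words and uncommon words removed.

    Assumption: a word is "uncommon" if it occurs fewer than `min_count`
    times across the whole mail set.
    """
    segmented = [segment(m) for m in mails]
    counts = Counter(w for words in segmented for w in words)
    return [
        {w for w in words if w not in STOP_WORDS and counts[w] >= min_count}
        for words in segmented
    ]


def mail_set_word_set(word_sets: list[set[str]]) -> set[str]:
    """Merge the per-mail word sets; the set union drops repeated words."""
    merged: set[str] = set()
    for ws in word_sets:
        merged |= ws
    return merged


mails = [
    "Quarterly earnings report for investors",
    "Investors meet for earnings call",
    "Join our training course",
    "Training course schedule",
]
word_sets = mail_word_sets(mails)
print(sorted(mail_set_word_set(word_sets)))
# → ['course', 'earnings', 'investors', 'training']
```

Using a Python `set` for the merged collection makes the de-duplication of repeated words implicit in the union.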

Embodiment 2

[0144] To improve the efficiency of mail classification, the technical solution of the second embodiment first divides the word feature vector set into a set number of subsets and performs cluster merging on the vector clusters of each subset in parallel. The vector clusters resulting from all subsets are then clustered and merged together. This improves the efficiency of cluster merging over the word feature vector set as a whole, and thus the efficiency of mail classification.
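The two-stage merging in [0144] can be sketched in Python as below. This is a minimal sketch under assumptions the text does not fix: word feature vectors are plain lists of floats, similarity is cosine similarity between cluster centers, a merged cluster's center is the mean of its members, merging stops once no pair of centers reaches a hypothetical `threshold`, and the vector set is split into `n_subsets` strided subsets. The names `merge_clusters` and `parallel_cluster` are illustrative only.

```python
import math
from concurrent.futures import ThreadPoolExecutor

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sum(a * a for a in u), sum(b * b for b in v)
    return dot / math.sqrt(nu * nv) if nu and nv else 0.0


def merge_clusters(clusters: list[list[int]], vectors: list[Vector],
                   threshold: float) -> list[list[int]]:
    """Greedily merge the most similar pair of clusters until none reaches threshold.

    Clusters are lists of mail indices; each center is the mean of its members.
    """
    clusters = [list(c) for c in clusters]

    def center(c: list[int]) -> Vector:
        return [sum(col) / len(c) for col in zip(*(vectors[k] for k in c))]

    centers = [center(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centers[i], centers[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        absorbed = clusters.pop(j)  # j > i, so index i stays valid
        centers.pop(j)
        clusters[i].extend(absorbed)
        centers[i] = center(clusters[i])
    return clusters


def parallel_cluster(vectors: list[Vector], n_subsets: int = 2,
                     threshold: float = 0.5) -> list[list[int]]:
    """Stage 1: merge within each subset in parallel. Stage 2: merge the results."""
    if not vectors:
        return []
    indices = list(range(len(vectors)))
    # Hypothetical strided split into n_subsets subsets (empty ones dropped).
    subsets = [indices[s::n_subsets] for s in range(n_subsets)]
    subsets = [s for s in subsets if s]
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        results = pool.map(
            lambda sub: merge_clusters([[k] for k in sub], vectors, threshold),
            subsets,
        )
        local = [c for clusters in results for c in clusters]
    # Stage 2 starts from the subset clusters rather than from single vectors.
    return merge_clusters(local, vectors, threshold)


vectors = [
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
]
print(parallel_cluster(vectors, n_subsets=2))  # → [[0, 1], [2, 3]]
```

In CPython, threads give little speed-up for pure-Python arithmetic because of the global interpreter lock; a `ProcessPoolExecutor` with a module-level worker function in place of the lambda would be needed for true CPU parallelism. The second stage begins from the clusters already formed inside each subset, so it has fewer clusters to compare than a single-stage merge over all vectors.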

[0145] Embodiment 2 of the present invention provides a clustering-based mail classification method. Its flow is shown in Figure 3 and includes the following steps:

[0146] S301: For each mail in the mail set to be classified, obtain the word set of that mail, and determine the word set of the whole mail set from the word sets obtained for the individual mails.

[0147] S3...

Abstract

The invention discloses a clustering-based mail classification method and system. The method comprises: performing word segmentation on each mail in a mail set to be classified to obtain the word set of each mail; determining the word feature vector of each mail; after the word feature vectors of all mails form a word feature vector set, creating for each word feature vector a vector cluster that contains it, with that word feature vector serving as the cluster center of the corresponding vector cluster; performing at least one round of cluster merging on the vector clusters according to the similarity between word feature vectors; and, for each vector cluster after merging, classifying the mails whose word feature vectors it contains as one category of mail. Because mails are classified according to how their word feature vectors cluster into vector clusters, no sample mails of known categories are needed in advance, and mails of many kinds can be classified according to their content.
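The pipeline in the abstract (word feature vectors, one initial cluster per vector with that vector as its center, similarity-based merging, one mail category per final cluster) can be sketched in Python. This illustrates the idea under assumptions and is not the patented method: the binary vector weighting, cosine similarity, mean-vector cluster centers after merging, and the stopping `threshold` are all hypothetical choices, because the abstract does not specify them.

```python
import math


def feature_vector(word_set: set[str], vocabulary: list[str]) -> list[float]:
    """Hypothetical binary weighting: 1.0 if the mail contains the word, else 0.0."""
    return [1.0 if w in word_set else 0.0 for w in vocabulary]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sum(a * a for a in u), sum(b * b for b in v)
    return dot / math.sqrt(nu * nv) if nu and nv else 0.0


def cluster_mails(vectors: list[list[float]], threshold: float = 0.5) -> list[list[int]]:
    """Return the mail indices of each final vector cluster (one mail category each)."""
    # One vector cluster per word feature vector; the vector is its cluster center.
    clusters = [[i] for i in range(len(vectors))]
    centers = [list(v) for v in vectors]
    # Repeatedly merge the most similar pair of clusters.
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centers[i], centers[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:  # assumed stopping rule
            break
        i, j = pair
        absorbed = clusters.pop(j)  # j > i, so index i stays valid
        centers.pop(j)
        clusters[i].extend(absorbed)
        # Assumption: the merged center is the mean of the member vectors.
        members = [vectors[k] for k in clusters[i]]
        centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return clusters


word_sets = [
    {"earnings", "investors"},
    {"earnings", "investors", "call"},
    {"training", "course"},
    {"course", "schedule"},
]
vocabulary = sorted(set().union(*word_sets))
vectors = [feature_vector(ws, vocabulary) for ws in word_sets]
print(cluster_mails(vectors))  # → [[0, 1], [2, 3]]
```

Here the two finance-related mails and the two training-related mails end up in separate clusters without any labeled sample mails, which is the property the abstract emphasizes. Raising `threshold` yields more, finer categories; lowering it merges more aggressively.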

Description

Technical field

[0001] The invention relates to the Internet field, and in particular to a clustering-based method and system for classifying mail.

Background

[0002] As society becomes increasingly informatized, more and more people use e-mail, and users often receive large numbers of mails of various types, such as business news, orders, social networking, training, recruitment, and investment and financial management.

[0003] Mail classification in current mail systems focuses on classifying mail as spam or non-spam, usually with a content-based classification method. Specifically, in a training set of sample mails already labeled as spam or non-spam, the contents of the sample mails are processed, and various machine learning algorithms are applied to the processed contents, such as the Bayesian (Bayes) algorithm, Support Vector Machine (Suppo...

Application Information

Patent type and authority: Patents (China)
IPC (8): G06F17/30
CPC: G06F16/80
Inventor: 陈玉焓 (Chen Yuhan)
Owner: 新浪技术(中国)有限公司 (Sina Technology (China) Co., Ltd.)