Method and device for spam filtering based on short text
A short-text-based spam filtering technology, applied in the field of short-text-based spam filtering devices. It addresses problems such as interference with text classification, wrong results, and mail not being read in time, and achieves the effect of strengthening word segmentation results and reducing possibility
Examples
Example 1
[0022] Figure 1 is a flow chart of the first embodiment of the short-text-based spam filtering method of the present invention, which includes:
[0023] S100. Perform word segmentation processing on the text in the email and obtain a word segmentation result.
[0024] When word segmentation is performed on the text in the email, it is necessary to separate the HTML tags, Chinese characters and English characters, and then perform word segmentation on the Chinese characters and English characters respectively to obtain word segmentation results.
[0025] S101. Use TF-IDF technology to sort the word segmentation results to obtain a word segmentation list.
[0026] After the word segmentation results (Chinese and English tokens) are extracted from the email, the TF-IDF algorithm is used to sort them from high to low by discriminating power, yielding the sorted word segmentation list.
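The sorting in step S101 can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the function name `rank_tokens_by_tfidf`, the reference corpus argument, and the smoothed IDF formula are all assumptions, since the text does not say which TF-IDF variant is used.

```python
import math
from collections import Counter


def rank_tokens_by_tfidf(doc_tokens, corpus):
    """Return the distinct tokens of one email, highest TF-IDF first.

    doc_tokens: list of tokens from the email being filtered.
    corpus: list of token lists from reference emails (hypothetical input).
    """
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for token, count in term_counts.items():
        # Document frequency: number of reference emails containing the token.
        df = sum(1 for doc in corpus if token in doc)
        # Smoothed IDF (an assumed variant) so unseen tokens never divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[token] = (count / len(doc_tokens)) * idf
    # Sort by descending score; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda t: (-scores[t], t))


corpus = [
    ["free", "prize", "click"],
    ["meeting", "click", "today"],
    ["click", "report"],
]
ranked = rank_tokens_by_tfidf(["free", "prize", "click", "free"], corpus)
print(ranked)
```

In this toy corpus, "click" appears in every reference email and so ranks last, while the repeated and rarer "free" ranks first.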
[0027] It should be noted...
Example 2
[0036] Figure 2 is a flow chart of the second embodiment of the short-text-based spam filtering method of the present invention, which includes:
[0037] S200. Preprocess the text and extract the Chinese text and / or the English text.
[0038] In operation, the email is first fetched and its text is preprocessed. For Hypertext Markup Language (HTML) documents, the HTML tags are extracted and processed separately; for the remaining information, Chinese characters and English characters are separated and converted into a text containing only English characters and a text containing only Chinese characters.
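A minimal sketch of the preprocessing in step S200, under stated assumptions: the function name `split_email_text` and the regular expressions are illustrative only, "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and HTML tags are matched with a simple pattern rather than a full HTML parser.

```python
import re

# Assumed patterns; the source does not specify how each class is recognised.
TAG_RE = re.compile(r"<[^>]+>")             # simple HTML tag matcher
CJK_RE = re.compile(r"[\u4e00-\u9fff]+")    # runs of common Chinese characters
LATIN_RE = re.compile(r"[A-Za-z]+")         # runs of English letters


def split_email_text(raw):
    """Split raw email text into (html_tags, chinese_text, english_text)."""
    tags = TAG_RE.findall(raw)              # tags are set aside for separate handling
    body = TAG_RE.sub(" ", raw)             # remaining information without tags
    chinese = "".join(CJK_RE.findall(body))   # text with only Chinese characters
    english = " ".join(LATIN_RE.findall(body))  # text with only English characters
    return tags, chinese, english


raw = '<p>恭喜中奖 click here</p><a href="http://x.example">领取 prize</a>'
tags, chinese, english = split_email_text(raw)
print(tags)
print(chinese)
print(english)
```

Concatenating the Chinese runs without a separator discards the original run boundaries; a real system might keep them to help the later word segmentation step.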
[0039] S201. Perform word segmentation processing on the Chinese text and the English text respectively, and obtain word segmentation results.
[0040] For English text, the traditional word segmentation method is used to obtain the word segmentation results (words are delimited by punctuation marks and spaces).
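The delimiter-based English segmentation described above might look like the sketch below. The function name `segment_english` is hypothetical, and two choices are assumptions not stated in the source: tokens are lowercased, and apostrophes are kept inside words so that contractions stay whole.

```python
import re


def segment_english(text):
    """Split English text into words on spaces and punctuation marks."""
    # Any run of characters other than letters, digits, or apostrophes
    # is treated as a delimiter; empty pieces are dropped.
    pieces = re.split(r"[^A-Za-z0-9']+", text)
    return [piece.lower() for piece in pieces if piece]


tokens = segment_english("Click here, WIN a prize! Don't wait.")
print(tokens)
```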
[0041] For Chinese text, the words are separated from the ...