A website classification method based on the comprehensive characteristics of darknet websites
This website classification method, based on comprehensive website features, is applied in the field of network data analysis. It addresses the rising cost of manual maintenance and the difficulty of meeting users' needs for classifying darknet websites, with the aim of achieving high classification accuracy at reduced cost.
Embodiment Construction
[0029] The present invention will be described in further detail below in conjunction with the accompanying drawings.
[0030] The processing method of the present invention is as follows:
[0031] Step 1: Crawl the labeled websites (as shown in Figure 1):
[0032] (1) Crawl the labeled websites with Scrapy, checking the current crawl depth during crawling and fetching only webpages at a depth of 2 or less.
[0033] (2) Manually review the labels and remove incorrectly labeled samples.
[0034] Step 2: Obtain the comprehensive features of the website (as shown in Figure 2):
[0035] (1) Use the bag-of-words model to construct the vector space model of the website, and use the TfidfVectorizer class in Python's scikit-learn library to calculate the TF-IDF value of each word.
[0036] (2) Extract the Keyword (keywords in the HTML meta tag), Description (webpage description in the HTML meta tag), and Title (HTML title) tags; words from these tags have a weight of 0.6 and other words a weight of 0.4, ...
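One possible reading of Step 2 is sketched below. The source text is cut off before it says how the 0.6 and 0.4 weights are applied, so the linear combination of two TF-IDF vectors used here is an assumption, as are the helper names `FieldSplitter`, `split_fields`, and `weighted_tfidf`. The sketch uses scikit-learn's `TfidfVectorizer` (as named in the text) and the standard-library `html.parser` to separate title and meta text from the remaining page text.

```python
from html.parser import HTMLParser

from sklearn.feature_extraction.text import TfidfVectorizer

META_WEIGHT = 0.6  # weight for Keyword / Description / Title words (from the text)
BODY_WEIGHT = 0.4  # weight for all other words (from the text)


class FieldSplitter(HTMLParser):
    """Hypothetical helper: split title/meta text from the rest of the page."""

    def __init__(self):
        super().__init__()
        self.meta_parts, self.body_parts = [], []
        self._in_title = False
        self._skip = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.meta_parts.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            (self.meta_parts if self._in_title else self.body_parts).append(data)


def split_fields(html):
    """Return (title + meta text, other text) for one HTML page."""
    parser = FieldSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.meta_parts), " ".join(parser.body_parts)


def weighted_tfidf(pages):
    """Assumed combination: 0.6 * TF-IDF(meta fields) + 0.4 * TF-IDF(other text)."""
    meta_texts, body_texts = zip(*(split_fields(html) for html in pages))
    vectorizer = TfidfVectorizer()
    # Fit vocabulary and IDF on each page's full text so both parts share columns.
    vectorizer.fit([m + " " + b for m, b in zip(meta_texts, body_texts)])
    features = (META_WEIGHT * vectorizer.transform(meta_texts)
                + BODY_WEIGHT * vectorizer.transform(body_texts))
    return features, vectorizer
```

Calling `weighted_tfidf(list_of_html_strings)` returns a sparse matrix with one row per page and one column per vocabulary term, which could then be passed to a classifier. Other weighting schemes, such as scaling term counts before the TF-IDF step, would be equally consistent with the truncated description.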