Internet website automatic classification method based on deep learning

A deep-learning-based automatic classification technology, applied in the field of artificial intelligence, that addresses problems such as inapplicable methods and models, insufficient ability to extract features and learn information, and an insufficient information basis for classification.

Inactive Publication Date: 2018-08-03
INST OF INFORMATION ENG CAS
8 Cites, 35 Cited by

AI Technical Summary

Problems solved by technology

[0015] In short, the shortcomings of the existing technology are: the data used for model training is neither large enough nor recent enough, the information on which classification is based is insufficient, and the meth...



Examples


Embodiment Construction

[0059] With reference to Figure 1, the Internet website classification method of the present invention, which is based on an LSTM recurrent neural network deep learning model, comprises the following steps:

[0060] Step 1. With reference to Figure 2, the original description information of a large number of Internet websites is collected from the DNS server logs of the live network as a website data set, then preprocessed and manually labeled. A high-dimensional feature vector representation of each website is then extracted for input into the deep learning model, and each website is given its corresponding website category label, which is converted into a category vector. The specific steps are:

[0061] Step 1-1. With reference to Figure 3, preprocess the original description information of the Internet websites. Preprocessing includes removing invalid entries (such as data entries corresponding to 404 page-not-found errors, internal server errors, and website service outages), original ...
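The two parts of Step 1 that the text spells out, dropping invalid entries and converting category labels into category vectors, can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the record fields (`domain`, `status`, `description`, `label`), the example category list, and the rule "anything other than HTTP 200 is invalid" are all assumptions made for the example. Feature vector extraction is not shown because the source text is truncated before describing it.

```python
# Hypothetical record layout (not from the patent): each entry is a dict with
# "domain", "status" (HTTP status code, or None when the site is unreachable),
# "description" (raw description text) and a manually assigned "label".
CATEGORIES = ["news", "shopping", "video"]  # assumed example label set


def is_valid(entry):
    """Keep only entries that answered with HTTP 200 and have non-empty text.

    Assumption: 404 errors, internal server errors (5xx) and service outages
    (status None) are all treated as invalid entries.
    """
    return entry["status"] == 200 and bool(entry["description"].strip())


def one_hot(label, categories=CATEGORIES):
    """Convert a category label into a one-hot category vector."""
    vec = [0] * len(categories)
    vec[categories.index(label)] = 1
    return vec


def preprocess(entries):
    """Drop invalid entries and pair each description with its category vector."""
    return [
        (e["description"].strip(), one_hot(e["label"]))
        for e in entries
        if is_valid(e)
    ]


raw_entries = [
    {"domain": "a.example", "status": 200, "description": " Daily headlines ", "label": "news"},
    {"domain": "b.example", "status": 404, "description": "Not Found", "label": "news"},
    {"domain": "c.example", "status": 500, "description": "Internal Server Error", "label": "video"},
    {"domain": "d.example", "status": None, "description": "", "label": "shopping"},
    {"domain": "e.example", "status": 200, "description": "Buy shoes online", "label": "shopping"},
]

dataset = preprocess(raw_entries)
```

In this sketch only the two reachable sites survive filtering, and each keeps a one-hot vector whose single 1 marks its category's position in `CATEGORIES`.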


Abstract

The present invention relates to an Internet website automatic classification method based on deep learning. The method comprises the steps of: collecting the original description information of a large number of Internet websites according to DNS server logs as a website data set, and performing preprocessing and manual labeling; extracting a high-dimensional feature vector representation of each website for input into a deep learning model, adding the corresponding website category label to each website, and converting the labels into category vectors; taking the high-dimensional feature vectors as the input of the deep learning model and the category vectors as its output, and using the Adam gradient descent optimizer for supervised training of an LSTM-based recurrent neural network deep learning model; adding one SoftMax regression layer after the trained LSTM recurrent neural network deep learning model; and taking the website category corresponding to the dimension with the maximum probability value in the probability distribution vector as the predicted website category, and comparing the output website category with the actual category of the website to obtain the classification accuracy for Internet websites.
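The last steps of the abstract (SoftMax over the model output, picking the dimension with the maximum probability, and comparing against the actual category) can be sketched in plain Python as below. The LSTM itself is not modeled here; `model_outputs` stands in for whatever raw scores a trained network would produce, and the category names and score values are made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution vector."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits, categories):
    """Return the category at the dimension with the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best]


def accuracy(predicted, actual):
    """Fraction of websites whose predicted category matches the actual one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Assumed example data: three categories and four hypothetical score vectors.
categories = ["news", "shopping", "video"]
model_outputs = [
    [2.0, 0.5, 0.1],
    [0.2, 1.7, 0.3],
    [0.9, 0.8, 1.0],
    [1.2, 0.1, 0.4],
]
actual = ["news", "shopping", "news", "news"]

predicted = [predict_category(scores, categories) for scores in model_outputs]
classification_accuracy = accuracy(predicted, actual)
```

Because SoftMax preserves the ordering of its inputs, the predicted category is the same as taking the argmax of the raw scores; the probabilities matter when a confidence value is also needed.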

Description

Technical field

[0001] The invention relates to a method for automatically classifying Internet websites based on deep learning, and belongs to the technical field of artificial intelligence.

Background technique

[0002] In recent years, the Internet has grown explosively, and the speed of China's development in the Internet field has amazed the world. The 40th "Statistical Report on Internet Development in China" [1], issued by the China Internet Network Information Center in July 2017, pointed out that as of June 2017 the number of Chinese netizens had reached 751 million, with 19.92 million new netizens added in half a year. The Internet penetration rate had reached 54.3%, the total number of Chinese websites had reached 5.06 million, and the number of websites under the ".CN" domain name had reached 2.7 million. China's Internet has shown an upward trend in the number of Internet users, the Internet penetration rate, and the number of websites. [...


Application Information

IPC(8): G06K9/62, G06F17/30
CPC: G06F16/958, G06F18/2411
Inventor: 杜沐阳, 韩言妮, 杨兴华, 谭红艳
Owner: INST OF INFORMATION ENG CAS