A method and system for intelligent identification of webpage types based on deep learning

A deep learning and intelligent recognition technology, applied in character and pattern recognition, network data retrieval, network data query, etc. It addresses the problems that manual feature determination is troublesome and strongly affected by human factors and that classification accuracy is low, so as to solve the defect of a low classification accuracy rate, with the effect of improved accuracy and improved efficiency.

Active Publication Date: 2019-06-21
GUANGDONG UCAP INTERNET INFORMATION TECH

AI Technical Summary

Problems solved by technology

Traditional web page classification methods require features to be found manually, which is laborious; the choice of features is strongly influenced by human judgment, and classification accuracy is not high.
The complexity of traditional machine learning lies in two points: 1. Cold start: there are no feature parameters before learning, so expert knowledge must be organized and injected, that is, the method of determining and calculating the features.

Examples

Embodiment 1

[0054] Figure 1 shows a deep-learning-based method for intelligent identification of web page types according to the present invention, comprising the following steps:

[0055] S1. Input the webpage to be classified and identified;

[0056] S2. The deep learning classification model classifies and recognizes the input webpage, and obtains category information of the webpage to be classified and recognized.
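
Steps S1 and S2 can be sketched as a single function call. This is a minimal illustration and not the patent's implementation: `page_text`, `identify_page_type`, and `keyword_stub_model` are hypothetical names, the category labels are placeholders, and the keyword rule merely stands in for the trained deep learning classification model.

```python
from html.parser import HTMLParser
from typing import Callable, List


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Reduce raw HTML to its visible text (an assumed preprocessing step)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def identify_page_type(html: str, model: Callable[[str], str]) -> str:
    """S1: accept the page to classify. S2: let the model return its category."""
    return model(page_text(html))


def keyword_stub_model(text: str) -> str:
    # Stand-in for the trained model; a real system would call the network here.
    return "ministry" if "ministry" in text.lower() else "portal"
```

Passing the model in as a callable keeps the two steps separate, so the stub can be swapped for a trained classifier without changing the calling code.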

[0057] Figure 2 shows the specific training process of the deep learning classification model:

[0058] S2.1. Obtaining a web page data set marked with categories;

[0059] Web pages are collected in a targeted manner by a crawler and marked with their category: provincial/city portals are classified into one category, ministry websites into one category, and vertical system websites into one category. Several web pages are taken from each, a total of 100,000 web pages, as training web pages, and these training web pages are marked with w...
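
One way to read the labeling step is that each crawled page inherits the category of the site group it was collected from. The sketch below assumes exactly that; the source text is truncated before it says how labels are attached. The host names, the `SITE_CATEGORY` table, and `label_pages` are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical host -> category table. The hosts are placeholders, not real sites.
SITE_CATEGORY = {
    "province.example.gov": "portal",    # provincial / city portals
    "ministry.example.gov": "ministry",  # ministry websites
    "bureau.example.gov": "vertical",    # vertical system websites
}


def label_pages(urls):
    """Return (url, category) pairs for URLs whose host has a known category."""
    labeled = []
    for url in urls:
        host = urlparse(url).hostname
        category = SITE_CATEGORY.get(host)
        if category is not None:
            labeled.append((url, category))
    return labeled
```

URLs from hosts outside the table are dropped rather than guessed at, so the training set contains only pages with a known category.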

Embodiment 2

[0078] The present invention also provides a deep-learning-based system for intelligent identification of web page types, which includes the following modules:

[0079] A web page type intelligent identification system based on deep learning, the system includes the following modules:

[0080] Input module: input the webpage to be classified and identified;

[0081] Type identification module: the deep learning classification model classifies and identifies the input webpage, and obtains the category information of the webpage to be classified and identified;

[0082] The deep learning classification model is further composed of the following modules:

[0083] Data acquisition module: acquire web page data sets marked with categories;

[0084] Screening module: select the training web page set and the test web page set;

[0085] Preprocessing module: perform preprocessing operations on web pages;

[0086] Model calculation module: deep learning classification model calculatio...
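
As an illustration of what the screening module might do, the sketch below splits labeled pages into training and test sets category by category. The text does not state a split ratio or a sampling strategy, so the 20% test fraction, the per-category split, and the name `split_by_category` are assumptions.

```python
import random
from collections import defaultdict


def split_by_category(samples, test_fraction=0.2, seed=0):
    """Split (page, label) pairs into train and test sets, one category at a time."""
    by_label = defaultdict(list)
    for page, label in samples:
        by_label[label].append((page, label))

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = int(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

Splitting per category keeps each category represented in the test set in roughly the same proportion as in the training set.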

Abstract

The present invention provides a method and system for intelligent identification of webpage types based on deep learning, including: collecting different types of webpage data, marking each webpage's category, and preprocessing each webpage to obtain training set data; constructing a deep learning model from the training set data using a deep learning algorithm; and preprocessing each webpage to be tested and inputting the resulting data into the deep learning model to obtain the webpage type of the test webpage. The present invention also provides an intelligent identification system for webpage types based on deep learning. Adopting embodiments of the present invention can improve the accuracy of intelligent webpage classification.

Description

Technical field

[0001] The invention belongs to the technical field of Internet information collection, and in particular relates to a method and system for intelligently identifying web page types based on deep learning.

Background technique

[0002] Web page type identification is a key step in a data acquisition system. During web page data collection, after a page is collected and before its content is parsed, the type of the page must be judged from its source code: whether it is a column page, a page-turning page, a topic page, or an article. Different content extraction algorithms are then called for different types, which improves the accuracy of content extraction and the efficiency of data collection. Without an effective way to identify the web page type, update frequency calculation, article and column marking, and automatic text extraction are all severely disrupted. ...
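
The idea of calling a different content extraction algorithm for each page type can be sketched as a dispatch table. Everything named here is hypothetical: the four extractor functions are empty placeholders, and `classify` stands for whatever classifier produces the page type.

```python
from typing import Callable, Dict


def extract_column(html: str) -> dict:
    return {"handler": "extract_column"}       # placeholder extractor


def extract_page_turning(html: str) -> dict:
    return {"handler": "extract_page_turning"}  # placeholder extractor


def extract_topic(html: str) -> dict:
    return {"handler": "extract_topic"}        # placeholder extractor


def extract_article(html: str) -> dict:
    return {"handler": "extract_article"}      # placeholder extractor


# Page type -> extraction routine, following the four types named in the text.
EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    "column": extract_column,
    "page_turning": extract_page_turning,
    "topic": extract_topic,
    "article": extract_article,
}


def extract_content(html: str, classify: Callable[[str], str]) -> dict:
    """Identify the page type first, then run the extractor registered for it."""
    page_type = classify(html)
    extractor = EXTRACTORS.get(page_type)
    if extractor is None:
        raise ValueError(f"no extractor registered for page type {page_type!r}")
    return extractor(html)
```

An unknown page type raises an error instead of falling back to a default extractor, since running the wrong extractor is the failure the type identification step is meant to prevent.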

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/953, G06K9/62
CPC: G06F18/24, G06F18/214
Inventor: 汪敏, 刘鹏飞, 李伦凉, 李绪祥, 王静, 尹娜
Owner: GUANGDONG UCAP INTERNET INFORMATION TECH