Webpage classification method and device based on URL analysis

A webpage classification technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses slow webpage classification and achieves fast and effective classification.

Inactive Publication Date: 2016-09-21
GUANGDONG KINGPOINT DATA SCI & TECH CO LTD
3 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

Existing webpage classification methods classify webpages slowly.

Method used

URL analysis is added before webpage text classification: webpages are first roughly classified according to the URL analysis result, and text classification is carried out only for webpages that cannot be roughly classified.

Examples

Embodiment 1

[0041] Figure 1 shows a flow chart of a method for classifying webpages based on URL analysis provided by the present invention. The method comprises the following steps:

[0042] Step S1: divide the complete URL into blocks, screen feature words out of the URL blocks according to the URL dictionary, and roughly classify the URL according to the URL dictionary and the feature words, obtaining the webpages that can be roughly classified and their corresponding categories.
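Step S1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the text shown here does not specify how URLs are split, what the URL dictionary contains, or how conflicts between feature words are resolved. The `URL_DICTIONARY` entries, the split on non-alphanumeric characters, and the majority vote are all hypothetical choices.

```python
import re
from urllib.parse import urlsplit

# Hypothetical URL dictionary mapping feature words to categories.
# The entries are illustrative and do not come from the patent.
URL_DICTIONARY = {
    "sports": "sports",
    "nba": "sports",
    "finance": "finance",
    "stock": "finance",
    "news": "news",
}


def split_url_into_blocks(url):
    """Split a complete URL into lowercase alphanumeric blocks."""
    parts = urlsplit(url)
    raw = " ".join([parts.netloc, parts.path, parts.query])
    return [b for b in re.split(r"[^0-9a-zA-Z]+", raw.lower()) if b]


def rough_classify(url, dictionary=URL_DICTIONARY):
    """Return (category, feature_words), or (None, []) when no block matches."""
    blocks = split_url_into_blocks(url)
    feature_words = [b for b in blocks if b in dictionary]
    if not feature_words:
        return None, []
    votes = {}
    for word in feature_words:
        category = dictionary[word]
        votes[category] = votes.get(category, 0) + 1
    # Assumed tie-break: the category seen first in the URL wins.
    return max(votes, key=votes.get), feature_words
```

A URL with no dictionary hit returns `None`, which marks the page as one that cannot be roughly classified and must go on to text classification.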

[0043] Step S2: preprocess the text of each webpage that cannot be roughly classified, convert it into a vector model, and classify it with the generated classifier, obtaining the webpages that cannot be roughly classified and their corresponding categories.
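A minimal sketch of step S2, using only the Python standard library. The text shown here does not name the preprocessing, the vector model, or the classifier, so every choice below is an assumption: a tiny stop-word list, a term-frequency vector, and a nearest-centroid classifier with cosine similarity standing in for the "generated classifier". All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the patent).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def to_vector(tokens):
    """Represent a document as a sparse term-frequency vector."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(labelled_texts):
    """Sum the term vectors of each category from (text, category) pairs."""
    centroids = {}
    for text, category in labelled_texts:
        centroids.setdefault(category, Counter()).update(preprocess(text))
    return centroids


def classify_text(text, centroids):
    """Return the category whose centroid is most similar to the text."""
    vector = to_vector(preprocess(text))
    return max(centroids, key=lambda c: cosine(vector, centroids[c]))
```

In practice the centroid classifier would likely be replaced by a trained model; it is used here only to keep the sketch self-contained.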

[0044] Step S3: store the complete URLs, the webpages that can be roughly classified and their corresponding categories, and the webpages that cannot be roughly classified and their corresponding categories.
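Step S3 could be sketched with Python's built-in `sqlite3` module. The storage backend is not specified in the text shown here, so the table name `classified_pages`, its columns, and the `stage` flag that records which path produced the category are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store for classification results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS classified_pages ("
        " url TEXT PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " stage TEXT NOT NULL CHECK (stage IN ('rough', 'text')))"
    )
    return conn


def store_result(conn, url, category, stage):
    """Save one URL with its category and the stage that classified it."""
    conn.execute(
        "INSERT OR REPLACE INTO classified_pages (url, category, stage)"
        " VALUES (?, ?, ?)",
        (url, category, stage),
    )
    conn.commit()


def load_result(conn, url):
    """Return (category, stage) for a URL, or None if it is not stored."""
    return conn.execute(
        "SELECT category, stage FROM classified_pages WHERE url = ?", (url,)
    ).fetchone()
```

Keeping the stage alongside the category makes it possible to tell afterwards which pages were settled by URL analysis alone.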

[0045] As shown in Figure 2 ...

Embodiment 2

[0058] Figure 4 shows a functional block diagram of a webpage classification device based on URL analysis provided by the present invention. The device includes:

[0059] The webpage rough classification module 10, used to divide the complete URL into blocks, screen feature words out of the URL blocks according to the URL dictionary, and roughly classify the URLs according to the URL dictionary and the feature words, so as to obtain the webpages that can be roughly classified and their corresponding categories.

[0060] The webpage text classification module 20, used to preprocess the text of each webpage that cannot be roughly classified, convert it into a vector model, and classify it with the generated classifier, so as to obtain the webpages that cannot be roughly classified and their corresponding categories.

[0061] The storage module 30 is configured to store complete URLs, webpages that can be roughly cl...

Abstract

The invention provides a webpage classification method and device based on URL analysis. The device comprises a webpage rough classification module, a webpage text classification module, and a storage module. Compared with the prior art, the method and device add URL analysis before webpage text classification: webpages are roughly classified according to the URL analysis result, and text classification is carried out only for webpages that cannot be roughly classified. All webpages are thus classified quickly and effectively, helping users select the webpages they need.
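The two-stage flow described in the abstract can be sketched as a simple cascade. The function `classify_pages` and the two stand-in classifiers are hypothetical; the only point illustrated is that the text classifier runs solely for pages the URL stage could not settle.

```python
def classify_pages(pages, rough_classifier, text_classifier):
    """Classify (url, text) pairs, trying the URL stage first.

    rough_classifier(url) returns a category or None;
    text_classifier(text) always returns a category.
    Returns {url: (category, stage)}.
    """
    results = {}
    for url, text in pages:
        category = rough_classifier(url)
        if category is not None:
            results[url] = (category, "rough")
        else:
            # Fallback: only pages without a rough category reach this stage.
            results[url] = (text_classifier(text), "text")
    return results


# Stand-in classifiers for demonstration only (assumptions, not the patent's).
def demo_rough(url):
    return "sports" if "sports" in url else None


def demo_text(text):
    return "finance" if "stock" in text.lower() else "other"
```

For example, `classify_pages([("http://sports.example.com/", "any text")], demo_rough, demo_text)` settles the page at the URL stage without calling `demo_text`.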

Description

Technical field

[0001] The invention relates to the technical field of webpage classification, in particular to a method and device for classifying webpages based on URL analysis.

Background

[0002] With the advent of Internet 2.0, the number of web pages has increased dramatically. With ever more information on the Internet, quickly and accurately finding the desired content among vast information resources has become a major problem. Text classification, a key technology of great practical value, can effectively address this problem; however, because web pages contain more than text, web page classification methods are more varied than text classification methods. Existing webpage classification methods classify webpages slowly.

[0003] In view of the above defect, the inventors arrived at the present invention after long-term research and testing.

Contents of the...

Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/353, G06F16/9566, G06F18/2411
Inventor: 潘宇翔, 李青海, 简宋全, 侯大勇
Owner: GUANGDONG KINGPOINT DATA SCI & TECH CO LTD