Method and device for web page classification

A web page classification technology, applied in the field of Internet communication, that addresses the low efficiency of page classification and achieves fast, efficient classification while reducing impact and improving user experience.

Status: Inactive
Publication Date: 2016-04-20
ZTE CORP
Cites: 5; Cited by: 13
AI Technical Summary

Problems solved by technology

[0005] The main technical problem to be solved by the present invention is to provide a method and device for classifying webpages that overcome the low efficiency of current webpage classification methods.

Examples

Embodiment 1

[0067] Because current webpage classification methods cannot classify webpages quickly and efficiently, this embodiment provides a webpage classification method that differs from the prior art: it uses the structural similarity of webpage addresses (URLs) to classify webpages quickly. As shown in Figure 1, the webpage classification method of this embodiment includes the following steps:

[0068] Step 101: Establish a feature word classifier according to a webpage address sample set. The webpage address sample set includes a plurality of sample webpage addresses and the webpage type corresponding to each sample webpage address.

[0069] Before classifying webpages, the method of this embodiment selects in advance some sample webpage addresses and their corresponding webpage types, such as webpage address 1-financial affairs, webpage address 2-sports, w...
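Step 101 can be illustrated with a minimal sketch. The source text does not say how the feature word classifier is built, so everything here is an assumption made for illustration: the tokenization in `url_words`, the small stop list, the voting rule in `classify`, and the `example.com` sample addresses are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed stop list of tokens that carry no topical meaning.
STOP_WORDS = {"http", "https", "www", "com", "html"}

def url_words(url):
    """Split a URL into lowercase alphabetic tokens (assumed tokenization)."""
    tokens = re.split(r"[^a-z]+", url.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOP_WORDS]

def build_feature_word_classifier(samples):
    """Map each word to a count of the webpage types it was seen with.

    samples: iterable of (sample_url, webpage_type) pairs.
    """
    word_types = defaultdict(Counter)
    for url, page_type in samples:
        for word in set(url_words(url)):
            word_types[word][page_type] += 1
    return word_types

def classify(word_types, url):
    """Return the type with the most word votes, or None if no word matches."""
    votes = Counter()
    for word in url_words(url):
        votes.update(word_types.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical sample set in the spirit of "address 1-finance, address 2-sports".
samples = [
    ("http://example.com/finance/stock/1.html", "finance"),
    ("http://example.com/sports/nba/2.html", "sports"),
]
classifier = build_feature_word_classifier(samples)
print(classify(classifier, "http://news.example.org/sports/nba/98765.html"))  # sports
```

A word-vote table is only one possible reading of "feature word classifier"; a trained text classifier over the same tokens would fit the description equally well.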

Embodiment 2

[0179] This embodiment provides a webpage classification device. As shown in Figure 5, it includes a feature word classifier building module, an acquisition and recognition module, a webpage address processing module, a storage module, and a webpage classification module.

[0180] The feature word classifier building module is used to establish a feature word classifier according to a webpage address sample set, which includes a plurality of sample webpage addresses and the webpage type corresponding to each sample webpage address.

[0181] The acquisition and recognition module is used to obtain a predetermined number of webpage addresses and to determine, through the feature word classifier, the webpage type to which each of the webpage addresses belongs.

[0182] The webpage address processing module is used to perform redundancy elimination on the webpage addresses whose webpage type has been determined by the acquisition and identific...
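One way to picture how the five modules fit together is the sketch below. It is not the device described in the patent: the class name, field names, and method names are hypothetical, and the classifier and the address processing step are injected as plain callables because the source text does not specify their internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional

@dataclass
class WebPageClassificationDevice:
    # Output of the feature word classifier building module (assumed callable).
    classify_by_feature_words: Callable[[str], Optional[str]]
    # Webpage address processing module: URL -> structure string (assumed callable).
    to_structure: Callable[[str], str]
    # Storage module: structure string -> webpage type.
    store: Dict[str, str] = field(default_factory=dict)

    def learn(self, urls: Iterable[str]) -> None:
        """Acquisition and recognition: type each URL, then store its structure."""
        for url in urls:
            page_type = self.classify_by_feature_words(url)
            if page_type is not None:
                self.store[self.to_structure(url)] = page_type

    def classify(self, url: str) -> Optional[str]:
        """Webpage classification module: look up the type by URL structure."""
        return self.store.get(self.to_structure(url))

# Stand-in callables for illustration only.
device = WebPageClassificationDevice(
    classify_by_feature_words=lambda u: "sports" if "sports" in u else None,
    to_structure=lambda u: u.rsplit("/", 1)[0],
)
device.learn(["http://example.com/sports/1.html", "http://example.com/misc/2.html"])
print(device.classify("http://example.com/sports/99.html"))  # sports
```

Injecting the two callables keeps the lookup path independent of how the classifier is trained, which matches the separation of modules in the description.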

Abstract

The invention discloses a method and device for web page classification. The method comprises the following steps. A feature word classifier is established according to a web page address sample set, which comprises a plurality of sample web page addresses and the web page type corresponding to each sample web page address. A preset quantity of web page addresses is acquired, and the web page type of each web page address is determined by the feature word classifier. The web page addresses whose web page types have been determined undergo redundancy elimination to obtain structure strings, which are the web page address structures. The web page address structures and their corresponding web page types are stored. During web page classification, the web page address of a to-be-classified page is acquired, the corresponding web page address structure is obtained by applying redundancy elimination to that address, and the web page type of the to-be-classified page is looked up in the storage according to the web page address structure. With this method, web page classification can be performed rapidly and efficiently.
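The redundancy elimination and lookup steps of the abstract can be sketched as follows. The abstract does not state which parts of an address count as redundant, so the rules in `url_structure` (drop the scheme, query, and fragment; replace each run of digits with the placeholder `{N}`) are assumptions, as are the function names and the sample addresses.

```python
import re
from urllib.parse import urlsplit

def url_structure(url):
    """Reduce a URL to a structure string under the assumed rules."""
    parts = urlsplit(url)
    # Keep host and path only; collapse digit runs such as dates and IDs.
    path = re.sub(r"\d+", "{N}", parts.path)
    return parts.netloc.lower() + path

def remember(store, url, page_type):
    """Store the structure of an already classified URL with its type."""
    store[url_structure(url)] = page_type

def lookup(store, url):
    """Classify a new URL by its structure alone; None if the structure is unknown."""
    return store.get(url_structure(url))

store = {}
remember(store, "http://sports.example.com/nba/2016/0420/12345.html?from=home", "sports")
print(url_structure("http://sports.example.com/nba/2016/0420/12345.html?from=home"))
# sports.example.com/nba/{N}/{N}/{N}.html
print(lookup(store, "http://sports.example.com/nba/2015/1101/777.html"))  # sports
```

Because many pages on one site share a structure string, the store grows with the number of distinct structures and not with the number of pages, and classification becomes a single dictionary lookup with no page content analysis.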

Description

Technical field

[0001] The invention relates to the technical field of Internet communication, in particular to a method and device for classifying webpages.

Background art

[0002] Webpage classification is a hot issue in current Internet applications. Classifying webpages makes it possible to analyze the records of users' visits to webpages, derive each user's online preferences, and then provide the user with Internet services based on those preferences.

[0003] The results of webpage classification are generally crawled by the crawler system and stored in the data storage system. However, because the number of webpages on the Internet is huge, data query and analysis become slower and slower as the number of crawled webpages increases.

[0004] At present, there are many webpage classification methods, all of which need to analyze the content of the webpage text to classify, and also need to record the corresponding relationship between the webpa...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/00
Inventor: 于波
Owner: ZTE CORP