Webpage classification method and webpage classification system

A webpage classification method and system, applied in the field of Internet applications, that addresses problems such as low accuracy, large webpage traffic, and low efficiency, and improves both the accuracy and the efficiency of classification.

Inactive Publication Date: 2015-12-02
BEIJING DEEPZERO TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] Among the above classification methods, method 3) relies on algorithms such as machine learning, and its accuracy is relatively low; method 2) is accurate but inefficient; method 1) has good efficiency and quality, but fo



Examples


Embodiment Construction

[0056] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0057] Figure 1 shows a schematic block diagram of a webpage classification system according to one aspect of the present invention. The system includes a to-be-classified webpage acquisition device 11, a breadcrumb crawling device 12, and a webpage classifier 13. Preferably, the system further includes a user attribute classification device 14. Specifically, the to-be-classified webpage acquisition device 11 receives a domain name input by the user and, based on the domain name, obtains the address (URL) of the webpage whose breadcrumbs need to be crawled; the breadcrumb crawling device 12 crawls the breadcrumbs of the webpage based on that address; and the webpage classifier 13 classifies the webpage based on the crawled breadcrumbs. Further, the user attribute classification device 14 classifies the attributes of the users who visit th...
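The crawl-then-classify flow described in [0057] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it works on an HTML string rather than fetching a URL, it assumes the breadcrumb trail sits inside an element whose `class` contains "breadcrumb", and the names `BreadcrumbParser`, `extract_breadcrumbs`, `classify`, and the `CATEGORY_KEYWORDS` table are all hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}

# Hypothetical keyword-to-category table, for illustration only.
CATEGORY_KEYWORDS = {
    "sports": {"sports", "football", "basketball"},
    "finance": {"finance", "stocks", "funds"},
}


class BreadcrumbParser(HTMLParser):
    """Collect link text inside an element whose class contains 'breadcrumb'.

    The class-name convention is an assumption; the source does not say
    how breadcrumbs are located within a page.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth inside the breadcrumb container
        self.in_link = False  # True while inside an <a> within the container
        self.crumbs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = dict(attrs).get("class") or ""
        if self.depth > 0:
            self.depth += 1
            if tag == "a":
                self.in_link = True
        elif "breadcrumb" in classes.lower():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if tag == "a":
                self.in_link = False

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and self.in_link and text:
            self.crumbs.append(text)


def extract_breadcrumbs(html):
    """Return the breadcrumb link texts found in an HTML string, in order."""
    parser = BreadcrumbParser()
    parser.feed(html)
    return parser.crumbs


def classify(crumbs):
    """Return the first category whose keywords match a breadcrumb, else None."""
    for crumb in crumbs:
        for category, keywords in CATEGORY_KEYWORDS.items():
            if crumb.lower() in keywords:
                return category
    return None


SAMPLE_HTML = """
<html><body>
<div class="breadcrumb">
  <a href="/">Home</a> &gt; <a href="/sports">Sports</a> &gt; <a href="/sports/nba">Basketball</a>
</div>
<p><a href="/other">Finance</a></p>
</body></html>
"""
```

With the sample page, `extract_breadcrumbs(SAMPLE_HTML)` yields the trail Home, Sports, Basketball, and `classify` maps it to the "sports" category; the "Finance" link outside the breadcrumb container is ignored.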



Abstract

The invention provides a webpage classification method and a webpage classification system. The method comprises the following steps: a to-be-classified webpage acquisition device receives a domain name input by a user and, based on the domain name, obtains the URL (Uniform Resource Locator) of the webpage whose breadcrumbs are to be crawled; a breadcrumb crawling device crawls the breadcrumbs of the webpage based on the URL; and a webpage classifier classifies the webpage based on the crawled breadcrumbs. Compared with the prior art, the method and the system extract breadcrumbs from the webpage on the basis of the domain name and classify the webpage accordingly, which effectively improves the accuracy of webpage classification.

Description

Technical field

[0001] The invention relates to the technical field of Internet applications, in particular to a webpage classification method and system.

Background art

[0002] With the growing demand for information on the Internet, targeted delivery of information has become a trend. To deliver information in a more targeted manner, it is necessary to analyze the attributes of, or define labels for, groups of Internet users, which is done mainly by judging the type of media webpages that users visit. Common methods for classifying media webpages mainly include:

[0003] 1) Classifying by the character string of the URL (Uniform Resource Locator), for example, using the string "sports" in sports.qq.com to place the webpage in the sports category;

[0004] 2) Manual identification, in which experienced personnel classify webpages according to their content;

[0005] 3) Identifying the frequency of keywords in webpage content, mainly...
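For comparison, prior-art method 1) from [0003], classification by a character string in the URL, might look like the sketch below. Only the "sports" token comes from the source; the other tokens and the function name `classify_by_url` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical category tokens; the source gives only "sports" as an example.
URL_TOKENS = ("sports", "finance", "news")


def classify_by_url(url):
    """Return the first known token found in the host or path, else None."""
    # urlparse only fills netloc when a scheme or a leading "//" is present.
    parsed = urlparse(url if "://" in url else "//" + url)
    haystack = (parsed.netloc + parsed.path).lower()
    for token in URL_TOKENS:
        if token in haystack:
            return token
    return None
```

Calling `classify_by_url("sports.qq.com")` returns "sports", while a URL containing none of the tokens returns `None`, which shows how narrowly this approach depends on the naming of the URL.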


Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/951
Inventor 林招洪婷婷杨晓磊陈岩
Owner BEIJING DEEPZERO TECH CO LTD