Webpage classification method and device

A webpage classification method and device, applied in the Internet field, that address the long time needed to classify a webpage and improve the efficiency and success rate of webpage classification.

Active Publication Date: 2013-03-06
CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD +2

AI Technical Summary

Problems solved by technology

In this case, the only available method is real-time crawling and prediction, which may take tens of minutes to predict the category of a single web page. Even if batch prediction is parallelized, the time required is still very long, at least on the order of hours.




Embodiment Construction

[0027] To address the defects in the prior art, the embodiment of the present invention proposes a technical solution for classifying webpages. In this solution, URLs are divided into levels by truncation: the adjacent upper-level URL of a URL is obtained by truncating the lower-level URL. A record for each upper-level URL is added to the existing URL category library, together with the predicted category of that upper-level URL. (In the embodiment of the present invention, each record in the URL category library holds a URL, the predicted category of that URL, and the adjacent upper-level URL of that URL.) When a webpage needs to be classified, the URL category library is queried with the URL of the webpage to be classified; if no matching URL is found, the library is queried with the upper-level URL of that URL, and when a matching URL is found...
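
The truncation step and the added upper-level records described in [0027] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `upper_url` and `add_upper_records`, the record layout as a Python dict, and the use of a majority vote over lower-level categories to predict an upper-level category are all assumptions, since the excerpt does not say how the upper-level category is predicted. Query strings and fragments are not handled.

```python
from collections import Counter


def upper_url(url):
    """Truncate the last path segment to get the adjacent upper-level URL.

    Returns None when the URL is already at the top (host) level.
    """
    scheme, sep, rest = url.rstrip("/").partition("://")
    if not sep:  # no scheme present, so the whole string is host + path
        scheme, rest = "", scheme
    if "/" not in rest:
        return None
    return scheme + sep + rest.rsplit("/", 1)[0]


def add_upper_records(library):
    """Add a record for every upper-level URL missing from the library.

    `library` maps URL -> {"category": ..., "upper": ...}. The majority vote
    used to predict an upper-level category is an assumption of this sketch.
    """
    pending = dict(library)  # records whose upper-level URL is still to be derived
    while pending:
        votes = {}
        for url, record in pending.items():
            parent = upper_url(url)
            record["upper"] = parent
            if parent is not None and parent not in library:
                votes.setdefault(parent, Counter())[record["category"]] += 1
        # Create one new record per missing upper-level URL, then repeat
        # one level higher until the host level is reached.
        pending = {
            parent: {"category": counts.most_common(1)[0][0], "upper": None}
            for parent, counts in votes.items()
        }
        library.update(pending)
    return library
```

Existing records are never overwritten: an upper-level URL that already has a category in the library keeps it, and only missing levels receive a predicted category.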


Abstract

The invention discloses a webpage classification method and device. The method includes: establishing virtual hierarchical URLs (uniform resource locators) from the records in an existing URL category library and predicting the category of each hierarchical URL; when a webpage needs to be classified, searching the URL category library by the URL of the webpage to be classified; if no matching URL is found, searching the URL category library by the upper-level URL of that URL; and, when a matching URL is found, determining the category of the webpage to be classified from the predicted category of the matched URL. The method and device improve the efficiency and success rate of webpage classification.
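
The lookup with fallback to upper-level URLs can be illustrated with a short sketch. It assumes the URL category library is available as an in-memory mapping from URL to category and that an upper-level URL is obtained by dropping the last path segment; the names `upper_url` and `classify` are hypothetical and do not come from the patent text.

```python
def upper_url(url):
    """Drop the last path segment; return None at the top (host) level."""
    scheme, sep, rest = url.rstrip("/").partition("://")
    if not sep:  # no scheme present, so the whole string is host + path
        scheme, rest = "", scheme
    if "/" not in rest:
        return None
    return scheme + sep + rest.rsplit("/", 1)[0]


def classify(url, library):
    """Look up `url`, then each upper-level URL in turn, in `library`.

    `library` maps URL -> predicted category (an assumed representation).
    Returns the category of the first match, or None if no level matches.
    """
    current = url
    while current is not None:
        category = library.get(current)
        if category is not None:
            return category
        current = upper_url(current)  # fall back one level up
    return None


library = {"http://a.com/news": "news"}
print(classify("http://a.com/news/2013/03/page.html", library))
```

Each fallback step is a single dictionary lookup, so a page missing from the library can still be classified without crawling it, provided one of its upper-level URLs has a record.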

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a webpage classification method and device.

Background

[0002] With the rapid development of mobile Internet technology, the number of mobile Internet users keeps increasing. As a result, behavior analysis of mobile Internet users has gradually become a research hotspot.

[0003] In the prior art, user behavior is usually analyzed from the access logs of mobile Internet users. Specifically, the access log of a mobile Internet user is stored in the WAP (Wireless Application Protocol) gateway, and the URL (Uniform Resource Locator) of each webpage the user visits is recorded in the access log. By querying the URL category library, the category of the webpage visited by the user can be obtained, and from that the behavior preferences of the corresponding user. ...


Application Information

IPC(8): G06F17/30
Inventor: 徐萌何洪凌胡珉罗治国孙少陵陶涛陈婷张新访李成华
Owner: CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD