Method for webpage classification

A webpage classification technology, applied in the network field, that addresses the lack of classification for unknown URLs, insufficient coverage, and time-consuming, labor-intensive manual work, and achieves rich content and high search efficiency.

Inactive Publication Date: 2010-01-20
苏州普适通信息科技有限公司 +1
0 Cites, 23 Cited by

AI Technical Summary

Problems solved by technology

At present, there is no way to classify unknown URLs. The method used by most people is to manually browse the web a



Examples


Embodiment

[0032] The webpage classification method of this embodiment comprises, from bottom to top along the direction of data flow, a data collection layer, a webpage parsing layer, and an application presentation layer. As shown in Figure 1, it includes the following specific steps:

[0033] (1) Read the URL list of the preset navigation sites, which stores many navigation URLs, such as www.hao123.com and www.sohu.com;

[0034] (2) Check whether the URL list is empty. If it is empty, the search is complete: go to step 8 and end. If it is not empty, continue to step 3;

[0035] (3) Take out a URL;

[0036] (4) Look up the URL in V_URL, the visited-URL storage table, which stores every URL address that has been visited. If the URL is found in V_URL, it has already been visited, so go back to step 3. If it is not found, it has not been visited, so continue to step 5;

[0037] (5) Use the focused crawler technology to...
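
Steps 1 through 4 above can be sketched as a simple loop. This is only an illustrative sketch, not the patent's implementation: the names `crawl_frontier`, `url_list`, and `fetch` are hypothetical, `fetch` stands in for the focused crawling of step 5 (whose details are truncated in this text), and recording a URL in V_URL immediately before fetching it is an assumption.

```python
from collections import deque


def crawl_frontier(navigation_urls, fetch):
    """Walk the URL list once, skipping any URL already recorded in V_URL.

    `fetch` is a caller-supplied callable standing in for step 5; it is a
    hypothetical placeholder, not something defined by the source text.
    """
    url_list = deque(navigation_urls)  # step 1: preset navigation URL list
    v_url = set()                      # V_URL: every URL visited so far
    fetched = []
    while url_list:                    # step 2: an empty list ends the search
        url = url_list.popleft()       # step 3: take out one URL
        if url in v_url:               # step 4: already visited, back to step 3
            continue
        v_url.add(url)                 # assumption: mark as visited before fetching
        fetched.append((url, fetch(url)))  # step 5 placeholder
    return fetched, v_url
```

A set is used for V_URL here because the only operation step 4 needs is a membership test; a database table with an index on the URL column would serve the same purpose at larger scale.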



Abstract

The invention discloses a method for webpage classification. Following the direction of data flow, the method comprises, from bottom to top, a data acquisition layer, a webpage analysis layer, and an application presentation layer. The data acquisition layer uses focused crawling to acquire the source code of every webpage in a preset navigation website address list. The webpage analysis layer extracts structural information from webpages that match specific structural characteristics, extracts the lower-level links that meet the requirements, and adds the information of those lower-level links that conform to the search strategy to a website category list. The application presentation layer obtains the webpage classification information of an unknown URL from the website category list. The invention combines the breadth of general search with the depth of vertical search, and makes it convenient to obtain the classification information of an unknown URL from classification websites.
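
The final lookup described in the abstract, obtaining the classification of an unknown URL from the category list, might look like the sketch below. The abstract does not say how the list is keyed or which categories it contains, so keying by host name, the function names `host_of` and `classify`, and the category labels are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host of a URL, with or without a scheme."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit parse "www.sohu.com/x" as a host
    return (urlsplit(url).hostname or "").lower()


def classify(url, category_list):
    """Look up the URL's host in the category list; None if it is not covered."""
    return category_list.get(host_of(url))


# Hypothetical category list; the labels are made up for this example.
category_list = {
    "www.sohu.com": "portal",
    "www.hao123.com": "navigation",
}
```

With this sketch, `classify("http://www.sohu.com/news", category_list)` returns the label stored for `www.sohu.com`, and a host that the crawl never reached returns `None`.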

Description

Technical field

[0001] The invention relates to the field of network technology, in particular to a method for classifying webpages.

Background art

[0002] As the volume of network information keeps growing, people increasingly depend on search engines. Although general search engines such as Baidu and Google offer great convenience, they also have limitations: for example, the results they return include a large number of webpages that users do not care about, and the search depth is insufficient.

[0003] As a result, vertical search came into being. It is a precise search technology that serves a particular industry or field, a subdivision and extension of search engines that provides queries based on semantic information so as to meet users' special search needs. However, most current vertical searches target one specific industry or field, and it is impossible to conduct vertical searches ...


Application Information

IPC(8): G06F17/30
Inventor 王攀, 张顺颐, 宫婷
Owner 苏州普适通信息科技有限公司