Web page classification system and method

A web page classification system and method, applied in the Internet field, that solve the problem of whole-network search and vertical search being unable to share resources, and that achieve more comprehensive coverage.

Active Publication Date: 2013-01-30
BEIJING QIHOO TECH CO LTD +1

AI Technical Summary

Problems solved by technology

The whole-network search system and the vertical search system are independent of each other. Pages that whole-network search has already downloaded, analyzed and processed are downloaded, analyzed and processed again, independently, by vertical search, so resources cannot be shared.

Embodiment Construction

[0045] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0046] The flow of the web page classification method of this embodiment is shown in Figure 1 and includes:

[0047] Step S110: extract the page frame of a pre-acquired web page and calculate the page frame ID. The pre-acquired web page may be a web page crawled by the whole-network search. The page frame is extracted as follows: extract the page frame of the web page according to ...
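
The extraction rule is cut off in this copy of the text, so the following is only an illustrative sketch of step S110 under an assumed rule, not the patent's method. It assumes the page frame is the nesting skeleton of HTML tags with all text and attributes dropped, and that the page frame ID is an MD5 digest of that skeleton. The names `extract_page_frame` and `page_frame_id` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class _TagSkeleton(HTMLParser):
    """Records each opening tag with its nesting depth; text and attributes are ignored."""

    # Void elements have no closing tag, so they must not change the depth.
    VOID = {"br", "img", "input", "meta", "link", "hr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"{self.depth}:{tag}")
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.depth > 0:
            self.depth -= 1


def extract_page_frame(html: str) -> str:
    """Assumed definition of a page frame: the depth-annotated tag sequence."""
    parser = _TagSkeleton()
    parser.feed(html)
    parser.close()
    return "/".join(parser.parts)


def page_frame_id(html: str) -> str:
    """Assumed ID scheme: hex MD5 digest of the extracted frame."""
    frame = extract_page_frame(html)
    return hashlib.md5(frame.encode("utf-8")).hexdigest()
```

Under these assumptions, two pages generated from the same template but carrying different text would receive the same page frame ID, which is what allows pages to be grouped by ID in the later steps.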

Abstract

The invention discloses a web page classification system relating to the technical field of search engines. The system comprises a page frame ID (identity) computation module, a mode accumulation module and a web page class identification module. The page frame ID computation module is adapted to extract the page frame of a web page obtained in advance and compute the page frame ID. The mode accumulation module is adapted to compute the page frame mode once the number of accumulated page frames with the same ID reaches a threshold. The web page class identification module is adapted to compare the page frame mode with the page frame modes of known classes in a product knowledge base built in advance, in order to identify the class of the web page. The invention also discloses a web page classification method. With the system and the method, whole-network search and vertical search can be combined. This addresses two problems of existing approaches: extraction by a general-purpose algorithm is coarse, while extraction by a directed (targeted) method is fine-grained but requires a heavy manual workload and adapts poorly. More accurate data content can be extracted, and the problem of sharing resources between whole-network search and vertical search is solved at the same time.
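
The accumulation and identification modules described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text here does not say how a page frame mode is computed or compared, so the sketch assumes a frame is a set of element signatures, the mode is the intersection of all accumulated frames sharing an ID, and comparison uses Jaccard similarity with a hypothetical `min_score` cutoff. `ModeAccumulator`, `jaccard` and `identify_class` are illustrative names.

```python
from collections import defaultdict


class ModeAccumulator:
    """Collects frames per frame ID and emits a mode once a threshold is reached."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self._frames = defaultdict(list)  # frame ID -> list of frozensets

    def add(self, frame_id: str, frame_elements: set):
        """Store one frame; return the mode if enough frames share this ID, else None."""
        bucket = self._frames[frame_id]
        bucket.append(frozenset(frame_elements))
        if len(bucket) < self.threshold:
            return None
        # Assumed mode: the elements common to every accumulated frame.
        return frozenset.intersection(*bucket)


def jaccard(a, b) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def identify_class(mode, knowledge_base: dict, min_score: float = 0.8):
    """Return the best-matching known class, or None if no class is similar enough."""
    best_label, best_score = None, 0.0
    for label, known_mode in knowledge_base.items():
        score = jaccard(mode, known_mode)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None
```

In this sketch, `knowledge_base` maps a class label to the mode of a known class. Feeding `ModeAccumulator.add` frames that share an ID yields a mode only after the threshold is reached, and `identify_class` then labels that mode or returns `None` when nothing in the knowledge base is close enough.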

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a web page classification system and method.

Background

[0002] Search technology falls into two basic categories. The first crawls all web pages on the Internet (currently with a limited crawl depth within a site; JavaScript (JS) is generally not processed, and only some dynamic pages are handled) and then processes and analyzes the crawled pages. This is web search, that is, whole-network search. The second is vertical search, which crawls and analyzes only certain types of pages, such as image search, video search, blog search, forum search and news search. Most vertical searches are currently built on seeds (also known as listing pages). Vertical search processing can be divided into two parts: one is to find seeds; the other is to find specific product pages from the seed pages, that is, pages of different categori...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 卢宏林
Owner: BEIJING QIHOO TECH CO LTD