Method and device for judging type of webpage

Webpage-type technology, applied in the field of judging webpage type, can solve problems such as fragmentation of identification methods, misjudgment, and failure to combine methods

Inactive Publication Date: 2010-10-27
FUJITSU LTD
1 Cites, 59 Cited by

AI Technical Summary

Problems solved by technology

[0007] Although the above methods have achieved good results in blog identification, they separate the statistical identification method based on machine learning from the rule identification method based on the specific characteristics of the blog platform or webpage provided by the blog provider, and do not combine the two methods.
Compared with the statistical identification method, the rule identification method generally has the advantages of fast speed and high precision. However, due to the continuous increase in the number of websites, the rule i...



Examples


Embodiment Construction

[0024] Exemplary embodiments of the present invention will be described below with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in this specification. However, it should be understood that many implementation-specific decisions must be made in developing any such practical implementation in order to achieve the developer's specific goals, and that these decisions may vary from implementation to implementation. Moreover, it should also be understood that development work, while potentially complex and time-consuming, would at least be a routine undertaking for those skilled in the art having the benefit of this disclosure.

[0025] Here, it should also be noted that, in order to avoid obscuring the present invention with unnecessary details, only the steps and/or device structures closely related to the solution according to the present invention are shown in the drawings, and t...



Abstract

The invention discloses a method and a device for judging the type of a webpage. The method comprises the following steps: performing rule matching against a prestored rule table on the basis of the URL of the webpage to be judged, wherein the rule table comprises a plurality of rule records for judging the type of a webpage; if the rule matching succeeds, obtaining the type of the webpage to be judged from the successfully matched rule; if the rule matching fails, extracting predetermined features from the URL and/or HTML source code of the webpage to be judged, and using a classifier to classify the webpage on the basis of a feature vector composed of features selected from the extracted predetermined features, so as to obtain the type of the webpage to be judged. The scheme combines the advantages of rule-based recognition and recognition based on statistical learning, and can judge the types of various webpages such as blogs, forums, and news.
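The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the rule patterns, the keyword features, the weights, and the names `RULE_TABLE`, `match_rules`, `extract_features`, `toy_classifier`, and `judge_page_type` are all hypothetical. The sketch also skips the feature-selection step and feeds every extracted feature to a stand-in linear scorer where a trained classifier would normally go.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical rule table: each record pairs a URL pattern with a page type.
RULE_TABLE: List[Tuple["re.Pattern[str]", str]] = [
    (re.compile(r"^https?://[^/]*\.blogspot\.com/"), "blog"),
    (re.compile(r"^https?://[^/]+/(forum|bbs)/"), "forum"),
]

# Hypothetical predetermined features: counts of these keywords
# in the URL and the HTML source.
FEATURE_KEYWORDS = ["blog", "forum", "bbs", "news", "comment", "thread"]


def match_rules(url: str) -> Optional[str]:
    """Return the type of the first rule whose pattern matches the URL."""
    for pattern, page_type in RULE_TABLE:
        if pattern.search(url):
            return page_type
    return None


def extract_features(url: str, html: str) -> List[float]:
    """Build a feature vector of keyword counts from the URL and HTML."""
    text = (url + " " + html).lower()
    return [float(text.count(keyword)) for keyword in FEATURE_KEYWORDS]


def toy_classifier(features: List[float]) -> str:
    """Stand-in for a trained classifier; the weights are made up."""
    weights = {
        "blog": [1, 0, 0, 0, 1, 0],
        "forum": [0, 1, 1, 0, 0, 1],
        "news": [0, 0, 0, 1, 0, 0],
    }
    scores = {
        label: sum(w * f for w, f in zip(ws, features))
        for label, ws in weights.items()
    }
    return max(scores, key=lambda label: scores[label])


def judge_page_type(
    url: str,
    html: str,
    classifier: Callable[[List[float]], str] = toy_classifier,
) -> Tuple[str, str]:
    """Try the rule table first; fall back to the classifier on a miss."""
    page_type = match_rules(url)
    if page_type is not None:
        return page_type, "rule"
    return classifier(extract_features(url, html)), "classifier"
```

Returning the decision path ("rule" or "classifier") alongside the type is a design choice of this sketch; it makes it easy to see how often the faster rule stage resolves a page before the statistical stage is needed.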

Description

Technical Field

[0001] The present invention generally relates to natural language processing technology including text classification, and in particular relates to a method and/or device for judging a webpage type.

Background

[0002] With the rapid development of computer and network technology, various factors such as the demand for personal space and the simplification of website creation have promoted a rapid increase in the number of websites. Taking China as an example, according to the "22nd Statistical Report on Internet Development in China" released by the China Internet Network Information Center (CNNIC), by the end of June 2008 the total number of domain names in China had reached 14.85 million, with an annual growth rate of 61.8%. In recent years, the number of users of various network media such as network news, blogs/personal spaces, and forums (BBS) has increased greatly. Among all network applications including basic applications, online media, d...


Application Information

IPC(8): G06F17/30, G06K9/62
Inventor: 何楠, 王主龙, 于浩
Owner: FUJITSU LTD