Method and apparatus for directionally grabbing page resource

Applied in the field of Internet resource collection, this page-resource crawling technology addresses the problems of missed pages and low recall rate, while ensuring representativeness, improving efficiency, and ensuring accuracy.

Status: Inactive; publication date: 2009-06-10
ZHEJIANG UNIV
Cites: 0; cited by: 80

AI Technical Summary

Problems solved by technology

However, a focused crawler of this kind, which predicts a child page's relevance to the topic solely from its parent page's relevance and uses that prediction to guide crawling, will inevitably miss many pages.
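The limitation described above can be illustrated with a minimal sketch of a parent-guided focused crawler, in which every child link inherits its parent page's topic relevance as its crawl priority. The keyword-overlap `relevance` score, the `threshold` cutoff, and the toy link graph are hypothetical choices for illustration only; the source does not specify how such crawlers score pages.

```python
import heapq


def relevance(text: str, topic_terms: set) -> float:
    """Toy topic score: fraction of topic terms that occur in the page text (hypothetical)."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def parent_guided_crawl(graph: dict, texts: dict, seed: str, topic_terms: set, threshold: float = 0.5) -> set:
    """Best-first crawl in which each child link inherits its parent's relevance as priority.

    A page scoring below `threshold` is fetched, but its out-links are never followed,
    so on-topic pages reachable only through such a page are never reached.
    """
    visited = set()
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    while frontier:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)  # "fetch" the page
        score = relevance(texts[url], topic_terms)
        if score < threshold:
            continue  # prune: an off-topic parent hides all of its children
        for child in graph.get(url, []):
            if child not in visited:
                heapq.heappush(frontier, (-score, child))
    return visited


# Hypothetical site: the on-topic page "deep" is linked only from the off-topic "hub".
graph = {"seed": ["a", "hub"], "a": [], "hub": ["deep"], "deep": []}
texts = {
    "seed": "crawler topic",
    "a": "crawler topic page",
    "hub": "site index",
    "deep": "crawler topic detail",
}
topic = {"crawler", "topic"}
crawled = parent_guided_crawl(graph, texts, "seed", topic)
print(sorted(crawled))  # ['a', 'hub', 'seed']; 'deep' is missed
```

Although "deep" is fully on-topic, it is never fetched because its only parent scores zero, which is exactly the missed-page problem the invention sets out to solve.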

Method used

The structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2: flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3: parameter map of the yarn covering machine.


Examples


Example Embodiment

[0093] To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0094] The present invention can be used in many general-purpose or special-purpose computing environments or configurations, for example personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, and distributed computing environments that include any of the above devices.

[0095] The invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform specific tasks or implement specific abstract data types. The present invention can also be practiced in distribute...



Abstract

The invention discloses a method for directionally crawling web page resources. The method comprises the following steps: pre-crawling, starting from a seed site URL, web pages up to a preset quantity threshold; determining characteristic web pages among the pre-crawled pages; generating a regular expression that generalizes the URLs of the characteristic web pages; matching the seed site URLs against the regular expression and retaining those that satisfy the matching condition as target crawl URLs; and crawling web pages according to the target crawl URLs. The method can effectively improve the utilization rate and recall rate of web page resource crawling, helping people acquire the information they need from the Internet at large scale, with high efficiency and high accuracy.
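The abstract does not say how the regular expression that generalizes the characteristic-page URLs is built. The sketch below shows one simple, hypothetical way to do it, assuming the characteristic pages are already known and share a scheme, host, and path depth: path segments identical across all samples stay literal, varying all-digit segments become `\d+`, and any other varying segment becomes `[^/]+`. The functions `generalize_urls` and `select_targets` and the example URLs are illustrative assumptions, not the patented procedure.

```python
import re
from urllib.parse import urlsplit


def generalize_urls(urls):
    """Return one compiled regex covering sample URLs that share scheme, host and path depth."""
    parts = [urlsplit(u) for u in urls]
    schemes = {p.scheme for p in parts}
    hosts = {p.netloc for p in parts}
    paths = [p.path.strip("/").split("/") for p in parts]
    if len(schemes) != 1 or len(hosts) != 1 or len({len(p) for p in paths}) != 1:
        raise ValueError("this sketch only handles URLs with one scheme, host and path depth")
    segments = []
    for column in zip(*paths):  # compare the i-th path segment across all samples
        if len(set(column)) == 1:
            segments.append(re.escape(column[0]))  # constant segment: keep it literally
        elif all(s.isdigit() for s in column):
            segments.append(r"\d+")  # varying numeric segment, e.g. a year or an ID
        else:
            segments.append(r"[^/]+")  # any other varying segment
    prefix = re.escape(f"{schemes.pop()}://{hosts.pop()}/")
    return re.compile(prefix + "/".join(segments))


def select_targets(candidates, pattern):
    """Keep candidate URLs whose scheme://host/path fully matches the pattern (query ignored)."""
    targets = []
    for url in candidates:
        p = urlsplit(url)
        if pattern.fullmatch(f"{p.scheme}://{p.netloc}{p.path}"):
            targets.append(url)
    return targets


# Hypothetical characteristic pages identified during pre-crawling.
characteristic = [
    "http://example.com/news/2008/123.html",
    "http://example.com/news/2009/456.html",
]
pattern = generalize_urls(characteristic)
print(pattern.pattern)  # http://example\.com/news/\d+/[^/]+

candidates = [
    "http://example.com/news/2010/789.html",
    "http://example.com/about/contact.html",
    "http://example.com/news/latest/1.html",
]
targets = select_targets(candidates, pattern)
print(targets)  # ['http://example.com/news/2010/789.html']
```

A real site usually has several URL templates, so a fuller version would presumably group the characteristic URLs first (for example by host and path depth) and emit one pattern per group; that grouping step is likewise an assumption, not something the abstract describes.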

Description

Technical field

[0001] The invention relates to the field of Internet resource collection, and in particular to a method and a device for directionally crawling web page resources.

Background art

[0002] With the rapid development of the network, the World Wide Web has become the carrier of a large amount of information. To extract and use this information effectively, search engines, as tools that help people retrieve information, have become the entry point and guide for users accessing the Web. Search engines automatically collect web pages from the Web by means of web crawlers.

[0003] Current web crawlers can be divided into general crawlers and focused crawlers. A general crawler is based on breadth-first search. It starts from the URLs (Uniform Resource Locators) of one or several initial web pages, and...
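The breadth-first strategy of a general crawler can be sketched as follows. The `get_links` callback, the in-memory link graph, and the `max_pages` cap are hypothetical; the cap merely mirrors the quantity threshold mentioned in the abstract's pre-crawling step, and the source does not state that pre-crawling is breadth-first.

```python
from collections import deque


def bfs_crawl(seeds, get_links, max_pages):
    """Breadth-first crawl from seed URLs, stopping once max_pages pages have been fetched.

    get_links(url) is a hypothetical stand-in for downloading a page and
    extracting the URLs it links to.
    """
    queue = deque(dict.fromkeys(seeds))  # FIFO frontier; duplicate seeds dropped, order kept
    seen = set(queue)
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        fetched.append(url)  # "fetch" the page
        for link in get_links(url):
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return fetched


# Hypothetical link graph standing in for the Web.
graph = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": []}
print(bfs_crawl(["s"], lambda u: graph.get(u, []), max_pages=4))   # ['s', 'a', 'b', 'c']
print(bfs_crawl(["s"], lambda u: graph.get(u, []), max_pages=10))  # ['s', 'a', 'b', 'c', 'd']
```

Because the frontier is a FIFO queue, all pages one link away from the seeds are fetched before any page two links away, which is what makes the crawl breadth-first.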


Application Information

IPC(8): G06F17/30
Inventors: 郑小林 (Zheng Xiaolin), 陈德人 (Chen Deren), 周涛 (Zhou Tao), 叶勤勇 (Ye Qinyong)
Owner: ZHEJIANG UNIV