Method and apparatus for directionally grabbing page resource

A method and apparatus for page resources, applied in the field of Internet resource collection; it addresses the problems of missed pages and low recall rate, and achieves the effects of ensuring representativeness, improving efficiency, and ensuring accuracy.

Inactive Publication Date: 2009-06-10
ZHEJIANG UNIV
0 Cites 80 Cited by

AI-Extracted Technical Summary

Problems solved by technology

However, this kind of focused crawler, which only uses the relevance of the parent page to the topic to predict the relevance of a subpage to the topic as guidance, will inevitably miss many page...

Method used

By matching the seed site URL against both the topic-related regular expressions and the topic-independent regular expressions, it can be better determined whether to crawl the represented page, thereby further improving the validity and accuracy of page crawling.

Abstract

The invention discloses a method for directionally grabbing page resources. The method comprises the following steps: pre-crawling pages that meet a number threshold according to seed site URLs; determining characteristic pages among the pre-crawled pages; generating regular expressions that summarize the URLs of the characteristic pages; matching the seed site URLs against the regular expressions, and retaining the seed site URLs that meet the matching condition as crawling target URLs; and crawling pages according to the crawling target URLs. The method can effectively improve the harvest rate and recall rate of page resource crawling, thereby better helping people obtain the required information from the Internet on a large scale, with high efficiency and high accuracy.

Application Domain

Technology Topic

Regular expression, Database +2

Image


Examples

  • Experimental program(1)

Example Embodiment

[0093] In order to make the above objectives, features and advantages of the present invention more obvious and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0094] The present invention can be used in many general-purpose or special-purpose computing device environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, distributed computing environments including any of the above systems or devices, and so on.
[0095] The invention can be described in the general context of computer-executable instructions executed by a computer, such as a program module. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. The present invention can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in local and remote computer storage media including storage devices.
[0096] One of the core concepts of the embodiments of the present invention is based on an analysis of Internet page resources: most web pages on the Internet are dynamic pages, and these dynamic pages are formed by querying corresponding data records from a database and filling them into page templates, with the query parameters generally included in the URL. Page URLs generated from the same template often differ only in the query fields, and the topics of pages generated from the same template are often similar. In short, pages generated from the same page template often belong to the same category, and their URLs are very similar.
[0097] Due to the similarity of the URLs of pages belonging to the same topic on the same website, one (or several) regular expressions can be used to generalize them. For example, for the book e-commerce website www.china-pub.com, the computer book information pages can be summarized by the regular expression http://www.china-pub.com/computers/common/info.asp?Id=*, where the asterisk (a metacharacter) represents the index numbers of the various books. With this regular expression, it can be judged whether the page represented by a URL on the site is related to computer book information; that is, by matching the URL against the regular expression, it can be concluded whether the URL is related to the topic.
[0098] For a focused crawler, the pages related to a certain topic it is interested in are often generated by one or several such templates on the same website. Therefore, on top of a dual-crawler architecture of experimental crawlers and focused crawlers, the present invention proposes a focused crawler based on URL rules. This focused crawler can learn, from each topic-related site, the URL regular expressions of representative topic-related content pages, topic-related catalog pages, and topic-independent pages, and then use these URL regular expressions to guide the crawling of the focused crawler.
[0099] Referring to Figure 1, which shows a flowchart of an embodiment of the method for directionally grabbing page resources of the present invention, the method may include the following steps:
[0100] Step 101: Obtain the seed site URL;
[0101] In this embodiment, there is no restriction on how the seed site URLs are obtained; for example, they may be retrieved from a preset seed resource library, or searched for and obtained according to certain keywords. It is feasible for those skilled in the art to use any method to obtain the seed site URLs they need. For example, a list of relevant seed sites in the hardware industry is:
[0102] cnsaw.com www.fsonline.com.cn www.beareyes.com.cn www.gx.xinhuanet.com www.beicha.com www.hnwj.net www.bxg.cn www.ieicn.com www.ce.cn www.ldmetals.com www.cenn.cn www.sealing.cn www.chemeinfo.com www.sg001.cn www.chinabmb.com www.wjjw.cn www.chinabtob.net www.wjw.cn www.cnpv.com www.xmnn.cn www.cutinfo.cn
[0103] In practice, a site may contain a large number of hyperlinked pages. For example, the NetEase site (www.163.com) has many hyperlinks to sports, military, finance and other channels. Therefore, preferably, the seed site URL may include not only the URL of the seed site itself, but also the URLs of all hyperlinks in the page represented by the seed site. However, the site of some hyperlink URLs may no longer be a seed site; for example, on the above NetEase site, some hyperlinks point through friendship links to other sites such as Sohu and Sina, which are non-seed sites. In this case, this step can also include the following sub-step:
[0104] Filtering the seed site URL.
[0105] It is understandable that the filtering sub-step is used to ensure that the crawler only crawls deeply within the seed sites; for example, when crawling pages within the NetEase site, it will not crawl to Sohu, Sina and other sites through the hyperlinks in the friendship links.
[0106] By analyzing the data structure of a URL, it can be seen that a URL can include a protocol parameter (such as the http protocol), a site parameter (host), a path parameter (path), and a query parameter (query); the path parameter can include a series of directories, and the query parameter can include a series of key-value pairs. For example, for the URL http://www.china-pub.com/member/buybook/view.asp?add=1&tid=203839, the site parameter is www.china-pub.com; the path parameter is /member/buybook/view.asp; the directories of the path parameter are member, buybook and view.asp; the query parameter is add=1&tid=203839, that is, the key-value pairs of the query parameter are (add, 1) and (tid, 203839). It can be understood that the main purpose of the filtering step is to determine, by extracting the site parameter of a URL, whether it belongs to a seed site: if so, the URL is kept; if not, the URL is removed. Obtaining seed-site URLs through filtering reduces the workload of subsequent page crawling, thereby effectively improving crawling efficiency and crawling accuracy.
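As an illustration of this decomposition (a minimal sketch, not part of the original disclosure; the class name UrlParts is made up), the standard java.net.URI class can be used to pull out the site, path and query parameters described above:

    import java.net.URI;
    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class UrlParts {
        public static void main(String[] args) throws Exception {
            URI uri = new URI("http://www.china-pub.com/member/buybook/view.asp?add=1&tid=203839");

            String host = uri.getHost();                  // site parameter: www.china-pub.com
            String path = uri.getPath();                  // path parameter: /member/buybook/view.asp
            String[] dirs = path.substring(1).split("/"); // directories: member, buybook, view.asp

            // query parameter add=1&tid=203839 -> key-value pairs (add, 1) and (tid, 203839)
            Map<String, String> query = new LinkedHashMap<>();
            for (String pair : uri.getQuery().split("&")) {
                String[] kv = pair.split("=", 2);
                query.put(kv[0], kv.length > 1 ? kv[1] : "");
            }

            System.out.println(host);
            System.out.println(Arrays.toString(dirs));
            System.out.println(query);
        }
    }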
[0107] In practice, the seed site URL can be filtered through the following sub-steps:
[0108] Sub-step A1, read the URL of the seed site into an array, and sort the array;
[0109] Sub-step A2, extract site parameters of a certain URL, and determine whether the site parameters are included in the array, if yes, perform sub-step A3; if not, perform sub-step A4;
[0110] Sub-step A3, reserve the URL;
[0111] Sub-step A4. Remove the URL.
[0112] In practice, a binary search can be used on the array to determine whether the site parameter is included in it. As is well known, the basic idea of binary search is to compare the middle element a[n/2] of the sorted array with the target x. If x = a[n/2], then x is found and the algorithm terminates. Assuming the array elements are arranged in ascending order, if x < a[n/2], the search continues only in the left half of array a; if x > a[n/2], the search continues only in the right half of array a.
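A minimal sketch of sub-steps A1 to A4 (illustrative only; the SeedFilter class and the sample hosts are assumptions): the seed hosts are sorted once, then each candidate URL is kept or removed by binary-searching its site parameter in the sorted array.

    import java.net.URI;
    import java.util.Arrays;

    public class SeedFilter {
        private final String[] seedHosts;

        public SeedFilter(String[] seedHosts) {
            this.seedHosts = seedHosts.clone();
            Arrays.sort(this.seedHosts);                          // sub-step A1: read into an array and sort it
        }

        // Returns true if the URL should be kept, i.e. its site parameter is a seed site.
        public boolean keep(String url) {
            try {
                String host = new URI(url).getHost();             // sub-step A2: extract the site parameter
                return Arrays.binarySearch(seedHosts, host) >= 0; // binary search in the sorted array
            } catch (Exception e) {
                return false;                                     // malformed URL: remove it (sub-step A4)
            }
        }

        public static void main(String[] args) {
            SeedFilter filter = new SeedFilter(new String[]{"www.wjw.cn", "www.cnpv.com", "cnsaw.com"});
            System.out.println(filter.keep("http://www.cnpv.com/news/1.html")); // true  (sub-step A3: keep)
            System.out.println(filter.keep("http://www.sohu.com/index.html"));  // false (sub-step A4: remove)
        }
    }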
[0113] Of course, the above-mentioned filtering method is only used as an example. Those skilled in the art can use any filtering method, for example, setting a site weight value and filtering according to the weight value; or, extracting keywords from the URL, and proceeding according to the keywords. Filtering and the like are all feasible, and the present invention does not need to limit this.
[0114] Step 102: Pre-fetch pages that meet the number threshold according to the URL of the seed site;
[0115] This step is the experimental crawler strategy. The experimental crawler can crawl pages according to a preset page crawling depth, and the number threshold is a preset value that limits the quantity of pages crawled. As the crawling depth increases, the number of pages to be crawled grows exponentially: assuming the depth is d and each level expands by a factor of N, then N^d pages must be crawled to reach the d-th level. In this situation, if N at each level is too large, the efficiency of page crawling will be too low; if N at each level is too small, the distribution range of the crawled pages will be narrow and their breadth insufficient to generate representative regular expressions.
[0116] In order to improve crawling efficiency and ensure the representativeness of the regular expressions, the present invention obtains the number threshold through continuous adjustment over a large number of experiments; the value range of the number threshold is 1000 to 5000. In practice, the number threshold can be set in a configuration file (such as data/config.xml) and read from the configuration file when used.
[0117] Specifically, this step can be implemented based on the Nutch crawler principle in the prior art. The Nutch Crawler is mainly used to crawl web pages from the Internet and to index them. Two aspects of the Crawler matter here: its workflow, and the format and meaning of the data files involved. The data files mainly include three types: the WebDB database (Web database), a series of segments (page data segments), and the index. WebDB stores the link structure information between the web pages crawled by the crawler and is used only while the Crawler is working. Two kinds of entities are stored in WebDB: pages and links. A Page entity characterizes an actual web page by describing its feature information on the network; because many web pages need to be described, WebDB indexes these page entities in two ways, by the URL of the web page and by the MD5 of the web page content. The features described by a Page entity mainly include the number of links in the page, the time this page was crawled and other crawl-related information, and the importance score of the page. Similarly, a Link entity describes the link relationship between two Page entities. WebDB thus constitutes a link structure graph of the crawled web pages, in which the Page entities are the nodes of the graph and the Link entities are its edges.
[0118] One crawl by the Crawler generates many segments; each segment stores the web pages crawled in a single crawl cycle together with the index of these web pages. When crawling, the Crawler generates the fetchlist (fetch list) required for each crawl cycle according to the link relationships in WebDB and a certain crawling strategy; the Fetcher (download thread) then fetches and indexes the web pages referenced by the URLs in the fetchlist and saves them into the segment. Segments are time-limited: when these web pages are re-crawled by the Crawler, the previously generated segments become invalid. Segment folders are named after their generation time so that invalid segments can easily be deleted to save storage space. The index is the index of all web pages crawled by the Crawler, obtained by merging the indexes of all the individual segments.
[0119] The working principle of the Crawler is as follows: first, the Crawler generates from WebDB a collection of URLs of web pages to be crawled, called a Fetchlist; the download thread Fetcher then starts to fetch the web pages according to the Fetchlist. If there are many download threads, many Fetchlists are generated, that is, one Fetcher corresponds to one Fetchlist. The Crawler then updates WebDB according to the fetched web pages and generates a new Fetchlist from the updated WebDB, containing the uncrawled or newly discovered URLs, after which the next round of the crawl cycle starts. This cyclic process can be called the "generate/fetch/update" cycle. In addition, URLs pointing to web resources on the same host are usually assigned to the same Fetchlist, which prevents too many Fetchers from fetching from one host at the same time and overloading that host.
[0120] In Nutch, the realization of the crawler operation is accomplished through the realization of a series of sub-operations. These sub-operations include:
[0121] 1. Create a new WebDB;
[0122] 2. Write the start URL of the crawl into WebDB;
[0123] 3. Generate a fetchlist according to WebDB and write it into the corresponding segment;
[0124] 4. Fetch web pages according to the URL in the fetchlist;
[0125] 5. Update WebDB according to the crawled webpage;
[0126] 6. Repeat steps 3-5 until the preset crawling depth is reached;
[0127] 7. Update the segments according to the page scores and links obtained from WebDB;
[0128] 8. Index the crawled webpages;
[0129] 9. Discard web pages with duplicate content and duplicate URLs in the index;
[0130] 10. Merge the indexes in the segments to generate the final index (merge) for retrieval.
[0131] The detailed workflow of the Crawler is: after a WebDB is created (step 1), the "generate/fetch/update" cycle (steps 3-6) starts from some seed URLs. When this cycle has completely finished, the Crawler builds an index from the segments generated during the crawl (steps 7-10). Before duplicate URLs are removed (step 9), the index of each segment is independent (step 8). Finally, the independent segment indexes are merged into one final index (step 10).
[0132] According to the working principle of the aforementioned Nutch crawler, in this embodiment, the generation of its crawl list can be modified. Specifically, the page can be pre-fetched through the following sub-steps:
[0133] Sub-step B1, write the URL of the seed site into the database (WebDB);
[0134] Sub-step B2, read the URL from the database, and extract the site parameters (host) of the URL;
[0135] Sub-step B3, update the number of URL crawls corresponding to the site parameters;
[0136] Sub-step B4: Determine whether the number of URL crawls exceeds the number threshold, if not, add the URL to the URL crawl list;
[0137] Sub-step B5, download the page corresponding to the URL in the URL grab list, and generate a corresponding page data segment (Segment);
[0138] Sub-step B6: Update the database according to the page data segment.
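The per-site quota of sub-steps B3 and B4 can be sketched as follows (a simplified illustration that stands in for the modified Nutch fetchlist generation; the class and method names are assumptions, not Nutch APIs):

    import java.net.URI;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class QuotaFetchlistGenerator {
        private final int threshold;                                // number threshold, e.g. 1000-5000 from data/config.xml
        private final Map<String, Integer> crawledPerHost = new HashMap<>();

        public QuotaFetchlistGenerator(int threshold) {
            this.threshold = threshold;
        }

        // Sub-steps B2-B4: extract the site parameter of each URL, update its counter,
        // and admit the URL to the crawl list only while the counter is below the threshold.
        public List<String> generateFetchlist(List<String> candidateUrls) {
            List<String> fetchlist = new ArrayList<>();
            for (String url : candidateUrls) {
                String host;
                try {
                    host = new URI(url).getHost();
                } catch (Exception e) {
                    continue;                                       // skip malformed URLs
                }
                int count = crawledPerHost.getOrDefault(host, 0);
                if (count < threshold) {
                    crawledPerHost.put(host, count + 1);
                    fetchlist.add(url);                             // pages in this list are downloaded in sub-step B5
                }
            }
            return fetchlist;
        }
    }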
[0139] As another implementation of this embodiment, the step of filtering the seed site URL can also be completed in this step, for example, by pre-fetching pages through the following sub-steps:
[0140] Sub-step C1: Write the URL of the seed site into the database (WebDB);
[0141] Sub-step C2, read the URL from the database, and extract the site parameters (host) of the URL;
[0142] Sub-step C3, judge whether the site parameters match the seed site, if so, directly execute sub-step C4; if not, remove the URL;
[0143] Sub-step C4, update the number of URL crawls corresponding to the site parameters;
[0144] Sub-step C5: Determine whether the number of URL crawls exceeds the number threshold, and if not, add the URL to the URL crawl list;
[0145] Sub-step C6, download the page corresponding to the URL in the URL grab list, and generate a corresponding page data segment (Segment);
[0146] Sub-step C7: Update the database according to the page data segment.
[0147] Of course, the foregoing method of pre-fetching pages is only used as an example. It is feasible for those skilled in the art to pre-fetch pages by using any web crawler according to actual conditions, and the present invention does not need to limit this.
[0148] Step 103: Determine a feature page in the pre-fetched page;
[0149] In this embodiment, the feature pages may include topic-related pages, and the topic-related pages may further include topic-related content pages and topic-related catalog pages. Here, a topic-related catalog page is a link page whose links point to topic-related content pages.
[0150] In order to further improve the accuracy of page crawling, the feature pages may also include topic-independent pages.
[0151] In the prior art, many methods for page classification have been proposed. For example, a page classification method includes the following steps:
[0152] 11) The sample library is preset, and the sample characteristic parameters are calculated for each sample;
[0153] 12) Collect network texts on the Internet that meet the preset conditions, and calculate the corresponding text feature parameters of the network text;
[0154] 13) Compare the text feature parameters with the feature parameters of each sample in the sample library, and complete the classification of the network text in turn. Generally speaking, a web text can be classified into the sample class with the highest similarity.
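As a rough sketch of steps 11) to 13) (illustrative only; the term-frequency features and cosine similarity used here are assumptions, and any comparable feature and similarity choice would serve), each page is assigned to the sample class with the highest similarity:

    import java.util.HashMap;
    import java.util.Map;

    public class SimilarityClassifier {
        // Steps 11)/12): a very simple feature parameter -- term frequencies of the text.
        static Map<String, Integer> features(String text) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : text.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) tf.merge(term, 1, Integer::sum);
            }
            return tf;
        }

        // Cosine similarity between two feature vectors.
        static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
            double dot = 0, na = 0, nb = 0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                na += e.getValue() * e.getValue();
                Integer w = b.get(e.getKey());
                if (w != null) dot += e.getValue() * w;
            }
            for (int w : b.values()) nb += w * w;
            return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
        }

        // Step 13): classify a page into the most similar sample class.
        static String classify(String pageText, Map<String, String> samplesByClass) {
            Map<String, Integer> page = features(pageText);
            String best = null;
            double bestSim = -1;
            for (Map.Entry<String, String> e : samplesByClass.entrySet()) {
                double sim = similarity(page, features(e.getValue()));
                if (sim > bestSim) { bestSim = sim; best = e.getKey(); }
            }
            return best;
        }
    }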
[0155] Alternatively, LingPipe (a Java open source toolkit for natural language processing developed by Alias-i) can be used for page classification. Taking news page classification as an example, the following steps can be included:
[0156] 21) Read the news text content and its category from the original database, and then store each file containing news content in a different folder according to its category; for example, sports news content is stored under folder 1, and entertainment news under folder 2. These data are collectively referred to as training data;
[0157] 22) Based on the LingPipe open source package, first set a similarity threshold (it can be read from the configuration file data/config.xml; the larger the threshold, the higher the required similarity and the easier it is to misjudge), then extract the text content of the page and match it against the training data. Using the matching function, the training data closest to the text content is obtained, and the content category to which this training data belongs is taken as the category of the page;
[0158] 23) Extract keywords as needed and assign certain weights to them, to further confirm whether the page should be assigned to that category.
[0159] Obviously, through the above steps, a more accurate URL of the topic-related page can be obtained, which provides a good resource foundation for learning regular expressions.
[0160] Step 104: Generate a regular expression summarizing the URL of the characteristic page;
[0161] As is well known, regular expressions are tools for text matching, usually composed of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings. Regular expression matching can be understood as finding, in a given string, the parts that match a given regular expression; more than one part of the string may satisfy the expression, and each such part is called a match. In this text, "match" can carry three senses: an adjectival one, as in a string matching an expression; a verbal one, as in matching a regular expression against a string; and a nominal one, namely the "part of the string that satisfies the given regular expression" just mentioned.
[0162] The following uses examples to illustrate the regular expression generation rules.
[0163] If you want to find hi, you can use the regular expression hi. This regular expression exactly matches such a string: it consists of two characters, the first being h and the second being i. In practice, regular expressions can ignore case. Many words contain the two consecutive characters hi, such as him, history and high, so searching with hi will also find the hi inside these words. If you want to find the word hi exactly, you should use \bhi\b, where \b is a regular expression metacharacter representing the beginning or end of a word, that is, the boundary between words. Although English words are usually separated by spaces, punctuation or newlines, \b does not match any of these separator characters; it only matches a position. If you are looking for a Lucy not far after a hi, you should use \bhi\b.*\bLucy\b. Here, the dot (.) is another metacharacter, which matches any character except the newline character, and * is also a metacharacter, expressing a quantity: the content before the * may be repeated any number of times in a row so that the whole expression matches. The meaning of \bhi\b.*\bLucy\b is now obvious: first the word hi, then any number of arbitrary characters (but no newline), and finally the word Lucy.
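For instance, the pattern just discussed can be tried directly in Java (a minimal illustration of regular expression matching; the sample sentence is made up):

    import java.util.regex.Pattern;

    public class RegexDemo {
        public static void main(String[] args) {
            // \bhi\b.*\bLucy\b : the word "hi", then any characters (no newline), then the word "Lucy"
            Pattern p = Pattern.compile("\\bhi\\b.*\\bLucy\\b");
            System.out.println(p.matcher("He said hi to Lucy yesterday.").find()); // true

            // "history" contains hi, but not as a whole word, so \bhi\b does not match it
            System.out.println(Pattern.compile("\\bhi\\b").matcher("history").find()); // false
        }
    }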
[0164] Preferably, this step may include the following sub-steps:
[0165] Sub-step D1: divide the feature page URL into multiple URL subsets;
[0166] Sub-step D2, aggregate the subset of URLs into multiple URL categories;
[0167] Substep D3: Extract the regular expression of the URL category.
[0168] Generally speaking, the sub-step of dividing multiple URL subsets can be implemented by dividing URLs with the same site parameters into the same URL subset.
[0169] More specifically, the sub-step of dividing multiple URL subsets can be implemented through the following steps:
[0170] Divide URLs with the same site parameters into the same URL subset;
[0171] Divide URLs with the same number of directories into the same URL subset.
[0172] In some cases, the division of multiple URL subsets can be achieved through the following steps:
[0173] Divide URLs with the same site parameters into the same URL subset;
[0174] Divide URLs with path parameters with the same number of directories into the same URL subset;
[0175] Divide URLs with the same query parameters into the same URL subset.
[0176] It can be seen that URL division splits a URL collection into several subsets according to a certain standard: its input is a collection of URLs and its output is URL subsets. First, based on the site parameters of the URLs, URLs with the same site parameter are placed in the same subset, so that the entire URL collection is divided into several subsets in which the site parameters of all URLs are the same. After this division, further divisions can be made, for example according to the number of directories in the path parameter of the URL, placing URLs with the same number in the same subset, or according to the query parameter of the URL, placing URLs with the same query parameters in the same subset. In practice, however, if only the query parameter parts of two URLs differ, the pages they refer to are basically generated by the same template and generally belong to the same type of page, so division based on query parameters is only used in extreme situations. The purpose of dividing the URL set is mainly to facilitate the next clustering step and thereby save clustering time.
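A minimal sketch of this division step (illustrative only; the grouping key shown combines the site parameter with the directory count, as described above):

    import java.net.URI;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class UrlPartitioner {
        // Group URLs so that each subset shares the same host and the same number of path directories.
        public static Map<String, List<String>> partition(List<String> urls) {
            Map<String, List<String>> subsets = new HashMap<>();
            for (String url : urls) {
                try {
                    URI uri = new URI(url);
                    String path = uri.getPath() == null ? "" : uri.getPath();
                    int dirCount = path.length() <= 1 ? 0 : path.substring(1).split("/").length;
                    String key = uri.getHost() + "|" + dirCount;   // same host, same directory count
                    subsets.computeIfAbsent(key, k -> new ArrayList<>()).add(url);
                } catch (Exception e) {
                    // malformed URL: ignore it
                }
            }
            return subsets;
        }
    }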
[0177] In this case, the sub-step of aggregating multiple URL classes may include:
[0178] Preset clustering rules of the URL category;
[0179] Read a URL from the URL subset, and determine whether the URL meets the clustering rules of a URL category; if so, assign the URL to that URL category; if not, create a new URL category based on the URL.
[0180] In this way, after aggregation, a number of URL classes are obtained in the URL class queue. One URL class can contain several similar URLs, that is, the URLs in a URL class have certain similarities. Preferably, therefore, the clustering rule can be implemented by setting a similarity function, that is, all URLs in the same class satisfy this similarity function.
[0181] In order to obtain better clustering results, the sub-step of aggregating multiple URL categories may further include the steps:
[0182] Count the number of URL categories and the total number of URLs;
[0183] Adjust the clustering rule of the URL category according to the statistical result.
[0184] Then, for each URL class, a URL regular expression that can summarize and represent the class is extracted. Specifically, according to the data structure of the URL described earlier, the URL is decomposed into three parts, host (site parameter), path (path parameter) and query (query parameter); the path is decomposed into a series of directories, and the query into a series of key-value pairs. Since the host parts must be the same, the host is written directly into the regular expression. The directories of the path parts are then aligned: if the directories at corresponding positions are the same, that value is added to the regular expression; otherwise a * is added. The query part is added to the regular expression in a way similar to the path part. Corresponding regular expressions are generated in this way, and there are usually several of them.
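A simplified sketch of this extraction rule (illustrative; it assumes the URLs of one class share the same host and the same number of directories, as guaranteed by the division step, and it collapses the query part to a single wildcard rather than aligning individual key-value pairs):

    import java.net.URI;
    import java.util.Arrays;
    import java.util.List;

    public class RegexExtractor {
        public static String extract(List<String> urlClass) throws Exception {
            URI first = new URI(urlClass.get(0));
            String[] dirs = first.getPath().substring(1).split("/");
            boolean[] same = new boolean[dirs.length];
            Arrays.fill(same, true);

            for (String url : urlClass) {
                String[] d = new URI(url).getPath().substring(1).split("/");
                for (int i = 0; i < dirs.length; i++) {
                    if (!d[i].equals(dirs[i])) same[i] = false;   // differing directory -> wildcard
                }
            }

            StringBuilder sb = new StringBuilder("http://").append(first.getHost());
            for (int i = 0; i < dirs.length; i++) {
                sb.append("/").append(same[i] ? dirs[i] : "*");   // identical directories are kept as-is
            }
            if (first.getQuery() != null) {
                sb.append("?*");                                  // query part simplified to a wildcard here
            }
            return sb.toString();
        }
    }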
[0185] Step 105: Match the seed site URL against the regular expressions, and retain the seed site URLs that meet the matching condition as crawling target URLs;
[0186] In the case where the feature page is a topic-related page, the regular expression is also generated by summarizing the URL of the topic-related page. In this case, the matching step may include:
[0187] If the seed site URL matches a regular expression of a topic-related page, the seed site URL meets the matching condition.
[0188] In the case that the characteristic page is a topic-independent page, the regular expression is also generated by summarizing the URL of the topic-independent page. In this case, the matching step may further include:
[0189] If the seed site URL matches a regular expression of a topic-independent page, the seed site URL does not meet the matching condition.
[0190] By matching the URL of the seed site with topic-related regular expressions and topic-unrelated regular expressions, it is possible to better determine whether to crawl the represented page, thereby further improving the effectiveness and accuracy of page crawling.
[0191] Of course, it is feasible for those skilled in the art to set corresponding matching conditions according to actual conditions, and the present invention does not need to limit this.
[0192] In practice, a small number of topic-related pages may be classified as topic-independent pages, or a few topic-independent pages may be classified as topic-related pages. As a result, among the URL regular expressions learned from a website as representing topic-related URLs, there may be some that actually represent topic-independent pages. To filter out these regular expressions, this embodiment may further include the following steps:
[0193] Count the number of URLs matched by the regular expression;
[0194] If the number of URLs is less than the preset filtering threshold, the regular expression is deleted.
[0195] Usually, the number of URLs matched by these URL regular expressions is relatively small, so URL regular expressions with a particularly small number of matched URLs can be filtered out as noise.
[0196] Specifically, the number of URLs can be counted based on the following matching corresponding strategies:
[0197] 1. If the URL matches multiple regular expressions of topic-related pages, the URL is matched with the regular expression having the largest number of matched URLs;
[0198] 2. If the URL matches multiple regular expressions of topic-independent pages, the URL is likewise matched with the regular expression having the largest number of matched URLs.
[0199] In this case, if the feature page includes a topic-related page and a topic-unrelated page, the matching step may include sub-steps:
[0200] Sub-step S1, matching the URL of the seed site with the regular expressions of the subject-related pages and the regular expressions of the subject-independent pages;
[0201] Sub-step S2, respectively, count the number of URLs corresponding to the regular expressions of topic-related pages matched by the URL of the seed site, and the number of URLs corresponding to the regular expressions of the topic-independent pages, and compare them;
[0202] Sub-step S3: If the comparison result meets the noise filtering threshold, the seed site URL does not meet the matching condition.
[0203] Step 106: Grab the page according to the crawling target URL.
[0204] The page crawling in this step can be implemented with reference to the aforementioned Nutch crawler, or can be implemented with other methods in the prior art, which is not limited by the present invention.
[0205] In practice, there may also be "over-learning" (the learned regular expressions are too general, so that they also match URLs of other categories) or "under-learning" (the learned regular expressions match only part of a category, so that other URLs in the category fail to match). To solve this problem, the following strategy can also be applied during page crawling:
[0206] 1. Determine whether the site parameter of the new URL belongs to a derived site; if not, the page of this URL is not crawled and the process ends;
[0207] 2. Search the URL regular expression list for a URL regular expression of a topic-related page that matches the current URL; if none exists, the page of this URL is not crawled and the process ends;
[0208] 3. If multiple URL regular expressions of topic-related pages match, select the optimal regular expression, that is, the one that summarizes the largest number of URLs;
[0209] 4. Search the URL regular expression list for a URL regular expression of a topic-independent page that matches the current URL; if none exists, the page of this URL is crawled and the process ends;
[0210] 5. If multiple URL regular expressions of topic-independent pages match, select the optimal regular expression;
[0211] 6. If P/N>f, then grab the page, otherwise don't grab it.
[0212] Among them, P represents the number of URLs summarized by the URL regular expressions of the optimal topic-related pages, N represents the number of URLs summarized by the URL regular expressions of the optimal topic-independent pages, and f is a filtering threshold greater than 0.
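A compact sketch of steps 3 to 6 above (illustrative; the RulePattern class holding each learned expression together with the number of URLs it summarized is an assumption):

    import java.util.Comparator;
    import java.util.List;

    public class CrawlDecision {
        // A learned URL expression together with the number of URLs it summarized during learning.
        public static class RulePattern {
            final String pattern;
            final int urlCount;
            RulePattern(String pattern, int urlCount) { this.pattern = pattern; this.urlCount = urlCount; }
            boolean matches(String url) { return url.matches(pattern); }
        }

        // Pick the best (largest urlCount) matching topic-related and topic-independent expressions,
        // then crawl only if P/N > f, or if no topic-independent expression matches at all.
        public static boolean shouldCrawl(String url, List<RulePattern> related,
                                          List<RulePattern> unrelated, double f) {
            RulePattern best = bestMatch(url, related);
            if (best == null) return false;                   // step 2: no topic-related expression matches
            RulePattern bestNeg = bestMatch(url, unrelated);
            if (bestNeg == null) return true;                 // step 4: no topic-independent expression matches
            double p = best.urlCount, n = bestNeg.urlCount;
            return p / n > f;                                 // step 6: P/N > f
        }

        private static RulePattern bestMatch(String url, List<RulePattern> rules) {
            return rules.stream()
                    .filter(r -> r.matches(url))
                    .max(Comparator.comparingInt((RulePattern r) -> r.urlCount))
                    .orElse(null);
        }
    }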
[0213] In order to enable those skilled in the art to better understand the present invention, the embodiments of the present invention will be described in detail below through a specific example.
[0214] (1) Obtain the seed site URLs, filter the hyperlink URLs in the pages represented by the seed site URLs, and obtain a topic-related "derived site URL list".
[0215] The specific process is: first read the seed site URLs from a seed site file into an array and sort the array; for a URL to be crawled, first take out its site parameter and then binary-search for that site parameter in the array. If the site parameter is found in the array, the URL is returned, meaning that the URL is not filtered out and the page it represents should be crawled; if the site parameter is not found in the array, the URL is filtered out, indicating that the page it represents should not be crawled.
[0216] This step can reduce the workload of the experimental crawler, and at the same time improve the efficiency and accuracy of the experimental crawler.
[0217] (2) Use the experimental crawler to grab the page to be learned:
[0218] According to the filtered seed site URL list (i.e., the derived site URL list), an experimental crawler is used to crawl web pages from these URLs. It uses a breadth-first search algorithm to crawl up to N pages from each seed site (N is 1000 to 5000). The experimental crawler is implemented based on Nutch (Apache 2007), but in this example the crawl list generator of the Nutch crawler is modified. The modified crawl list generator process is as follows:
[0219] 1. Read the threshold of the number of URLs set in the configuration file (data/config.xml), and initialize the MAP (site parameters, number of URLs);
[0220] 2. Extract the site parameters of the URL to be crawled, and find out whether there are corresponding site parameters in the MAP, if yes, go directly to 4; if not, go to 3;
[0221] 3. Add the site parameter to the MAP and initialize its corresponding URL count to 0;
[0222] 4. Add 1 to the number of URLs corresponding to the corresponding site parameters;
[0223] 5. Determine whether the number of URLs corresponding to the site parameters exceeds the number threshold, if so, do not add this URL to the crawl list; if not, add this URL to the crawl list.
[0224] (3) Classify the captured N pages and determine whether a page is a theme-related content page or a theme-related catalog page, and the theme-related content page and the theme-related catalog page constitute a theme-related page collection;
[0225] (4) Learn URL regular expressions from the collection of topic-related pages:
[0226] The specific learning process is:
[0227] 1. URL distance:
[0228] (1) URL data structure
[0229] Divide a URL into three parts (removing the http protocol part): host, path, and query. The path is composed of a series of directories, and the query is composed of a series of key-value pairs. For example, for the URL http://www.china-pub.com/member/buybook/view.asp?add=1&tid=203839, its host is www.china-pub.com; its path is /member/buybook/view.asp, and the directories composing the path are member, buybook and view.asp; its query is add=1&tid=203839, and the key-value pairs composing the query are (add, 1) and (tid, 203839). The URL data structure expressed in Java is as follows:
[0230] public class URLStruct {        // requires: import java.util.ArrayList;
[0231]     private String host;        // site parameter (host)
[0232]     private String[] path;      // directories of the path parameter
[0233]     private ArrayList<String> query;  // key-value pairs of the query parameter, e.g. "add=1", "tid=203839"
       }
[0234] (2) URL distance (similarity) measurement
[0235] After decomposing a URL into the above URL data structure, the distance between URLs can be calculated from the distances between the parts of the data structure. The distance d_URL(i,j) between two URLs i and j can be expressed by the following formula:
[0236] d_URL(i,j) = (d_Host(i,j) + 1) × (d_Path(i,j) + 1) × (d_Query(i,j) + 1) − 1
[0237] where d_Host(i,j) is the distance between the host parts of the i-th and j-th URLs, d_Path(i,j) is the distance between their path parts, and d_Query(i,j) is the distance between their query parts.
[0238] The distance calculation principle of the above three parts is as follows:
[0239] a. If the hosts of the two URLs are not the same, then d_Host(i,j) = 32; otherwise d_Host(i,j) = 0.
[0240] Setting d_Host(i,j) in this way helps prevent URLs from different hosts from being clustered into the same class.
[0241] b. Suppose the path parts of the i-th and j-th URLs contain m and n directories respectively, with m ≤ n, and let k be the number of unequal directories among the first m corresponding positions; then
[0242] d_Path(i,j) = k × 2 + (n − m) × 4
[0243] c. d_Query(i,j) is set simply: if the query parts are equal, then d_Query(i,j) = 0; otherwise d_Query(i,j) = 1.
[0244] This is because, for most websites, if the other parts are equal and only the query part differs, the pages referred to by the two URLs are usually generated by the same template and generally belong to the same type of page.
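The distance just defined can be written directly against the parsed URL parts described above (a sketch; it assumes the host, path directories and query key-value pairs have already been extracted, as in the URLStruct data structure):

    import java.util.List;

    public class UrlDistance {
        // d_Host: 32 if the hosts differ, otherwise 0.
        static int hostDistance(String hostI, String hostJ) {
            return hostI.equals(hostJ) ? 0 : 32;
        }

        // d_Path: k*2 + (n-m)*4, where k is the number of unequal directories
        // among the first m corresponding positions and m <= n.
        static int pathDistance(String[] pathI, String[] pathJ) {
            String[] shorter = pathI.length <= pathJ.length ? pathI : pathJ;
            String[] longer  = pathI.length <= pathJ.length ? pathJ : pathI;
            int k = 0;
            for (int idx = 0; idx < shorter.length; idx++) {
                if (!shorter[idx].equals(longer[idx])) k++;
            }
            return k * 2 + (longer.length - shorter.length) * 4;
        }

        // d_Query: 0 if the query parts are equal, otherwise 1.
        static int queryDistance(List<String> queryI, List<String> queryJ) {
            return queryI.equals(queryJ) ? 0 : 1;
        }

        // d_URL(i,j) = (d_Host + 1) x (d_Path + 1) x (d_Query + 1) - 1
        static int urlDistance(String hostI, String[] pathI, List<String> queryI,
                               String hostJ, String[] pathJ, List<String> queryJ) {
            return (hostDistance(hostI, hostJ) + 1)
                 * (pathDistance(pathI, pathJ) + 1)
                 * (queryDistance(queryI, queryJ) + 1) - 1;
        }
    }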
[0245] 2. URL collection division:
[0246] URL division splits the URL collection into several subsets according to a certain standard: its input is a collection of URLs and its output is URL subsets. In this example, according to the host part of the URLs, URLs with the same host can be grouped into the same subset, so that the entire URL set is divided into several subsets; the hosts of the URLs within each subset are the same, while the hosts of URLs in different subsets differ. Alternatively, the division can be made according to the number of directories in the path part of the URL, with URLs having the same number grouped into the same subset.
[0247] 3. URL aggregation algorithm:
[0248] After dividing a URL set, several URL subsets are obtained. Such a URL subset is a cluster of URLs. The aggregation algorithm is implemented for each cluster of URLs to aggregate into several types of URLs. The specific process of aggregation can be divided into the following steps:
[0249] (1) Read the URL list from a certain URL cluster divided by the URL collection;
[0250] (2) Create a new URL class list C_URL = {C_URL(1), C_URL(2), ..., C_URL(j), ..., C_URL(m)}, initialized so that C_URL(1) = URL(1), and set the distance threshold h = 1;
[0251] (3) Continue to read URL(i), i = 2, 3, ..., n, from the URL list of the cluster being aggregated; if i >= n, end;
[0252] (4) Search the URL class list C_URL for a URL class C_URL(j) that matches URL(i); if the match succeeds, add URL(i) to C_URL(j); if there is no matching URL class, create a new URL class C_URL(j+1), add URL(i) to the newly created class C_URL(j+1), and insert the new class into the URL class list C_URL; then go to step (3).
[0253] A URL matching a URL class means that the distance between the URL and every URL in that class is not greater than the distance threshold h, that is,
[0254] d_URL(i,j) ≤ h
[0255] In this way, a number of URL classes are obtained in the URL class queue after aggregation. The distance between any two URLs in the same class is not greater than the distance threshold h, so URLs in the same class have certain similarities.
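A minimal sketch of steps (1) to (4) of this aggregation (illustrative only; each URL class is represented simply as a list of URLs, and the distance function is the d_URL defined above):

    import java.util.ArrayList;
    import java.util.List;

    public class UrlAggregator {
        // Distance function between two URLs, e.g. d_URL as defined above.
        interface Distance { int apply(String a, String b); }

        // Aggregate one URL cluster into classes: a URL joins a class only if its distance
        // to every URL already in that class is not greater than the threshold h;
        // otherwise a new class is created for it.
        public static List<List<String>> aggregate(List<String> cluster, Distance d, int h) {
            List<List<String>> classes = new ArrayList<>();
            for (String url : cluster) {
                List<String> match = null;
                for (List<String> cls : classes) {
                    if (cls.stream().allMatch(u -> d.apply(u, url) <= h)) {
                        match = cls;
                        break;
                    }
                }
                if (match == null) {             // no matching class: create a new one
                    match = new ArrayList<>();
                    classes.add(match);
                }
                match.add(url);
            }
            return classes;
        }
    }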
[0256] It is then checked whether the aggregated URL class list meets a preset condition on the following quantities:
[0257] where m is the number of URL classes in the aggregation result produced by static aggregation, n is the total number of URLs, l is the maximum number of URLs contained in a single class C_URL(j), and p is a quantity induction parameter with 0 < p < 1.
[0258] Through the above aggregation process, a URL class list C_URL = {C_URL(1), C_URL(2), ..., C_URL(j), ..., C_URL(m)} is obtained, in which each class C_URL(j) represents one or more URLs.
[0259] 4. URL regular expression extraction:
[0260] After a cluster of URLs has been aggregated, several URL classes C_URL(j) are generated. The next extraction process extracts, for each URL class, a URL regular expression that can summarize and represent that class.
[0261] According to the aforementioned URL data structure, every URL is decomposed into three parts, host, path and query; the path is decomposed into a series of directories, and the query into a series of key-value pairs. Since the host parts must be the same, the host is recorded in the regular expression as is. The directories of the path parts are then aligned: if the directories at corresponding positions are the same, that value is added to the regular expression; otherwise a * is added. The query part is handled in a way similar to the path part. Finally, several regular expressions related to the hardware industry are obtained.
[0262] (5) Focused crawlers carry out web page crawling work under the guidance of regular expressions according to the "derived site URL list". The main steps are as follows:
[0263] 1. Read the URL list of derivative sites;
[0264] 2. Read the URL regular expression list file obtained in the previous stage (including topic-related regular expressions and topic-independent regular expressions), then perform positive and negative example matching on each derived site URL against these regular expressions to decide whether to crawl the web page where the derived URL is located, and generate a "crawl target URL list" from the derived URLs to be crawled;
[0265] 3. Carry out web page crawling work according to the "crawl target URL list".
[0266] The focused crawler in this example is also implemented based on the Nutch crawler, and also only modified the crawl list generator part. The modified process of the crawl list generator is:
[0267] 1. Initialization. Initialize a site filter and a URL regular expression filter;
[0268] The rule of the site filter is to determine whether the host of the URL to be crawled can be found in the derived site list. If it can be found, the URL passes the site filter and is not filtered out. This ensures that the crawler does not crawl pages of sites that are not on the topic-related site list, thereby improving vertical crawling efficiency.
[0269] The rule of the URL regular expression filter is to determine whether the URL to be crawled matches the regular expression, if it is, no filtering is required; if not, it is filtered out.
[0270] 2. Read a URL that needs to be crawled from WebDB;
[0271] 3. Use a site filter to filter the URL, if you need to filter, go to 6;
[0272] 4. Use the URL regular expression filter to filter the URL, if you need to filter, go to 6;
[0273] 5. Add the URL to the list of URLs that need to be crawled;
[0274] 6. If the reading of WebDB is completed, end, otherwise go to 2.
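The two filters of this modified crawl list generator can be sketched as follows (the class is an illustrative stand-in, not a Nutch plugin interface):

    import java.net.URI;
    import java.util.List;
    import java.util.Set;
    import java.util.regex.Pattern;

    public class FocusedListGenerator {
        private final Set<String> derivedSites;        // site filter: hosts from the derived site URL list
        private final List<Pattern> topicPatterns;     // URL regular expression filter: topic-related expressions

        public FocusedListGenerator(Set<String> derivedSites, List<Pattern> topicPatterns) {
            this.derivedSites = derivedSites;
            this.topicPatterns = topicPatterns;
        }

        // Steps 3-5: a URL is added to the crawl list only if it passes both filters.
        public boolean accept(String url) {
            try {
                String host = new URI(url).getHost();
                if (!derivedSites.contains(host)) return false;                      // removed by the site filter
                return topicPatterns.stream().anyMatch(p -> p.matcher(url).find());  // must match some expression
            } catch (Exception e) {
                return false;
            }
        }
    }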
[0275] (6) Noise filtering
[0276] In practice, a small number of topic-related pages may be judged as topic-independent pages, or a small number of topic-independent pages may be judged as topic-related pages. As a result, among the URL regular expressions learned from a website as representing the topic, there may be some that actually represent topic-independent pages, and these URL regular expressions need to be filtered out. Since the number of URLs summarized by such expressions is relatively small, the URL regular expressions with a particularly small number of summarized URLs can be filtered out as noise. The filtering criterion is:
[0277] n_URL ≤ N × v
[0278] where n_URL is the number of URLs summarized by the expression and v is the "noise filtering threshold", usually 0 < v < 1.
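A one-method sketch of this noise filter (illustrative; the map from each expression to the number of URLs it summarized is an assumption about how the learning results are stored):

    import java.util.Map;
    import java.util.stream.Collectors;

    public class NoiseFilter {
        // Keep only expressions whose summarized URL count exceeds N x v.
        public static Map<String, Integer> filter(Map<String, Integer> urlCountByExpression, int n, double v) {
            double cutoff = n * v;
            return urlCountByExpression.entrySet().stream()
                    .filter(e -> e.getValue() > cutoff)
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }
    }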
[0279] In order to better illustrate the technical effects of the present invention, the following compares the performance of the focused crawler of the present invention (UBFC), a breadth-first search crawler (BFSC), and a basic focused crawler (BLFC) in crawling hardware-industry news pages.
[0280] The performance may be evaluated by the harvest rate and the recall rate. Specifically, the harvest rate may represent the proportion of the pages related to the subject in all the web pages crawled by the web crawler, which may represent the crawling accuracy rate. It can be calculated by the following formula:
[0281] g=p/d
[0282] Among them, g represents the harvest rate, p represents the number of crawled topic-related web pages, and d represents the total number of crawled web pages.
[0283] The recall rate can represent the ratio of topic-related pages crawled by web crawlers to all topic-related pages on the Internet, and can be calculated by the following formula:
[0284] r=p/ps
[0285] Among them, r represents the recall rate, p represents the number of crawled topic-related pages, and ps represents the total number of topic-related pages that actually exist.
[0286] It should be noted that, since it is impossible to count the number of topic-related pages on the Internet, the recall rate is difficult to calculate in practice. Therefore, the experiment is based on a simulated data set: the simulated data set is treated as a simulated Internet, and the number of topic-related pages contained in the simulated data set is used as the total number of topic-related pages when calculating the recall rate.
[0287] Using the focused crawler of the present invention (UBFC), the breadth-first search crawler (BFSC), and the basic focused crawler (BLFC) to crawl the list of relevant seed sites in the hardware industry, with a crawling depth of 4 and 57 threads opened, the results obtained with the various crawlers are shown in the following crawl test result table. In the table, S represents the total number of crawled pages, P the number of related pages, G the harvest rate, and R the recall rate. It should be noted that the breadth-first search crawler (BFSC) has no recall rate, because the number of relevant pages crawled by BFSC is used as the benchmark value for calculating the recall rates of the other two crawlers.
[0288] The overall crawl test results of BFSC, BLFC and UBFC are:
[0289]        S (total pages)  P (related pages)  G (harvest rate)  R (recall rate)
       BFSC   82946            3558               0.04              -
       BLFC   1629             95                 0.06              0.03
       UBFC   5670             1514               0.27              0.43
[0290] As can be seen from the above table, websites such as beareyes, ce, chemeinfo, chinabmb, cnpv, fsonline, ieicn, sg001 and xmnn have almost no topic-related pages, yet BFSC still downloaded 51135 pages from these websites, of which 401 are related to the topic. BLFC downloaded only 593 pages from these sites, of which 4 are related to the topic, while UBFC works better: it downloaded 575 pages, of which 42 are related to the topic. In terms of this performance, UBFC is undoubtedly the best.
[0291] From an overall point of view, BFSC crawled a total of 82,946 pages, of which only 3,558 were related to the topic, a harvest rate of 0.04. BLFC crawled a total of 1,629 pages, 95 of which were related to the topic; its harvest rate was 0.06, only slightly higher than BFSC, and its recall rate was only 0.03. From this point of view, BLFC is of little practical value, because a large number of related pages were not crawled. This is because it only crawls URLs linked from topic-related pages; not all topic-related pages are linked to one another, and pages often contain a large number of links, most of which may be topic-unrelated, which is why BLFC crawls so few pages. UBFC crawled a total of 5,670 pages, of which 1,514 were related to the topic, a harvest rate of 0.27 and a recall rate of 0.43. It can be seen that the total number of pages crawled by UBFC is only about one-fifteenth of that of BFSC, while its recall rate is close to one-half, more than 10 times that of BLFC; its harvest rate is more than 6 times that of BFSC and more than 4 times that of BLFC.
[0292] Referring further to Figure 2, which compares, over time, the trends in the number of topic-related web pages crawled during the performance test: at the beginning the numbers of relevant pages crawled by the three crawlers are almost the same, but as time passes UBFC grows faster and faster, showing the advantages of the present invention, while the growth of BFSC and BLFC is slow. In the end BLFC even crawls fewer relevant pages than BFSC. This is because, first, when deciding whether to put an unknown URL into the crawl list, BLFC must determine whether its parent page is a topic-related page, which takes time; and second, when deciding whether to put an unknown URL into the crawl list, BLFC filters more than BFSC, so few pages need to be crawled at each level and the crawling of each level is completed quickly, which causes it to spend much of its time not on actual page crawling but on the self-update and maintenance of Nutch's data structures.
[0293] Figure 3 further illustrates the relationship between the harvest rates of the three crawlers and time. It can be seen from the figure that the harvest rates of BFSC and BLFC stabilize quickly, while the harvest rate of UBFC rises for a longer time before gradually stabilizing, and its harvest rate is much higher than that of the other crawlers.
[0294] Through the above analysis, it can be concluded that the present invention effectively improves the harvest rate and recall rate of page resource crawling, and can better help people obtain the required information from the Internet in a large range, high efficiency, and high precision.
[0295] For the foregoing method embodiments, for simplicity of description they are all expressed as a series of action combinations, but those skilled in the art should know that the present invention is not limited by the described sequence of actions, since according to the present invention some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
[0296] Referring to Figure 4, which shows a structural block diagram of an embodiment of an apparatus for directionally grabbing page resources of the present invention, the apparatus may include the following modules:
[0297] The seed filtering module 401 is used to filter the seed site URL, where the seed site URL includes the URL of the seed site itself and the URLs of its links;
[0298] The experimental crawler module 402 is used to pre-crawl pages that meet the number threshold according to the URL of the seed site;
[0299] The classification module 403 is used to determine a characteristic page in the pre-fetched page;
[0300] The regular expression learning module 404 is configured to generate a regular expression summarizing the URL of the characteristic page;
[0301] The matching module 405 is configured to match the seed site URL against the regular expressions, and retain the seed site URLs that meet the matching condition as crawling target URLs;
[0302] The focused crawler module 406 is used to crawl the page according to the crawling target URL.
[0303] Preferably, the regular expression learning module 404 may include the following submodules:
[0304] The set division sub-module is used to divide the feature page URL into multiple URL subsets;
[0305] The clustering sub-module is used to aggregate the subset of URLs into multiple URL categories;
[0306] The extraction submodule is used to extract the regular expression of the URL category.
[0307] Preferably, the URL includes site parameters, and the set division sub-module may include the following units:
[0308] The first dividing unit is used to divide URLs with the same site parameters into the same URL subset.
[0309] More preferably, the URL may also include path parameters, and the set division submodule may also include the following units:
[0310] The second dividing unit is used to divide URLs of path parameters with the same number of directories into the same URL subset.
[0311] Furthermore, the URL may also include query parameters, and the set division submodule may also include the following units:
[0312] The third dividing unit is used to divide URLs with the same query parameters into the same URL subset.
[0313] Preferably, the clustering sub-module may include the following units:
[0314] The rule setting unit is used to preset the clustering rules of the URL category;
[0315] The processing unit is configured to read URLs from the URL subset, and determine whether the URL conforms to the clustering rules of the URL category, and if so, assign the URL to the URL category; if not, Then create a new URL class based on the URL.
[0316] More preferably, the clustering submodule may further include the following units:
[0317] A statistical unit, used to count the number of URL categories and the total number of URLs;
[0318] The rule adjustment unit is configured to adjust the clustering rule of the URL category according to the statistical result.
[0319] Preferably, the seed filtering module may include the following units:
[0320] An array generation sub-module for reading the URL of the seed site into an array, and sorting the array;
[0321] The site filtering sub-module is used to extract site parameters of a certain URL, determine whether the site parameters are included in the array, and if so, keep the URL; if not, remove the URL.
[0322] Preferably, this embodiment may also include the following modules:
[0323] A quantity statistics module for counting the number of URLs matched by the regular expression;
[0324] The regular expression filtering module is configured to delete the regular expression when the number of URLs is less than a preset filtering threshold.
[0325] Referring to Figure 5, which shows a flowchart of the method for directionally grabbing page resources applied to the embodiment shown in Figure 4, the method may include the following steps:
[0326] Step 501: The seed filtering module filters the URL of the seed site;
[0327] Preferably, the seed site URL includes the URL of the seed site itself and the URLs of its links. In this case, the processing steps of the seed filtering module include:
[0328] The array generation sub-module reads the URL of the seed site into an array, and sorts the array;
[0329] The site filtering sub-module extracts the site parameters of a certain URL, determines whether the site parameters are included in the array, and if so, keeps the URL; if not, removes the URL.
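The following is a minimal Python sketch of the two sub-steps above; it is illustrative only, and the names filter_seed_urls, candidate_urls and seed_sites are assumptions rather than identifiers used in the embodiment.

```python
# Minimal sketch of the seed filtering step (assumed names, not from the embodiment).
from bisect import bisect_left
from urllib.parse import urlparse

def filter_seed_urls(candidate_urls, seed_sites):
    # Array generation sub-module: read the seed-site hosts into an array and sort it.
    hosts = sorted(set(seed_sites))
    kept = []
    for url in candidate_urls:
        # Site filtering sub-module: extract the site parameter (host) of the URL.
        host = urlparse(url).netloc
        # Binary-search the sorted array to decide whether the host is a seed site.
        i = bisect_left(hosts, host)
        if i < len(hosts) and hosts[i] == host:
            kept.append(url)      # seed site: keep the URL
        # otherwise the URL is simply not kept, i.e. it is removed
    return kept
```

Sorting the array once lets each site parameter be checked by binary search rather than a linear scan, which is presumably the purpose of the array generation sub-module.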
[0330] Step 502: The experimental crawler module pre-fetches pages that meet the number threshold according to the URL of the seed site;
[0331] Preferably, the value range of the number threshold is 1000 to 5000, and the experimental crawler module may crawl pages through the following steps (sketched in code after this list):
[0332] Write the URL of the seed site into the database;
[0333] Read the URL from the database, and extract site parameters of the URL;
[0334] Update the number of URL crawls corresponding to the site parameters;
[0335] Determine whether the number of URL crawls exceeds the number threshold, and if not, add the URL to the URL crawl list;
[0336] Download the page corresponding to the URL in the URL grab list, and generate a corresponding page data segment;
[0337] The database is updated according to the page data segment.
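As an illustrative reading of the six sub-steps above, the pre-fetch loop might be sketched as follows; the in-memory structures standing in for the database and the download_page() helper are assumptions, not part of the embodiment.

```python
# Sketch of the experimental crawler's pre-fetch loop (assumed helpers and data structures).
from urllib.parse import urlparse

def prefetch(seed_urls, count_threshold=1000, download_page=None):
    url_db = list(seed_urls)      # URLs written to the "database"
    seen = set(seed_urls)
    crawl_counts = {}             # site parameter -> number of crawls recorded
    pages = []                    # page data segments produced so far

    while url_db:
        url = url_db.pop(0)                                   # read a URL from the database
        site = urlparse(url).netloc                           # extract its site parameter
        crawl_counts[site] = crawl_counts.get(site, 0) + 1    # update the crawl count
        if crawl_counts[site] > count_threshold:              # over the number threshold
            continue                                          # do not add it to the crawl list
        body, out_links = download_page(url)                  # download the listed page
        pages.append({"url": url, "content": body})           # generate a page data segment
        for link in out_links:                                # update the database
            if link not in seen:
                seen.add(link)
                url_db.append(link)
    return pages
```

The default count_threshold of 1000 reflects the lower end of the 1000 to 5000 range mentioned above.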
[0338] Step 503: The classification module determines a feature page in the pre-fetched page;
[0339] Preferably, the feature pages may include topic-related pages and topic-independent pages, and the topic-related pages may specifically include topic-related content pages and topic-related catalog pages.
[0340] Step 504: The regular expression learning module generates a regular expression summarizing the URL of the characteristic page;
[0341] Preferably, the regular expression learning module can learn the regular expression of the feature page URL through the following steps:
[0342] The set division sub-module divides the feature page URL into multiple URL subsets;
[0343] The clustering sub-module aggregates the subset of URLs into multiple URL categories;
[0344] The extraction sub-module extracts the regular expression of the URL category.
[0345] Specifically, the URL may include site parameters, path parameters, and query parameters, and the set division sub-module may divide URL subsets through the following steps (a sketch follows this list):
[0346] The first dividing unit divides URLs with the same site parameters into the same URL subset;
[0347] The second dividing unit divides URLs of path parameters with the same number of directories into the same URL subset;
[0348] The third dividing unit divides URLs with the same query parameters into the same URL subset.
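A minimal sketch of these three dividing units, assuming that "the same query parameters" means the same set of query-parameter keys (the text does not say whether the values are compared as well):

```python
# Sketch of the set division sub-module (the query-key interpretation is an assumption).
from collections import defaultdict
from urllib.parse import urlparse, parse_qsl

def divide_into_subsets(urls):
    subsets = defaultdict(list)
    for url in urls:
        parts = urlparse(url)
        directories = [d for d in parts.path.split('/') if d]             # path directories
        query_keys = tuple(sorted(k for k, _ in parse_qsl(parts.query)))  # query keys only
        # URLs sharing the host, the directory count and the query keys fall into one subset.
        key = (parts.netloc, len(directories), query_keys)
        subsets[key].append(url)
    return list(subsets.values())
```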
[0349] In this case, the clustering sub-module can aggregate URL categories through the following steps (a sketch follows this list):
[0350] The rule setting unit presets the clustering rule of the URL category;
[0351] The processing unit reads a URL from the URL subset and judges whether the URL conforms to the clustering rule of a URL category; if so, it assigns the URL to that URL category; if not, it creates a new URL category based on the URL;
[0352] The statistics unit counts the number of URL categories and the total number of URLs;
[0353] The rule adjustment unit adjusts the clustering rule of the URL category according to the statistical result.
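Because the text does not specify the concrete clustering rule, the sketch below assumes a simple one: a URL joins an existing URL category when a sufficient fraction of its path directories agree with the category's representative URL; the rule adjustment unit is likewise illustrated by relaxing that threshold when the category count grows too large relative to the number of URLs seen.

```python
# Sketch of the clustering sub-module; the similarity rule and its adjustment are assumptions.
from urllib.parse import urlparse

def _directories(url):
    return [d for d in urlparse(url).path.split('/') if d]

def cluster_subset(urls, min_similarity=0.8):
    classes = []     # each class holds a representative URL and its member URLs
    total = 0        # total number of URLs processed so far (statistics unit)
    for url in urls:
        total += 1
        dirs = _directories(url)
        placed = False
        for cls in classes:
            rep_dirs = _directories(cls["representative"])
            same = sum(1 for a, b in zip(dirs, rep_dirs) if a == b)
            # Assumed clustering rule: enough directories agree with the representative.
            if dirs == rep_dirs or (rep_dirs and same / len(rep_dirs) >= min_similarity):
                cls["members"].append(url)
                placed = True
                break
        if not placed:
            # The URL fits no existing category: create a new URL category based on it.
            classes.append({"representative": url, "members": [url]})
        # Rule adjustment unit: if categories outnumber half of the URLs seen so far,
        # the rule is probably too strict, so relax the similarity threshold slightly.
        if total >= 10 and len(classes) > total // 2 and min_similarity > 0.5:
            min_similarity -= 0.05
    return classes
```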
[0354] The extraction sub-module then extracts, for each URL category, a URL regular expression that summarizes and represents that category. Specifically, following the data structure of the URL described above, each URL is decomposed into three parts: host, path, and query; the path is further decomposed into a series of directories, and the query into a series of key-value pairs. Since all URLs in a category share the same host, the host is written into the regular expression directly. The directories of the path part are then aligned position by position: if the directories at a given position are identical across the URLs, that value is added to the regular expression; otherwise a "*" wildcard is added in its place. The query part is handled in the same way as the path part. In this way, one regular expression is obtained for each URL category.
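A hedged sketch of this extraction step, assuming the URLs of one category already share the same host, the same number of directories and the same query keys (which the set division step is meant to guarantee), and using the regular-expression fragment [^/?&]+ as the concrete form of the "*" wildcard:

```python
# Sketch of the extraction sub-module; the wildcard form and scheme prefix are assumptions.
import re
from urllib.parse import urlparse, parse_qsl

def extract_regex(urls):
    parsed = [urlparse(u) for u in urls]
    host = parsed[0].netloc                                   # identical across the category
    dir_lists = [[d for d in p.path.split('/') if d] for p in parsed]
    query_lists = [parse_qsl(p.query) for p in parsed]

    def align(rows):
        # Compare the values column by column across all URLs of the category.
        parts = []
        for column in zip(*rows):
            if len(set(column)) == 1:         # same value at this position: keep it
                parts.append(re.escape(column[0]))
            else:                             # values differ: use a wildcard instead
                parts.append(r"[^/?&]+")
        return parts

    pattern = "https?://" + re.escape(host)   # write the host directly into the expression
    if dir_lists[0]:
        pattern += "/" + "/".join(align(dir_lists))           # align the path directories
    if query_lists[0]:
        kv_rows = [["%s=%s" % kv for kv in q] for q in query_lists]
        pattern += r"\?" + "&".join(align(kv_rows))           # treat the query like the path
    return pattern + "$"
```

For example, given two hypothetical URLs http://site.example/member/view.asp?add=1&tid=203839 and http://site.example/member/view.asp?add=1&tid=100, this sketch would produce a pattern along the lines of https?://site\.example/member/view\.asp\?add=1&tid=[^/?&]+$.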
[0355] Step 505: The matching module matches the seed site URL with the regular expression, and retains the seed site URL that meets the matching condition as the crawling target URL;
[0356] In this embodiment, for a topic-related page, if the seed site URL matches the regular expression of the topic-related page, the seed site URL meets the matching condition; for a topic-independent page, if the seed site URL matches the regular expression of the topic-independent page, the seed site URL does not meet the matching condition.
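A minimal sketch of one natural reading of this rule, in which a seed site URL is kept as a crawling target only if it matches some topic-related expression and none of the topic-independent ones; the pattern lists and names are assumptions:

```python
# Sketch of the matching module; the combined keep/drop rule is an interpretation.
import re

def select_crawl_targets(seed_urls, related_patterns, unrelated_patterns):
    related = [re.compile(p) for p in related_patterns]
    unrelated = [re.compile(p) for p in unrelated_patterns]
    targets = []
    for url in seed_urls:
        # Keep the URL only if a topic-related pattern matches and no
        # topic-independent pattern matches.
        if any(r.match(url) for r in related) and not any(u.match(url) for u in unrelated):
            targets.append(url)
    return targets
```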
[0357] Step 506: The focused crawler module crawls the page according to the crawling target URL.
[0358] Preferably, this embodiment may further include the following steps (sketched in code below):
[0359] The quantity statistics module counts the number of URLs matched by the regular expression;
[0360] The regular expression filtering module deletes the regular expression when the number of URLs is less than a preset filtering threshold.
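A small sketch of these two steps, assuming the pre-fetched URLs are available as a list and the filtering threshold is a plain integer:

```python
# Sketch of the quantity statistics and regular expression filtering steps (assumed names).
import re

def filter_patterns(patterns, urls, filter_threshold=5):
    kept = []
    for pattern in patterns:
        compiled = re.compile(pattern)
        # Quantity statistics: count how many URLs this regular expression matches.
        match_count = sum(1 for url in urls if compiled.match(url))
        # Regular expression filtering: drop patterns below the preset threshold.
        if match_count >= filter_threshold:
            kept.append(pattern)
    return kept
```

The default threshold of 5 is arbitrary here; the embodiment only requires that the filtering threshold be preset.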
[0361] Since the device embodiment basically corresponds to the method embodiment, its description is relatively brief; for related details, please refer to the description of the method embodiment.
[0362] The method for directionally grabbing page resources and the apparatus for directionally grabbing page resources provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the present invention; the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementation and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.