Webpage mark extracting method

A webpage identification and storage structure technology, applied in the field of search engines, that addresses the problems of excessive memory occupation and time-consuming processing, and achieves the effects of reducing excessive occupation, increasing processing speed, and reducing the number

Active Publication Date: 2007-07-04

TENCENT TECH (SHENZHEN) CO LTD

0 Cites, 7 Cited by


AI Technical Summary

Problems solved by technology

[0011] For each specified URL, a combination of multiple hash functions is used to generate a corresponding decision vector, and that decision vector is used to determine whether the specified URL is in the set of crawled URLs. However, judging by decision vector causes a certain degree of misjudgment. To reduce the probability of misjudgment, a variety of well-performing hash functions must be selected and combined, and a decision vector with a large number of stages must be generated. This makes the process of determining whether a URL is in the crawled URL set very time-consuming, and a decision vector with a large number of stages also occupies more memory.
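The scheme criticized in this paragraph resembles what is commonly called a Bloom filter, although the excerpt does not use that name. The following is a minimal sketch under that assumption. The class name `DecisionVector`, the bit-vector layout, and the use of salted SHA-256 as the family of hash functions are illustrative choices, not details taken from the patent.

```python
import hashlib
from typing import Iterator


class DecisionVector:
    """Bit vector tested with several hash functions (Bloom-filter style).

    Hypothetical illustration: the patent excerpt does not specify the
    hash functions or the vector layout.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url: str) -> Iterator[int]:
        # Assumption: derive each "hash function" by salting SHA-256
        # with the function index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, url: str) -> None:
        """Record a crawled URL by setting all of its bit positions."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        """True if every position is set; may be a false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(url)
        )
```

A URL that was added is always reported as present, but a URL that was never added can also be reported as present when its positions collide with bits already set. Shrinking the vector makes this misjudgment more likely (with `num_bits=1` every lookup succeeds after the first `add`), which is the trade-off between accuracy, memory, and hashing cost that the paragraph describes.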


Embodiment Construction

[0049] The main implementation principle, specific implementation process, and beneficial effects of the present invention are described in detail below with reference to the accompanying drawings.

[0050] Please refer to Fig. 1, which is a flow chart of the main implementation principle of the method for crawling webpage identifiers of the present invention. In implementing the method, at least one first storage structure must be set up in advance to store the hash values of a specified number of the most recently crawled webpage identifiers. Preferably, this first storage structure is placed in a storage medium with a higher processing speed and a higher price, such as memory. At the same time, at least one second storage structure must be set up to store the hash values of all webpage identifiers that have been crawled, wherein the set second storag...
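The first storage structure described here can be pictured as a small, bounded in-memory set of recent hash values. The sketch below is one possible reading, not the patent's implementation. The names `url_hash` and `RecentHashCache`, the choice of SHA-256, and the oldest-first eviction policy are all assumptions, since the excerpt only says that a specified number of the most recently crawled hash values is kept.

```python
import hashlib
from collections import OrderedDict


def url_hash(url: str) -> int:
    """Hypothetical hash of a webpage identifier (64 bits of SHA-256)."""
    return int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest()[:8], "big")


class RecentHashCache:
    """Keeps at most `capacity` of the most recently added hash values."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._hashes: "OrderedDict[int, None]" = OrderedDict()

    def __contains__(self, h: int) -> bool:
        return h in self._hashes

    def __len__(self) -> int:
        return len(self._hashes)

    def add(self, h: int) -> None:
        # Re-adding a hash marks it as the most recent entry.
        self._hashes[h] = None
        self._hashes.move_to_end(h)
        # Assumption: evict the oldest entry once the limit is exceeded.
        if len(self._hashes) > self.capacity:
            self._hashes.popitem(last=False)
```

With a capacity of 2, adding the hashes of three distinct URLs leaves only the last two in the cache. A lookup that misses here would then fall through to the second storage structure.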


Abstract

The invention discloses a method for crawling webpage identifiers. The method includes setting up a first storage structure, which stores the hash values of a specified number of the most recently crawled webpage identifiers, and a second storage structure, which stores the hash values of the webpage identifiers that have been crawled. The second storage structure includes an original sub-storage structure and a conflict-avoiding sub-storage structure corresponding to each node in the original sub-storage structure; webpage identifier hash values that conflict in the original sub-storage structure are resolved by the conflict-avoiding sub-storage structure. The invention can increase the speed of judging whether a webpage identifier to be crawled is already in the set of crawled webpage identifiers, and reduce excessive occupation of memory resources during crawling.
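The second storage structure in the abstract reads like a hash table whose collisions are handled by a per-node overflow area. The sketch below assumes exactly that. The class name `SecondStore`, the modulo mapping from hash value to node, and the use of a Python list as the conflict-avoiding sub-structure are hypothetical, because the excerpt does not give the actual layout.

```python
from typing import List, Optional


class SecondStore:
    """Original sub-structure plus one conflict-avoiding list per node."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes <= 0:
            raise ValueError("num_nodes must be positive")
        self.num_nodes = num_nodes
        # Original sub-storage structure: at most one hash value per node.
        self.original: List[Optional[int]] = [None] * num_nodes
        # Conflict-avoiding sub-storage structure: one overflow list per node.
        self.overflow: List[List[int]] = [[] for _ in range(num_nodes)]

    def _node(self, h: int) -> int:
        # Assumption: map a hash value to a node by taking it modulo the size.
        return h % self.num_nodes

    def contains(self, h: int) -> bool:
        node = self._node(h)
        return self.original[node] == h or h in self.overflow[node]

    def add(self, h: int) -> bool:
        """Store h. Return True if it was new, False if already present."""
        if self.contains(h):
            return False
        node = self._node(h)
        if self.original[node] is None:
            self.original[node] = h
        else:
            # The node is occupied by a different hash value: a conflict.
            self.overflow[node].append(h)
        return True
```

For example, with four nodes the hash values 1 and 5 both map to node 1. The first occupies the original node and the second goes to that node's overflow list, so both remain distinguishable on lookup.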

Description

Technical field

[0001] The invention relates to the technical field of search engines in Internet systems, and in particular to a method for crawling webpage identifiers.

Background

[0002] Search engine technology has become a very popular network search technology in recent years. The web search, news search, music search, picture search, and map search technologies built on it each have great practical and commercial value. The crawler subsystem (Crawler, the subsystem of a search engine system responsible for fetching raw Internet data resources) is a very important part of the search engine system. Its role is to supply the search engine system with its original source of Internet data, such as web pages, mp3 files, pictures, emails, documents, or software resources, which greatly expands the application of search engine technology in various settings. Among them, a well-designed and well-structured crawler is the prere...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

Inventor: 杨卫

Owner: TENCENT TECH (SHENZHEN) CO LTD
