Method and system of web page link library updating

A method and system for updating a web page link library, applied in the Internet field. It addresses the long update time and low update efficiency of existing web page link libraries, and thereby improves update efficiency.

Status: Inactive
Publication Date: 2013-05-15
SHENGLE INFORMATION TECH SHANGHAI

AI Technical Summary

Problems solved by technology

[0005] In view of this, the present invention provides a method and system for updating a web page link library, overcoming the low update efficiency that overly long update times cause in prior-art web page link libraries.

Examples

Embodiment 1

[0027] Refer to Figure 1, a flowchart of the first method for updating a web page link library according to an embodiment of the present invention. In this embodiment, the links in the web page link library are sorted by their crawl order.

[0028] A prior-art web page link library comprises a fixed-length file and a variable-length file. Each link is stored in the variable-length file, and the crawl status, position, and length of each link in the variable-length file are stored in the fixed-length file. The variable-length file holds variable-length information such as the links themselves. The fixed-length file is made up of class or structure objects: for example, if the structure defined for the fixed-length file is ClinkData, the fixed-length file is a sequence of ClinkData objects stored one after another. If some link information needs to be added, a parameter can be directly added to...
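
The two-file layout described in [0028] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the source names the structure ClinkData but gives no field types or ordering, so the `<BQI` layout (1-byte crawl status, 8-byte position, 4-byte length), the status values, and the helper names `append_link` and `read_link` are all assumptions made for illustration. In-memory byte arrays stand in for the two files.

```python
import struct

# Hypothetical layout for one fixed-length ClinkData record (assumed, not from the source):
#   crawl status : unsigned 8-bit  (assumed: 0 = not crawled, 1 = crawled)
#   position     : unsigned 64-bit offset of the link in the variable-length file
#   length       : unsigned 32-bit byte length of the link
CLINK_DATA = struct.Struct("<BQI")


def append_link(var_file: bytearray, fixed_file: bytearray, url: str, status: int = 0) -> int:
    """Append a link to both files and return its record index."""
    encoded = url.encode("utf-8")
    offset = len(var_file)
    var_file += encoded  # variable-length part: the link text itself
    fixed_file += CLINK_DATA.pack(status, offset, len(encoded))  # fixed-length part
    return len(fixed_file) // CLINK_DATA.size - 1


def read_link(var_file: bytes, fixed_file: bytes, index: int) -> tuple[str, int]:
    """Return (url, crawl_status) for the record at the given index."""
    status, offset, length = CLINK_DATA.unpack_from(fixed_file, index * CLINK_DATA.size)
    return var_file[offset:offset + length].decode("utf-8"), status
```

Because every record has the same size, record `i` always starts at byte `i * CLINK_DATA.size`, which is what makes array-style access to the fixed-length file possible.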

Embodiment 2

[0051] Refer to Figure 2, a flowchart of the second method for updating a web page link library disclosed in an embodiment of the present invention. In this embodiment, the links in the web page link library are sorted by their crawl order, and the method may include:

[0052] Step S201: Obtain the links to be updated, which include initial links and new links;

[0053] Step S202: Map the initial links of the web page link library and their initial crawl status into memory;

[0054] In practice, it is the fixed-length file of the web page link library that is mapped into memory; once mapped, the in-memory content is identical to the content of the fixed-length file. Only when the fixed-length file is mapped into memory can the link objects in it be operated on like elements of an array, and when the initial link in the fixed-length file is updated, there will be...
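
The memory-mapping idea in [0054] can be sketched with Python's standard `mmap` module. This is an illustration under stated assumptions, not the patent's code: the record layout `<BQI`, the status values, and the helper names `set_crawl_status`, `read_statuses`, and `demo` are hypothetical. The sketch shows a record's crawl status being overwritten in place at a computed offset, as if the file were an array of records.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<BQI")  # assumed layout: crawl status, position, length


def set_crawl_status(path: str, index: int, new_status: int) -> None:
    """Overwrite the crawl status of record `index` through a memory map."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        start = index * RECORD.size  # array-style addressing of fixed-size records
        _, offset, length = RECORD.unpack_from(mm, start)
        RECORD.pack_into(mm, start, new_status, offset, length)
        mm.flush()  # push the change back to the underlying file


def read_statuses(path: str) -> list[int]:
    """Read the crawl status of every record directly from the file."""
    with open(path, "rb") as f:
        data = f.read()
    return [status for status, _, _ in RECORD.iter_unpack(data)]


def demo() -> list[int]:
    """Create three uncrawled records, mark the second as crawled, read back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            for i in range(3):
                f.write(RECORD.pack(0, i * 20, 20))
        set_crawl_status(path, 1, 1)
        return read_statuses(path)
    finally:
        os.remove(path)
```

Only the bytes of the touched record change; the rest of the file is never rewritten.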

Embodiment 3

[0081] Refer to Figure 3, a structural schematic diagram of the first web page link library update system disclosed in an embodiment of the present invention. The links in the web page link library in this system are sorted by crawl order, and the system may include an obtaining module 301, a judging module 302, a first update module 303, and a second update module 304, wherein:

[0082] The obtaining module 301 is configured to obtain a link to be updated including an initial link and a new link;

[0083] The judging module 302 is configured to judge whether the link to be updated belongs to the webpage link library;

[0084] Specifically, if the crawl-order variable is a static variable, then the crawl-order value of a link object is zero once the object has been generated. If the links in the web page link library are arranged in ascending order and the crawl-order value of the smallest link is 1, then the judg...

Abstract

The invention discloses a method and a system for updating a web page link library, in which all links in the web page link library are sorted by their crawl order. The method includes: A, obtaining the links to be updated, which include initial links and new links; B, judging whether a link to be updated belongs to the web page link library, proceeding to step C if it does and to step D if it does not; C, updating, according to the current crawl status of the link to be updated, the initial crawl status of the corresponding link, which already has a crawl order in the web page link library; and D, appending the link to be updated after the existing links in the web page link library, according to its crawl order. The efficiency of updating the web page link library is thereby improved.
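
Steps A to D of the abstract can be sketched as a small in-memory model. This is a simplified illustration rather than the claimed method: the class name `LinkLibrary`, its fields, and the dictionary lookup used for step B are assumptions (the source describes the membership test in terms of a crawl-order variable, but that passage is truncated).

```python
from dataclasses import dataclass, field


@dataclass
class LinkLibrary:
    """Toy in-memory stand-in for the link library; all names are hypothetical."""
    urls: list[str] = field(default_factory=list)           # kept in crawl order
    statuses: list[int] = field(default_factory=list)       # crawl status per link
    index_of: dict[str, int] = field(default_factory=dict)  # url -> position

    def update(self, links_to_update: list[tuple[str, int]]) -> None:
        # Step A: links_to_update holds (url, current_status) pairs covering
        # both initial links and newly found links, in crawl order.
        for url, status in links_to_update:
            pos = self.index_of.get(url)             # Step B: already in the library?
            if pos is not None:
                self.statuses[pos] = status          # Step C: update status in place
            else:
                self.index_of[url] = len(self.urls)  # Step D: append at the end,
                self.urls.append(url)                # preserving crawl order
                self.statuses.append(status)
```

In this sketch, existing entries are never moved or re-sorted: a known link costs one in-place write, and an unknown link costs one append.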

Description

Technical field

[0001] The present invention relates to the field of the Internet, and more specifically to a method and system for updating a web page link library.

Background

[0002] With the rapid development of the Internet, the number of web pages keeps growing, and collecting Internet web pages is becoming more and more important. Existing web crawling methods start from a set of initial links and crawl the original web pages those links point to. They then extract the new links found on each original web page and crawl the pages those new links point to, iteratively crawling web pages across the Internet.

[0003] Web crawling is carried out on the basis of a web page link library. Each link and the information of each link are stored in the web page link library, and the information includes the crawling status corresponding to each link, the specific position of each link in the we...

Application Information

IPC(8): G06F17/30
Inventor: 陈华清, 于志伟, 吕晴
Owner: SHENGLE INFORMATION TECH SHANGHAI