Web page collecting method and web page collecting server

A web page grabbing method and server technology, applied in the field of information processing. It addresses the problems of infrequently updated web content, long update cycles and the occupation of network bandwidth resources, and achieves the effects of improving grabbing efficiency, reducing the burden on the grabbing server and reducing bandwidth occupation.

Active Publication Date: 2008-05-14
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0010] The prior art sets a single time threshold for the grabbing interval of all types of web pages, which cannot adapt flexibly to how often different types of pages are updated. Suppose the time threshold is set to 10 minutes. For frequently updated pages, such as forum and comment pages, a 10-minute grabbing interval is too long. Conversely, pages with a very low update frequency, such as news pages, are likely never to be updated after they are published, yet the existing system cannot adapt to this and still re-grabs them every 10 minutes. When the grabbing interval of a web page exceeds the set time threshold, that is, after the page expires from the cache, this does not mean that the content of the page has been updated and needs to be grabbed again. In fact, most web pages on the Internet have a relatively long update cycle.
[0011] Therefore, the wireless search web page conversion system of the prior art cannot adapt to long web page update cycles. It repeatedly grabs many web pages that have not been updated, which increases the burden on the web page grabbing server, occupies too much network bandwidth, and makes web page grabbing inefficient.



Examples


Embodiment Construction

[0041] The present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0042] The web page grabbing method of the present invention is applicable to the web page grabbing server in a wireless search web page conversion system. The web page grabbing server uses a cache mechanism to ensure that the same HTML web page is not grabbed repeatedly within a certain time range. When the predetermined time threshold is reached, the server checks the HTTP header information to detect whether the content of the HTML web page has been updated, and so determines whether the page needs to be grabbed again. When an HTML web page does need to be grabbed, the web page grabbing server grabs it from the server where the page is located and sends it to the conversion server in the wireless search web page conversion system, where it is converted into a WML web page and is sto...
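The update check described in [0042] can be sketched as follows. The text only says that HTTP header information is used; it does not name the headers. This sketch assumes the standard ETag and Last-Modified validators, and the function names fetch_headers and page_updated are hypothetical, chosen for illustration only. It is one possible reading, not the patent's implementation.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping
import urllib.request


def fetch_headers(url: str, timeout: float = 5.0) -> Dict[str, str]:
    """Issue a HEAD request and return response headers with lower-cased names.

    Hypothetical helper: a HEAD request retrieves headers without the page body.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def page_updated(cached: Mapping[str, str], current: Mapping[str, str]) -> bool:
    """Compare headers saved at the last grab with freshly fetched headers.

    Assumption: ETag and Last-Modified are the headers consulted; the source
    text does not specify which header fields are used.
    """
    old_etag, new_etag = cached.get("etag"), current.get("etag")
    if old_etag and new_etag:
        # A changed entity tag means the representation changed.
        return old_etag != new_etag

    old_lm, new_lm = cached.get("last-modified"), current.get("last-modified")
    if old_lm and new_lm:
        try:
            return parsedate_to_datetime(new_lm) > parsedate_to_datetime(old_lm)
        except (TypeError, ValueError):
            # Unparseable or incomparable dates: grab again to be safe.
            return True

    # No usable validator on either side: conservatively treat as updated.
    return True
```

Treating a page with no usable header as updated is a design choice of this sketch: it costs an extra grab but never serves stale content.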


Abstract

The invention discloses a web page grabbing method and a web page grabbing server. The method comprises: A. receiving a web page request; B. determining whether the requested web page has already been grabbed; if so, executing step C; otherwise, grabbing the web page and ending the flow; C. determining whether the grabbing interval of the requested web page exceeds the preset time threshold; if so, executing step D; otherwise, not grabbing the web page and ending the flow; D. querying whether the web page has been updated; if so, grabbing the web page; otherwise, not grabbing it. The server comprises a web page request receiving module, a judgment module, a query module and a grabbing module. The invention lightens the burden on the web page grabbing server, reduces the occupation of network bandwidth resources and improves the efficiency of web page grabbing.
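Steps A to D of the abstract can be sketched as a single decision function. This is a minimal illustration under stated assumptions: the grabbing interval is taken to mean the time elapsed since the page was last grabbed, and the names CacheEntry, should_grab and is_updated are hypothetical and do not come from the patent text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CacheEntry:
    grabbed_at: float  # time of the last grab, in seconds since the epoch


def should_grab(
    url: str,
    cache: Dict[str, CacheEntry],
    threshold_seconds: float,
    is_updated: Callable[[str], bool],
    now: Optional[float] = None,
) -> bool:
    """Decide whether a requested page (step A) needs to be grabbed."""
    now = time.time() if now is None else now

    # Step B: a page that was never grabbed is grabbed immediately.
    entry = cache.get(url)
    if entry is None:
        return True

    # Step C: within the time threshold the cached copy is reused.
    if now - entry.grabbed_at <= threshold_seconds:
        return False

    # Step D: past the threshold, grab only if the page was actually updated.
    return is_updated(url)
```

The is_updated callback is invoked only in step D, so pages still inside the threshold cost no network traffic at all.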

Description

Technical field

[0001] The invention relates to the technical field of information processing, in particular to a web page grabbing method and a web page grabbing server in a wireless search web page conversion system.

Background

[0002] With the development of network technology, wireless Internet technology is also developing rapidly. People can connect with others anytime and anywhere through mobile communication terminals (such as mobile phones and wireless handheld computers). Wireless Internet access will continue to spread widely and change our way of life.

[0003] Currently, most resources on the Internet are web pages, but these web pages are in Hypertext Markup Language (HTML) format, which is designed for personal computers (PCs), so they cannot be browsed directly on mobile communication terminals. In view of this situation, a markup language in the form of Wireless Markup Language (WML) ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 王为纪宇
Owner: TENCENT TECH (SHENZHEN) CO LTD