Automatically optimized crawler grabbing method

An automatic optimization and crawler technology, applied in the fields of special data processing applications, instruments, and electrical digital data processing. It addresses the problems of reduced system performance, increased resource consumption, and a high frequency of information capture, with the effect of optimizing system resources, improving efficiency, and improving system performance.

Active Publication Date: 2008-05-28
北京酷讯科技有限公司

AI Technical Summary

Problems solved by technology

[0005] However, the above method of using crawlers to grab online information still assumes an ideal state. In practical applications, the crawler's crawling efficiency cannot be maximized, because the release of new information is often extremely time-sensitive: releases are concentrated in certain periods, and other periods are relatively quiet. For example, the annual sales peaks for train tickets, air tickets, and long-distance bus tickets are the winter and summer vacations and Golden Week. The peak is a period of time before and after the graduates of colleges and univers...



Example Embodiment

[0014] The present invention will be further described in detail below in conjunction with the drawings:

[0015] Kuxun's crawler scheduling algorithm calculates the refresh rate from several factors: whether the index page is downloaded successfully, whether its size changes, whether the page information meets the time requirements, whether there is a valid information link, the number of valid information items crawled, and the crawling time. The method corrects the frequency of information capture in the computer system according to the following formula.

[0016] freq(n, ch, t) = f_CH((α K_down (1 - ...
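The formula in [0016] is truncated in this copy, so its exact form cannot be reproduced. As a rough illustration only of how some of the factors listed in [0015] could be folded into a single refresh rate, the following Python sketch uses a weighted sum. The weights, the cap on counted items, the rate bounds, and every name in it (`CrawlResult`, `refresh_score`, `crawls_per_hour`) are assumptions made for this example and are not taken from the patent; the time-related factors are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class CrawlResult:
    """Outcome of one crawl of an index page (hypothetical structure)."""
    downloaded: bool       # index page downloaded successfully
    size_changed: bool     # page size differs from the previous crawl
    has_valid_links: bool  # page contains at least one valid information link
    new_items: int         # number of valid information items crawled


def refresh_score(result: CrawlResult,
                  w_down: float = 0.2,
                  w_size: float = 0.2,
                  w_links: float = 0.2,
                  w_items: float = 0.4,
                  items_cap: int = 20) -> float:
    """Combine the factors into a score in [0, 1].

    The weights and items_cap are illustrative assumptions, not values
    from the patent.
    """
    if not result.downloaded:
        # A failed download gives no evidence of fresh content.
        return 0.0
    items_part = min(result.new_items, items_cap) / items_cap
    return (w_down
            + w_size * result.size_changed
            + w_links * result.has_valid_links
            + w_items * items_part)


def crawls_per_hour(score: float, lo: float = 0.1, hi: float = 12.0) -> float:
    """Map a score in [0, 1] linearly onto an assumed range of crawl rates."""
    return lo + (hi - lo) * score


if __name__ == "__main__":
    busy = CrawlResult(downloaded=True, size_changed=True,
                       has_valid_links=True, new_items=15)
    quiet = CrawlResult(downloaded=True, size_changed=False,
                        has_valid_links=False, new_items=0)
    print(crawls_per_hour(refresh_score(busy)))
    print(crawls_per_hour(refresh_score(quiet)))
```

A page that yields many new items scores higher and is revisited more often than one that downloads but shows no change, which matches the direction of adjustment the text describes.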


Abstract

The invention discloses a grabbing method that automatically optimizes a crawler. Existing crawler methods grab web pages at the same frequency during peaks and troughs of information publication, which hurts the timeliness of the grabbed information, reduces system efficiency, and wastes resources. To solve this problem, the invention includes the following steps: first, information is extracted from an information page grabbed from the Internet; if the extraction succeeds, the frequency of extracting from that information page again is increased, otherwise it is decreased; second, step one is repeated when the time given by the corrected frequency is reached. The invention is applicable to various existing search engines.
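The two steps in the abstract can be sketched as a small feedback rule on the crawl interval. This is a minimal Python sketch under stated assumptions: the multiplicative factors (halve on success, double on failure), the interval bounds, and the names `PageSchedule` and `simulate` are hypothetical choices for illustration, not the correction rule defined in the patent.

```python
from dataclasses import dataclass


@dataclass
class PageSchedule:
    """Crawl interval for one information page (hypothetical structure)."""
    interval: float                # seconds until the next crawl
    min_interval: float = 60.0     # assumed lower bound: at most once a minute
    max_interval: float = 86400.0  # assumed upper bound: at least once a day
    speed_up: float = 0.5          # assumed factor applied after a success
    slow_down: float = 2.0         # assumed factor applied after a failure

    def update(self, extracted_ok: bool) -> float:
        """Step one: raise the frequency on success, lower it otherwise."""
        factor = self.speed_up if extracted_ok else self.slow_down
        scaled = self.interval * factor
        # Keep the interval within the assumed bounds.
        self.interval = min(self.max_interval, max(self.min_interval, scaled))
        return self.interval


def simulate(schedule: PageSchedule, outcomes: list[bool]) -> list[float]:
    """Step two: repeat step one each time the corrected interval elapses.

    Real waiting and fetching are replaced by a list of extraction outcomes
    so that the example stays self-contained.
    """
    return [schedule.update(ok) for ok in outcomes]


if __name__ == "__main__":
    schedule = PageSchedule(interval=600.0)
    # Two successful extractions followed by one failure.
    print(simulate(schedule, [True, True, False]))
```

Multiplicative adjustment is used here so that a page converges quickly toward frequent crawling during a publication peak and backs off just as quickly in a trough; an additive rule would be an equally valid assumption.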

Description

Technical field

[0001] The invention relates to a method for grabbing information with a web crawler, and in particular to a method in which a search engine grabs information using crawler technology and automatically optimizes the grabbing frequency.

Background technique

[0002] The search engine is a technology widely used on the Internet today. People only need to enter a few keywords describing the information they are looking for, and a search engine such as Google or Baidu returns a large amount of information related to those keywords.

[0003] Search engines draw their information from various sources. Some of it comes from bid advertising: the advertiser who places the advertisement pays an advertising fee to the search engine operator, and the operator then publishes brief information about the advertisement and a link to it in its own search engine. More of it is non-advertising information, such as news and academi...


Application Information

IPC(8): G06F17/30
Inventor 陈华
Owner 北京酷讯科技有限公司