Method and device for grabbing timeliness seed page

A crawling technique for time-sensitive seed pages, in the Internet field. It addresses meaningless crawls, lost new links, and impolite crawling, and aims to reduce unnecessary crawls, improve accuracy, and balance crawl volume against timely discovery of new links.

Status: Inactive. Publication date: 2014-03-05
Applicants: BEIJING QIHOO TECH CO LTD +1
Cites: 4; cited by: 7

AI Technical Summary

Problems solved by technology

Failing to detect changes to hub pages in time can cause new links to be lost.
However, every check of a page requires initiating a crawl. If a search engine checks such pages continually, it generates a large number of crawls against the websites that host them.
In many cases these crawls discover no new links, so many of them are meaningless. Crawling this heavily is also impolite and may even cause the website to ban the search engine's crawler, leaving the crawler unable to access the website for a period of time.
Because the number of people online differs markedly between holidays and working days, so does the amount of new information generated on the Internet. If a search engine crawls at the same frequency on holidays and working days, it will either miss some new links or perform meaningless crawls.
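The holiday effect described above depends on recognising when a period switches from one kind of day to the other. A minimal sketch, assuming a one-day preset period and a hypothetical calendar supplied as two sets (`holidays` for public holidays, `makeup_workdays` for weekend days worked in lieu); the function names and the calendar model are illustrative, not taken from the patent:

```python
from datetime import date, timedelta
from typing import Optional, Set


def is_holiday(day: date, holidays: Set[date], makeup_workdays: Set[date]) -> bool:
    """Treat weekends and listed public holidays as holidays, except for
    make-up working days (common in Chinese public-holiday schedules)."""
    if day in makeup_workdays:
        return False
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return day.weekday() >= 5 or day in holidays


def switch_type(day: date, holidays: Set[date], makeup_workdays: Set[date]) -> Optional[str]:
    """Return the kind of holiday/working-day switch that happens on `day`,
    or None if `day` is the same kind of day as the day before it."""
    today = is_holiday(day, holidays, makeup_workdays)
    yesterday = is_holiday(day - timedelta(days=1), holidays, makeup_workdays)
    if today == yesterday:
        return None
    return "workday_to_holiday" if today else "holiday_to_workday"
```

With empty calendars, `switch_type(date(2014, 3, 8), set(), set())` returns `"workday_to_holiday"`, since 8 March 2014 was a Saturday following a working Friday.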

Detailed Description of Embodiments

[0035] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

[0036] Figure 1 shows a flow chart of a method for crawling time-sensitive seed pages according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0037] Step S110: for a time-sensitive seed page, obtain the frequency adjustment factor corresponding to each crawl of the seed page initiated within the current preset time period, and calculate a frequency adjustment c...
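Step S110 does not spell out how a factor is assigned to each crawl or how the factors are combined; the abstract only states that a coefficient is calculated from them. A minimal sketch, assuming a hypothetical rule in which a crawl that finds new links contributes a speed-up factor of 1.2, a crawl that finds none contributes a slow-down factor of 0.8, and the coefficient is the arithmetic mean of the factors. The values, the averaging, and the function names are illustrative assumptions, not the patent's:

```python
from statistics import fmean
from typing import Sequence

# Hypothetical factor values: the patent text shown here does not publish them.
SPEED_UP = 1.2   # the crawl found at least one new link
SLOW_DOWN = 0.8  # the crawl found nothing new (a "meaningless" crawl)


def frequency_adjustment_factor(new_links_found: int) -> float:
    """Map the outcome of a single crawl of the seed page to a factor."""
    return SPEED_UP if new_links_found > 0 else SLOW_DOWN


def frequency_adjustment_coefficient(new_links_per_crawl: Sequence[int]) -> float:
    """Combine the per-crawl factors of the current period into one coefficient.

    With no crawls in the period, return 1.0 so the frequency is left unchanged.
    """
    if not new_links_per_crawl:
        return 1.0
    return fmean([frequency_adjustment_factor(n) for n in new_links_per_crawl])
```

Under these assumptions, a period in which half the crawls found new links yields a coefficient of about 1.0, so the frequency holds steady; a period of only empty crawls yields 0.8 and slows crawling down.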

Abstract

The invention discloses a method and device for crawling time-sensitive seed pages. For a time-sensitive seed page, the method obtains the frequency adjustment factor corresponding to each crawl of the seed page within the current preset time period and calculates a frequency adjustment coefficient from these factors. It then determines the crawl frequency of the historical preset time period that corresponds to the current preset time period, and judges whether the current preset time period falls at a switch between holidays and working days; if it does, a holiday factor is determined according to the type of switch. Finally, the crawl frequency within the current preset time period is adjusted dynamically according to the historical crawl frequency, the frequency adjustment coefficient, and the holiday factor. With this method and device, the crawl frequency is adjusted dynamically with holidays taken into account, which reduces unnecessary crawls of seed pages while ensuring that new links are found in time and are not lost.
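The abstract names the three inputs to the final adjustment but not the formula that combines them. A minimal sketch, assuming a multiplicative combination (historical frequency times coefficient times holiday factor), hypothetical holiday factors of 0.7 and 1.3, and a hypothetical clamp range expressed in crawls per hour; none of these values or names come from the patent:

```python
from typing import Optional

# Hypothetical holiday factors: the abstract only says the factor depends on
# the direction of the holiday/working-day switch.
HOLIDAY_FACTORS = {
    "workday_to_holiday": 0.7,  # less new content expected on holidays
    "holiday_to_workday": 1.3,  # activity picks up again on working days
}


def holiday_factor(switch: Optional[str]) -> float:
    """Return 1.0 when the current period is not a holiday/working-day switch."""
    if switch is None:
        return 1.0
    return HOLIDAY_FACTORS[switch]


def adjusted_crawl_frequency(historical_frequency: float, coefficient: float,
                             switch: Optional[str], min_frequency: float = 1.0,
                             max_frequency: float = 60.0) -> float:
    """Crawls per hour for the current period.

    Multiplies the frequency of the matching historical period by the
    adjustment coefficient and the holiday factor, then clamps the result to
    [min_frequency, max_frequency] (an assumed safeguard, not from the patent).
    """
    raw = historical_frequency * coefficient * holiday_factor(switch)
    return max(min_frequency, min(max_frequency, raw))
```

Clamping keeps a large coefficient from producing the burst of crawls that the background warns can get a crawler banned, and keeps a run of empty crawls from driving the frequency toward zero, which would risk losing new links.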

Description

Technical Field

[0001] The invention relates to the field of Internet technology, and in particular to a method and device for crawling time-sensitive seed pages.

Background

[0002] The Internet constantly generates new content, such as news and popular discussions. This content is scattered across many corners of the Internet, and to make it searchable promptly, search engines must find and crawl it quickly. Fortunately, links to time-sensitive content almost always appear on a particular type of page, called a time-sensitive seed page (hub page for short), such as http://news.sina.com.cn/. In theory, therefore, it suffices to find these hub pages and check them for changes promptly in order to discover all time-sensitive links.

[0003] The content of a hub page changes constantly, and new links are likely to disappear after a period of time. For example, forum boards scroll very quickly, and the ne...
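"Checking a hub page for changes" can be read as comparing the links found in successive crawls. A minimal sketch using only the Python standard library (`html.parser.HTMLParser` and `urllib.parse.urljoin`), assuming that a new link is any absolute URL present in the latest crawl but absent from a given set of previously seen links; the class and function names are hypothetical, and this is an illustration rather than the patent's own detection method:

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: Set[str] = set()

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the hub page URL.
                    self.links.add(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> Set[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def new_links(previously_seen: Set[str], html: str, base_url: str) -> Set[str]:
    """Links in the latest crawl of the hub page that were not seen before."""
    return extract_links(html, base_url) - previously_seen
```

Because links scroll off hub pages quickly, a real crawler would likely keep `previously_seen` as a longer-lived record rather than only the previous snapshot, so that a link that drops off and reappears is not reported as new twice.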

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/9537
Inventor: 魏少俊 (Wei Shaojun)
Owner: BEIJING QIHOO TECH CO LTD