Capturing method and capturing device for timeliness seed page

A crawling method and device for time-sensitive seed pages, applied in the Internet field. It addresses problems such as impolite (excessive) crawling, pointless crawls, and lost new links, and aims to balance timely link discovery against crawl load, reduce unnecessary crawling, and achieve high accuracy.

Inactive Publication Date: 2014-03-12
BEIJING QIHOO TECH CO LTD +1

AI Technical Summary

Problems solved by technology

Failing to detect changes to time-sensitive seed pages in time can cause new links to be lost.
However, every check of a page requires a crawl. If the search engine keeps checking such pages, it generates a large number of crawls against the website they belong to.
This large amount of crawling may n




Embodiment Construction

[0029] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0030] Figure 1 shows a flow chart of a method for capturing time-sensitive seed pages according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0031] Step S110: for a time-sensitive seed page, obtain the frequency adjustment factor corresponding to each crawl initiated on the seed page within the current preset time period.

[0032] A seed page refers to a page ...


Abstract

The invention discloses a capturing method and a capturing device for a timeliness seed page. The capturing method comprises the following steps: acquiring a frequency adjustment factor which corresponds to each capturing initiated to the seed page within a current preset time period for the timeliness seed page; determining the capturing frequency of a historical preset time period which corresponds to the current preset time period; calculating frequency adjustment coefficients based on the frequency adjustment factors; dynamically adjusting the capturing frequency of the current preset time period according to the capturing frequency and the frequency adjustment coefficients. According to the capturing method and the capturing device for the timeliness seed page, disclosed by the invention, the capturing frequency can be dynamically adjusted, and therefore, unnecessary capturing of the seed page is reduced, and new links also can be discovered in time without loss.
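The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented formula: the text available here does not say how a frequency adjustment factor or the coefficient is computed, so the `CrawlResult` type, the +1/-1 factor rule, the `sensitivity` parameter, the "1 plus scaled mean" coefficient, and the frequency bounds are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CrawlResult:
    """Outcome of one crawl of a seed page (hypothetical structure)."""
    new_links: int  # number of previously unseen links found by this crawl


def adjustment_factor(result: CrawlResult) -> float:
    # Assumption: a crawl that found new links votes for more crawling (+1),
    # a crawl that found nothing new votes for less crawling (-1).
    return 1.0 if result.new_links > 0 else -1.0


def adjustment_coefficient(factors: Sequence[float], sensitivity: float = 0.5) -> float:
    # Assumption: coefficient = 1 + sensitivity * mean(factors).
    # With no crawls in the period, leave the frequency unchanged.
    if not factors:
        return 1.0
    return 1.0 + sensitivity * sum(factors) / len(factors)


def adjusted_frequency(
    historical_frequency: float,
    factors: Sequence[float],
    min_frequency: float = 1.0,
    max_frequency: float = 60.0,
) -> float:
    # Scale the frequency of the matching historical period by the coefficient,
    # then clamp so the crawler neither stops checking nor floods the site.
    raw = historical_frequency * adjustment_coefficient(factors)
    return max(min_frequency, min(max_frequency, raw))


# Usage: four crawls in the current period, three of which found new links.
results = [CrawlResult(3), CrawlResult(0), CrawlResult(2), CrawlResult(5)]
factors = [adjustment_factor(r) for r in results]
new_frequency = adjusted_frequency(10.0, factors)  # crawls per period
```

Under these assumptions, a period in which most crawls find new links raises the frequency relative to the historical period, while a period of empty crawls lowers it; the clamp reflects the stated goal of avoiding both lost links and excessive load on the crawled site.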

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a method and device for capturing time-sensitive seed pages.

Background

[0002] The Internet is constantly generating new content, such as news and popular discussions. This new content is scattered across different corners of the Internet. To retrieve it promptly, search engines need to find and crawl it from the vast Internet in time. Fortunately, links to time-sensitive content almost always appear on a specific type of page, called a time-sensitive seed page (hub page for short), such as http://news.sina.com.cn/. So in theory, it is enough to find these hub pages and then check them for changes in time in order to find all time-sensitive links.

[0003] The content of a hub page is constantly changing, and new links are likely to disappear after a period of time. For example, forum boards scroll very quickly, and the ne...


Application Information

IPC(8): G06F17/30
CPC: G06F16/95
Inventor: 魏少俊
Owner: BEIJING QIHOO TECH CO LTD