
Efficient freshness crawl scheduling

A technology involving crawlers and content items, applied in the fields of web data indexing, instrumentation, and other database retrieval.

Pending Publication Date: 2021-12-28
MICROSOFT TECH LICENSING LLC
Cites: 0; Cited by: 0

AI Technical Summary

Problems solved by technology

As the web becomes more dynamic, services that rely on web data face an increasingly difficult problem: keeping up with content changes.

Detailed Description of Embodiments

[0017] The subject matter of the present disclosure is described with specificity herein to satisfy statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter may also be embodied in other ways, including with other cost functions, to include different steps or combinations of steps similar to those described in this document, in conjunction with other present or future technologies. Moreover, although the terms "step" and/or "block" may be used herein to denote different elements of the methods employed, these terms should not be interpreted as implying any particular order among or between the various steps disclosed herein unless and except when the order of individual steps is explicitly described. Each of the methods described herein may include computational processes that may be performed using any combination of hardware, firmware, and/or software. For exam...

Abstract

The technology described herein builds an optimal refresh schedule by minimizing a cost function constrained by an available refresh bandwidth. The cost function receives an importance score for a content item and a change rate for the content item as input in order to optimize the schedule. The cost function is considered optimized when a refresh schedule is found that minimizes the cost while using the available bandwidth and no more. The technology can build an optimized schedule to refresh content with incomplete change data, content with complete change data, or a mixture of content with and without complete change data. It can also re-learn content item change rates from its own schedule execution history and re-compute the refresh schedule, ensuring that this schedule takes into account the latest trends in content item updates.
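As a rough illustration of the optimization the abstract describes, the sketch below splits a fixed refresh bandwidth across content items. It is a minimal sketch under loudly stated assumptions: the per-item cost μ·ln(1 + Δ/ρ) (importance μ, Poisson change rate Δ, crawl rate ρ) is a hypothetical stand-in chosen only because it falls as ρ grows and rises with μ and Δ; it is not the cost function claimed in the patent. The names `Item`, `rate_for_multiplier`, and `schedule` are likewise invented here. Because this cost is convex in ρ, a single Lagrange multiplier λ on the bandwidth constraint gives each item a closed-form rate, and bisection on λ makes the rates sum to exactly the available bandwidth, echoing the "use the available bandwidth and no more" condition.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    importance: float   # mu: importance score (hypothetical units)
    change_rate: float  # delta: estimated changes per day (assumed Poisson)


def rate_for_multiplier(item: Item, lam: float) -> float:
    """Crawl rate minimizing mu*ln(1 + delta/rho) + lam*rho for one item.

    ASSUMPTION (hypothetical cost, not the patent's formula): setting the
    derivative -mu*delta / (rho*(rho + delta)) + lam to zero gives
    rho^2 + delta*rho - mu*delta/lam = 0; we return the positive root.
    """
    mu, delta = item.importance, item.change_rate
    if mu <= 0 or delta <= 0:
        return 0.0  # item never goes stale or does not matter: never refresh
    return (-delta + math.sqrt(delta * delta + 4.0 * mu * delta / lam)) / 2.0


def schedule(items: list[Item], bandwidth: float, iters: int = 200) -> list[float]:
    """Split `bandwidth` (crawls per day) so the per-item rates sum to it."""
    if bandwidth <= 0 or not any(i.importance > 0 and i.change_rate > 0 for i in items):
        return [0.0] * len(items)

    def total(lam: float) -> float:
        return sum(rate_for_multiplier(i, lam) for i in items)

    # Total rate strictly decreases as lam grows: bracket the root, then bisect.
    lo, hi = 1.0, 1.0
    while total(lo) < bandwidth:
        lo /= 2.0
    while total(hi) > bandwidth:
        hi *= 2.0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint: lam may span many decades
        if total(mid) > bandwidth:
            lo = mid
        else:
            hi = mid
    return [rate_for_multiplier(i, hi) for i in items]
```

For example, `schedule([Item(1.0, 1.0), Item(2.0, 1.0)], 3.0)` returns two rates summing to 3.0, with the more important item refreshed more often. Re-running `schedule` whenever change-rate estimates are updated would mimic the re-computation step the abstract mentions.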

Description

Background

[0001] A web crawler is a typical component of a search engine: it obtains information that the search service then provides to its users. As the web becomes more dynamic, crawlers must not only discover new web pages but also constantly revisit pages already in the search engine's index, picking up content that has changed so that the index stays fresh. This refresh process is resource-intensive.

Summary

[0002] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

[0003] The techniques described herein provide content trackers, such as search engines, with a more efficient crawl schedule to use when refreshing co...
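Deciding how often to revisit a page requires an estimate of how often it changes, and the abstract notes that the technology can work with complete or incomplete change data and re-learn change rates from its own crawl history. The sketch below shows one standard way to estimate a Poisson change rate in each case; it is swapped in for illustration and is not taken from the patent. With complete data (every change observed), the maximum-likelihood estimate is simply changes per unit time. With incomplete data, each crawl reveals only whether the item changed since the previous crawl, so the estimate maximizes the likelihood of those yes/no observations. The function names and the `(interval, changed)` history format are hypothetical.

```python
import math


def rate_from_complete(num_changes: int, observed_time: float) -> float:
    """Complete change data: every change is seen, so the Poisson MLE is count / time."""
    return num_changes / observed_time


def _term(t: float, delta: float) -> float:
    """t / (exp(delta * t) - 1), treated as 0 once exp(delta * t) would overflow."""
    x = delta * t
    return 0.0 if x > 700.0 else t / math.expm1(x)


def rate_from_crawls(history: list[tuple[float, bool]], iters: int = 200) -> float:
    """Incomplete change data. `history` holds (positive interval since the previous
    crawl, whether a change was detected). Under a Poisson model a change is seen
    with probability 1 - exp(-delta * t), and the log-likelihood peaks where
        sum over changed of t / (exp(delta * t) - 1) == sum over unchanged of t.
    The left side decreases monotonically in delta, so bisection finds the root.
    """
    changed = [t for t, c in history if c]
    unchanged_time = sum(t for t, c in history if not c)
    if not changed:
        return 0.0        # no change ever detected: the MLE is zero
    if unchanged_time == 0.0:
        return math.inf   # changed at every crawl: the data cannot bound the rate

    def excess(delta: float) -> float:
        return sum(_term(t, delta) for t in changed) - unchanged_time

    lo, hi = 1e-12, 1.0
    while excess(hi) > 0:   # grow the bracket until the root lies inside it
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For instance, four crawls at unit intervals that detect two changes give an estimate of ln 2 ≈ 0.69 changes per unit time, matching the closed form ln(n / (n − X)) that this likelihood yields for n equally spaced crawls with X detected changes.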

Application Information

Patent type and authority: Application (China)
IPC (8th edition): G06F16/951
CPC: G06F16/951; G06F9/4881
Inventors: A. Kolobov, Cheng Lu, E. J. Horvitz, Y. Peres
Owner: MICROSOFT TECH LICENSING LLC