Optimizing web crawling with user history

A technology based on historical browsing data, applied in special data processing applications, instruments, network data indexing, etc., that can solve problems such as crawl settings that do not allow adjustment

Active Publication Date: 2014-04-09
MICROSOFT TECH LICENSING LLC
7 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

Both crawl rate and latency are predetermined static values, which therefore do not allow adjustments based on site traffic.

Embodiment Construction

[0013] The subject matter described herein is presented with specificity to meet statutory requirements. However, the description herein is not intended to limit the scope of this patent. Rather, the claimed subject matter may also be embodied in other ways, so as to include different steps or combinations of steps similar to those described in this document, in conjunction with other present or future technologies.

[0014] As used herein, a "web site" refers to web pages, web blogs, online videos, online images, and various other content that may be accessible over a network. To aid the readability of the description herein, "web site" and "site" are used interchangeably. Those skilled in the art will appreciate that web crawlers can be configured to analyze and interpret text and/or metadata on web sites in order to understand online content. For this purpose, the text may be judged or weighted based on the text's underlying definition, placement on the ...

Abstract

A politeness manager estimates traffic to web sites based on historical log data generated and sent by plug-ins or toolbars on client web browsers. The historical log data details the dates and times at which the web browsers visit different web sites, and is used to determine the timeframes in which specific web sites are busy and those in which they are not. Crawl rates for different timeframes for a web site are determined based on the historical log data, and web crawlers are scheduled to crawl the web site according to those crawl rates, minimizing the chance that web crawler requests cause the site to crash.
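The flow described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the log is modeled as `(site, timestamp)` pairs, timeframes are taken to be hours of the day, the function names `hourly_visit_counts` and `crawl_rates` are hypothetical, and the linear mapping from traffic to crawl rate (with its `max_rate` and `min_rate` bounds) is just one plausible choice.

```python
from collections import Counter
from datetime import datetime


def hourly_visit_counts(visit_log, site):
    """Count logged browser visits to `site` for each hour of the day (0-23).

    `visit_log` is an iterable of (site, datetime) pairs; this layout is an
    assumed stand-in for the toolbar/plug-in log data.
    """
    counts = Counter()
    for visited_site, timestamp in visit_log:
        if visited_site == site:
            counts[timestamp.hour] += 1
    return [counts.get(hour, 0) for hour in range(24)]


def crawl_rates(hourly_counts, max_rate=10.0, min_rate=1.0):
    """Map each hour's traffic to a crawl rate in requests per minute.

    Assumed policy: the busiest hour gets `min_rate`, hours with no logged
    traffic get `max_rate`, and hours in between are interpolated linearly.
    """
    peak = max(hourly_counts)
    if peak == 0:
        # No logged traffic at all: crawl at the maximum rate in every hour.
        return [max_rate] * len(hourly_counts)
    return [
        max_rate - (max_rate - min_rate) * count / peak
        for count in hourly_counts
    ]


# Hypothetical log: three visits around 09:00 and one around 03:00.
sample_log = [
    ("example.com", datetime(2014, 4, 9, 9, 5)),
    ("example.com", datetime(2014, 4, 9, 9, 20)),
    ("example.com", datetime(2014, 4, 9, 9, 45)),
    ("example.com", datetime(2014, 4, 9, 3, 10)),
    ("other.example", datetime(2014, 4, 9, 9, 30)),
]
rates = crawl_rates(hourly_visit_counts(sample_log, "example.com"))
```

With this sample log, the 09:00 hour receives the lowest crawl rate and hours with no logged visits receive the highest, so a scheduler reading `rates` would concentrate crawling in the quiet periods. A real system would presumably aggregate over many days and distinguish weekdays from weekends; the sketch omits that.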

Description

Background

[0001] Search engines use web crawlers to understand documents on the World Wide Web ("the web"). Web crawlers are programs that persistently search the web, indexing web sites by their content (e.g., keywords, text, reciprocal links, video, images, audio, etc.). Because web sites are constantly changing, web crawlers must repeatedly crawl a site in order to index the freshest content. However, repeatedly visiting a web site poses a problem for the owner of the site, because the server hosting the site may only be able to serve a certain number of users/requesters at the same time. Crawling a site during periods of peak traffic (e.g., a stock-trading site near the opening bell of a particular stock exchange) therefore becomes dangerous to the stability of the site. Balancing the need to index fresh content against the volatile nature of a site's traffic is a difficult task for modern web crawlers.

[0002] The traditional way that site owners have sought to con...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/00, G06F17/30
CPC: G06F17/30864, G06F16/951, H04L67/1001, H04L67/535, H04L67/62
Inventor: D.M. 维尔曼, F. 卡内尔, B. 什亚姆库马, C. (X.) 张
Owner: MICROSOFT TECH LICENSING LLC