Estimation of postings list length in a search system using an approximation table

Inactive Publication Date: 2011-02-17
GLOBALSPEC

AI Technical Summary

Benefits of technology

[0019] The present invention provides, in a first aspect, a method of minimizing accesses to secondary storage when searching an inverted index for a search term. The method comprises automatically obtaining a predetermined size of a posting list

Problems solved by technology

A large inverted index may not fit into a computer's main memory, requiring secondary storage, typically disk storage, to help store the posting file, lexicon, or both.
Each separate access to disk may



Examples


[0031] The present invention approximates posting list size, preferably as a length in bytes, according to a term's document frequency. The approximate posting list size is preferably predetermined, and it covers, with high probability, the size of the associated posting list in secondary storage. Knowing the approximate size is useful for minimizing the number of accesses to secondary storage when reading a posting list. For example, if the approximate covering read size is several megabytes or less, a highly efficient strategy is to scoop up the whole posting list in a single access to secondary storage through a single read system call. If the approximate covering read size is larger than the largest available main memory input buffer, then the list can be read, for example, by filling the largest available input buffer several times using a single system call per buffer fill operation, and then doing one more partial read to pick up the remainder of the approximate covering read ...
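The read strategy described in paragraph [0031] can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `read_covering` and its parameters are hypothetical, and `os.pread` (available on Unix-like systems) stands in for the "single read system call" the text mentions.

```python
import os

def read_covering(fd, offset, approx_size, max_buffer):
    """Read approx_size bytes starting at offset.

    Issues one os.pread call per buffer fill: a single call when the
    covering size fits in max_buffer, otherwise several full fills
    followed by one partial read for the remainder.
    Returns (data, number_of_read_calls).
    """
    chunks = []
    calls = 0
    remaining = approx_size
    pos = offset
    while remaining > 0:
        want = min(remaining, max_buffer)
        data = os.pread(fd, want, pos)  # one system call per fill
        calls += 1
        if not data:
            break  # reached end of file before the covering size
        chunks.append(data)
        pos += len(data)
        remaining -= len(data)
    return b"".join(chunks), calls
```

With a covering size of 50 bytes and a 64-byte buffer, this sketch issues one read; with a 20-byte buffer it issues three (20, 20, and a final partial read of 10). Because the approximate size covers the real list only with high probability, a caller would presumably need a follow-up read when the list turns out to be longer; that handling is omitted here.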



Abstract

The present invention provides a method of minimizing accesses to secondary storage when searching an inverted index for a search term. The method comprises automatically obtaining a predetermined size of a posting list for the search term, the predetermined size based on document frequency for the search term, wherein the posting list is stored in secondary storage, and reading at least a portion of the posting list into memory based on the predetermined size. Corresponding computer system and program products are also provided.
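As a rough illustration of obtaining a predetermined size from document frequency, the following Python sketch looks up a covering size in a small table. The table layout, its threshold and byte values, the fallback constant, and the name `covering_size` are all assumptions made for this example; the text above does not specify how the approximation table is built or stored.

```python
import bisect

# Hypothetical approximation table: each entry is
# (upper bound on document frequency, covering size in bytes).
# The values are illustrative only, not taken from the patent.
APPROX_TABLE = [
    (10, 256),
    (100, 2_048),
    (1_000, 16_384),
    (10_000, 131_072),
]
_BOUNDS = [bound for bound, _ in APPROX_TABLE]

# Assumed worst-case bytes per posting, used beyond the last table entry.
FALLBACK_BYTES_PER_POSTING = 16

def covering_size(doc_freq):
    """Return a predetermined read size (bytes) for a term's document frequency."""
    i = bisect.bisect_left(_BOUNDS, doc_freq)  # first bound >= doc_freq
    if i < len(APPROX_TABLE):
        return APPROX_TABLE[i][1]
    return doc_freq * FALLBACK_BYTES_PER_POSTING
```

In this sketch a term with document frequency 11 falls into the second bucket and gets a 2,048-byte covering size. The lookup needs only the document frequency, which a lexicon typically already holds, so no extra access to secondary storage is needed to choose the read size.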

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority under 35 U.S.C. §119 to the following U.S. Provisional Applications, which are herein incorporated by reference in their entirety:

[0002] Provisional Patent Application Ser. No. 61/233,411, by Flatland et al., entitled “ESTIMATION OF POSTINGS LIST LENGTH IN A SEARCH SYSTEM USING AN APPROXIMATION TABLE,” filed on Aug. 12, 2009;

[0003] Provisional Patent Application No. 61/233,420, by Flatland et al., entitled “EFFICIENT BUFFERED READING WITH A PLUG IN FOR INPUT BUFFER SIZE DETERMINATION,” filed on Aug. 12, 2009; and

[0004] Provisional Patent Application Ser. No. 61/233,427, by Flatland et al., entitled “SEGMENTING POSTINGS LIST READER,” filed on Aug. 12, 2009.

[0005] This application contains subject matter which is related to the subject matter of the following applications, each of which is assigned to the same assignee as this application and filed on the same day as this application. Each of the below listed ap...


Application Information

IPC(8): G06F17/30
CPC: G06F17/30622, G06F16/319
Inventor: FLATLAND, STEINAR; DALTON, JEFF J.
Owner: GLOBALSPEC