Lossy index compression

An indexing and inverted-indexing technique, applied to building search indexes, that addresses problems such as the difficulty of maintaining large index files.

Inactive Publication Date: 2004-03-17
IBM CORP

AI Technical Summary

Problems solved by technology

[0008] Indexing large document collections creates huge index files that are difficult to maintain.



Embodiment Construction

[0034] Figure 1 is a schematic illustration of a system for building a compressed search index according to a preferred embodiment of the present invention. A user 10 uses an indexing device 12 to access a document archive 14, from which retrieved documents can be combined with existing document archives on the device 12. Device 12 builds compressed inverted index 22 using methods described in detail below. Typically, compressed index or archive 22 is transmitted to computing device 24. Device 24 differs from device 12 in that it has a limited ability to store a large index. Preferably, the archive of documents used for indexing is also transferred to device 24. The user can then use the device 24 to formulate a query into the document archive and retrieve a listing of appropriate documents, despite the limited storage capacity of the device 24.
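As a rough illustration of the build-then-query flow described in this paragraph, the following is a minimal sketch of an uncompressed inverted index. The function names `build_inverted_index` and `search`, the sample documents, and the term-frequency scoring are all assumptions made for illustration; they are not taken from the patent, and no compression step is shown.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def search(index, query):
    """Return doc ids ordered by summed term frequency (illustrative scoring only)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf
    # Highest score first; ties broken by ascending doc id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical three-document archive, standing in for archive 14.
docs = {
    1: "lossy index compression",
    2: "index pruning for search",
    3: "handheld search devices",
}
index = build_inverted_index(docs)   # built on the indexing device
results = search(index, "search index")  # queried on the limited-storage device
```

In this sketch the index is an ordinary dictionary, so transferring it to a second device would amount to serializing that dictionary; the patent's own compression method is described elsewhere in the specification.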

[0035] Typically, device 12 comprises a desktop computer or server, while device 24 is a portable pervasive computing device, su...


Abstract

An apparatus is provided for performing a method (Fig. 2) for pruning an index of a corpus of text documents, wherein the method includes steps for ranking (50) the postings in the index and pruning (48) from the index the postings below a given level in the ranking. The pruning methods of the invention are lossy, since some document postings are removed from the full index; however, the user cannot differentiate the lossy index from the full index.
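The rank-then-prune idea in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `prune_index`, the `keep` parameter, and the per-posting scores are hypothetical, and the patent's actual ranking criterion is not given in this excerpt.

```python
def prune_index(index, keep):
    """Rank all postings by score and retain only the `keep` highest-ranked ones.

    `index` maps term -> {doc_id: score}. The scores are assumed to come from
    some ranking function; this sketch does not compute them.
    """
    ranked = sorted(
        (
            (score, term, doc_id)
            for term, postings in index.items()
            for doc_id, score in postings.items()
        ),
        key=lambda p: (-p[0], p[1], p[2]),  # best score first, deterministic ties
    )
    pruned = {}
    for score, term, doc_id in ranked[:keep]:
        pruned.setdefault(term, {})[doc_id] = score
    return pruned


# Hypothetical full index with made-up scores.
full_index = {
    "index": {1: 0.9, 2: 0.2, 3: 0.6},
    "lossy": {1: 0.8, 4: 0.1},
}
lossy_index = prune_index(full_index, keep=3)
```

Here the two lowest-scoring postings are dropped, so the pruned index is smaller while the highest-ranked document for each term is still present. A global cutoff is only one possible choice; a per-term cutoff would be an equally reasonable reading of "a given level in the ranking".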

Description

Technical field

[0001] The present invention relates generally to methods and systems for computerized searching in large bodies of textual data, and in particular to building search indexes.

Background art

[0002] Fast and accurate text search engines are widely used in web and desktop applications. Emerging handheld devices, such as the Palm Pilot™, have sufficient storage capacity to allow entire collections of moderately sized documents to be stored on the device for quick reference and browsing. It is desirable to equip these devices with advanced index-based search engines, but storage capacity on handheld devices is still rather limited.

[0003] Most advanced information retrieval (IR) applications create inverted indexes to support high-quality search services for a given set of documents. An example of such a system is the Guru search engine, presented by Maarek and Smadja in "Full-Text Indexing Based on Lexical Relations, An Application: A Software Libra...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F17/30622, G06F17/30631, G06F16/328, G06F16/319
Inventor: D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, Y. Maarek, A. Soffer
Owner: IBM CORP