Lempel-Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases

A data compression and dictionary technology, applied in the field of textual data compression, that addresses prior-art problems (not taking advantage of the fact that the similarity of texts is greater at the level of character sequences, and failing to teach how to identify the most appropriate library of text or the genre of the document), achieving the effect of improving compression and benefi

Inactive
Publication Date: 2010-02-23
PINPOINT
Cites: 41; Cited by: 63

AI Technical Summary

Benefits of technology

[0038] While most extremely frequent words are short, the present inventors recognize that knowledge of the type of text being compressed can lead to a larger benefit from a pre-filled dictionary. If, for instance, scientific text is being compressed, then samples of similar texts can be used to get a profile of the type of text being compressed. In scientific texts, some long phrases may be extremely common. These phrases would be learned by the compression software and would be used when the first reference to them was made in the data being compressed. So, if, for instance, a word or phrase of length 10 were common, then even the first occurrence of that word could be encoded using a single dictionary reference, whereas under conventional LZ78, such an encoding might not be possible until as late as the eleventh occurrence of the word. The dictionary will also be prevented from containing substrings of this frequent word or phrase, as would be the case in LZ78, unless such substrings are independently added during the compression process because they are useful outside the context of the frequent word or phrase.
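
The effect described in this paragraph can be illustrated with a minimal sketch of an LZ78-style coder whose dictionary is seeded before compression starts. This is not the patent's implementation: the function names `lz78_encode` and `lz78_decode`, the `prefill` parameter, and the (index, next character) token layout are assumptions made for illustration. Because a seeded phrase such as "compression" is present without its prefixes, the sketch searches for the longest dictionary entry at each position rather than extending a match one character at a time.

```python
def lz78_encode(text, prefill=()):
    """Encode text as (dictionary index, next character) tokens.

    Hypothetical sketch: `prefill` is an iterable of phrases placed in the
    dictionary before any input is read. Index 0 is the empty phrase.
    """
    dictionary = {"": 0}
    for phrase in prefill:
        dictionary.setdefault(phrase, len(dictionary))
    max_len = max(map(len, dictionary))

    tokens = []
    i = 0
    while i < len(text):
        # Greedy search: longest dictionary phrase that starts at position i.
        # Length 0 (the empty phrase) always matches, so the loop always breaks.
        for length in range(min(max_len, len(text) - i), -1, -1):
            index = dictionary.get(text[i:i + length])
            if index is not None:
                break
        i += length
        next_char = text[i:i + 1]  # empty string at end of input
        tokens.append((index, next_char))
        if next_char:
            # Learn the matched phrase extended by one character.
            phrase = text[i - length:i + 1]
            dictionary[phrase] = len(dictionary)
            max_len = max(max_len, len(phrase))
        i += 1
    return tokens


def lz78_decode(tokens, prefill=()):
    """Rebuild the text; the decoder must be given the same prefill."""
    phrases = [""]
    seen = {""}
    for phrase in prefill:
        if phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)

    out = []
    for index, next_char in tokens:
        phrase = phrases[index] + next_char
        out.append(phrase)
        if next_char:
            phrases.append(phrase)
    return "".join(out)
```

Calling `lz78_encode("compression of compression", ["compression"])` covers the first occurrence of the word with a single dictionary reference, while the same call without `prefill` has to build the word up from single characters. Both sides must agree on the pre-filled contents, since the decoder reconstructs indices in the same order as the encoder.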

Problems solved by technology

However, Giltner et al. do not address frequently occurring sequences of text which fall outside the limited definition of a valid word.
As a result, Giltner et al. do not take advantage of the fact that the similarity of texts is greater when the comparison between them is made at the level of character sequences.
Also, Giltner et al. do not address how words which occur frequently in the text can be chosen for the first dictionary whereby it is filled with valid words which are frequent within the type of text being transmitted.
Giltner et al. also fail to teach how to identify the most appropriate library of text or the identification of the genre of the document to be compressed.
In their version, the dictionary is filled in advance with all strings of length 1 (that is, all of the characters in the alphabet over which compression is taking place), which helps to reduce, but does not eliminate the problem of starting with a dictionary devoid of useful entries.
However, this would result in no compression.
As a result, “gzip” does not always find the longest possible match but generally finds a match which is long enough.
Unfortunately, despite the large number of Lempel-Ziv variants in the prior art, none adequately addresses the problem that beginning with a dictionary completely devoid of words virtually prevents small files from being compressed at all and prevents larger files from being compressed further.
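
The small-file problem named in the last sentence can be observed with an existing LZ77-family compressor. The sketch below uses the preset-dictionary option of Python's standard `zlib` module (the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`) as an analogous mechanism; it is not the technique claimed in this patent, and the helper names `deflate` and `inflate` as well as the sample strings are assumptions made for illustration.

```python
import zlib


def deflate(data: bytes, preset: bytes = b"") -> bytes:
    """Compress data, optionally priming the compressor with preset text."""
    comp = zlib.compressobj(zdict=preset) if preset else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, preset: bytes = b"") -> bytes:
    """Decompress blob; the same preset text must be supplied."""
    decomp = zlib.decompressobj(zdict=preset) if preset else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical preset text standing in for phrases frequent in a genre.
PRESET = b"data compression dictionary Lempel-Ziv frequent phrases"
MESSAGE = b"Lempel-Ziv data compression"

if __name__ == "__main__":
    cold = deflate(MESSAGE)
    primed = deflate(MESSAGE, PRESET)
    print(len(MESSAGE), len(cold), len(primed))
```

For a message this short, the unprimed output is typically no smaller than the input, whereas the primed output can refer back into the preset text from the first byte onward.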

Examples

Embodiment Construction

[0059] The present invention will be described in detail below with respect to FIGS. 4-14. Those skilled in the art will appreciate that the description given herein is for explanatory purposes only and is not intended to limit the scope of the invention. Accordingly, the scope of the invention is only to be limited by the scope of the appended claims.

[0060] The extension to the Lempel-Ziv algorithms in accordance with the invention can be used in conjunction with all of the known variants of the Lempel-Ziv compression techniques described in the patent literature as well as in the text compression literature. However, in presently preferred embodiments, the present invention is used as a modification to the LZ77 or LZ78 compression techniques or as a modification to a Lempel-Ziv variant which is itself a modification to or extension of the LZ77 or LZ78 techniques. The extension to the Lempel-Ziv algorithms in accordance with the invention is preferably implemented as part of a softwa...

Abstract

An adaptive compression technique which is an improvement to Lempel-Ziv (LZ) compression techniques, both as applied for purposes of reducing required storage space and for reducing the transmission time associated with transferring data from point to point. Pre-filled compression dictionaries are utilized to address the problem with prior Lempel-Ziv techniques in which the compression software starts with an empty compression dictionary, whereby little compression is achieved until the dictionary has been filled with sequences common in the data being compressed. In accordance with the invention, the compression dictionary is pre-filled, prior to the beginning of the data compression, with letter sequences, words and/or phrases frequent in the domain from which the data being compressed is drawn. The letter sequences, words, and/or phrases used in the pre-filled compression dictionary may be determined by statistically sampling text data from the same genre of text. Multiple pre-filled dictionaries may be utilized by the compression software at the beginning of the compression process, where the most appropriate dictionary for maximum compression is identified and used to compress the current data. These modifications are made to any of the known Lempel-Ziv compression techniques based on the variants detailed in 1977 and 1978 articles by Ziv and Lempel.
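
The two steps mentioned in the abstract, sampling a genre to obtain frequent entries and choosing among several pre-filled dictionaries, can be sketched as follows. The excerpt does not say how the most appropriate dictionary is identified, so the selection rule here (fewest greedy longest-match steps over the text) is an assumption, as are the function names `build_prefill`, `parse_cost`, and `pick_dictionary`, the restriction to whole words, and the sample texts.

```python
from collections import Counter


def build_prefill(sample, top_k=50, min_len=2):
    """Return the most frequent words in a genre sample (assumed heuristic)."""
    counts = Counter(w for w in sample.split() if len(w) >= min_len)
    return [word for word, _ in counts.most_common(top_k)]


def parse_cost(text, phrases):
    """Count greedy longest-match steps needed to cover text.

    A character that starts no dictionary phrase costs one step on its own.
    """
    entries = set(phrases)
    max_len = max((len(p) for p in entries), default=1)
    i = steps = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in entries:
                break
        else:
            length = 1  # no phrase matched: consume a single character
        i += length
        steps += 1
    return steps


def pick_dictionary(text, dictionaries):
    """Return the name of the candidate dictionary with the lowest cost."""
    return min(dictionaries, key=lambda name: parse_cost(text, dictionaries[name]))


# Hypothetical genre samples, for illustration only.
SAMPLES = {
    "science": "the protein binds the receptor and the protein folds",
    "sports": "the team scored a goal and the team won the match",
}
DICTIONARIES = {name: build_prefill(sample) for name, sample in SAMPLES.items()}
```

With these samples, `pick_dictionary("the protein binds the receptor", DICTIONARIES)` selects the dictionary built from the science sample. In practice the cost could be estimated on a prefix of the data, and the chosen dictionary's identifier would have to be sent to the decompressor so that both sides start from the same entries.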

Description

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a method and apparatus for compressing and decompressing textual data stored in digital form in a lossless manner. In other words, the original data is reconstructed in its original form after having first undergone the compression and then the decompression processes. The data is assumed to be drawn from a particular alphabet which is specified in advance, such as the ASCII code, which consists of a 7 or 8 bit representation of a particular set of characters.

[0003] 2. Description of the Prior Art

[0004] Many different types of text compression techniques are described in the prior art. The text compression techniques described herein are based on the text compression techniques developed by Lempel and Ziv, who developed two techniques for text compression which are similar but have important differences. These two methods were outlined in papers entitled “A Universal Algorithm for Sequenti...

Application Information

IPC(8): G06F7/00, G06F5/00, G06F15/00, G06T9/00, H03M7/00, H03M7/30, H03M7/40
CPC: G06T9/005, H03M7/3086, H04N19/93, H03M7/3088, H04N19/13, H04N19/91
Inventors: REYNAR, JEFFREY C.; HERZ, FRED; EISNER, JASON; UNGAR, LYLE
Owner: PINPOINT