Method and system for code compression and decoding for word library

A compression encoding technology for lexicons, applied in the field of compression encoding and decoding of lexicons. It addresses the problems of existing methods, namely the long average code length of words, the insufficient word compression rate, and the large amount of calculation, and achieves a simple algorithm, an improved compression rate, and a small amount of calculation.

Status: Inactive
Publication Date: 2009-09-02
GUANGDONG GUOBI TECH

AI Technical Summary

Problems solved by technology

[0002] Most traditional compression coding for lexicons uses Huffman coding. Huffman coding constructs a Huffman tree according to the number of occurrences of each letter in the words: the more often a letter occurs, the shorter the binary code assigned to it, so that the average code length of all the words in the lexicon is as short as possible. However, the compression rate that Huffman coding achieves for words is not sufficient. According to statistics, the compression rate of Huffman coding is 48.84% for the English lexicon, 48.64% for the Russian lexicon, 51.68% for the Turkish lexicon, 56.50% for the Arabic lexicon, and 46.45% for the Portuguese lexicon. There is therefore still room for improvement in the compression rate of lexicons in various languages. Other compression algorithms, such as LZW, require too much computation. A lexicon compression method with simple decoding and a higher compression rate is therefore needed.
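As a point of reference for the baseline described above, the following is a minimal sketch of letter-level Huffman coding on a toy lexicon. The word list and the helper name `huffman_code_lengths` are assumptions made for illustration and do not come from the patent; the sketch only computes code lengths, not the bit patterns.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical toy lexicon, used only for illustration.
words = ["tea", "tee", "ten", "eat", "net"]
counts = Counter("".join(words))            # occurrences of each letter
lengths = huffman_code_lengths(counts)      # bits assigned to each letter
total_bits = sum(counts[s] * lengths[s] for s in counts)
print(lengths, total_bits)
```

The most frequent letter receives the shortest code, which is the property the paragraph above relies on.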

Embodiment Construction

[0042] Based on a character sequence database provided in advance (such as an English vocabulary), the present invention counts how often each character occurs as the first character of a character sequence, and how often each character occurs as the successor of each character, thereby generating a character frequency table. Each group of frequencies in this table is sorted in descending order, so that a high frequency receives a small serial number and a low frequency receives a large serial number. The frequencies in each column (that is, at each serial number) are added together, and Huffman coding is performed on the resulting column sums to generate a coding table. The binary word codes obtained from this coding table increase the repetition rate of letters, thereby improving the compression rate of the word library.
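The table construction in paragraph [0042] can be sketched as follows. This is an illustrative reading of the paragraph, not the patent's implementation: the `START` marker, the function name `build_rank_tables`, the 0-based serial numbers, and the alphabetical tie-breaking are all assumptions.

```python
from collections import Counter, defaultdict

START = "^"  # hypothetical marker for the "first character" group

def build_rank_tables(words):
    """Return (ranked, rank_sums).

    ranked[ctx]  : characters seen after ctx, sorted by descending frequency
    rank_sums[k] : sum over all groups of the frequency at serial number k
    """
    groups = defaultdict(Counter)
    for word in words:
        prev = START
        for ch in word:
            groups[prev][ch] += 1   # count ch as a successor of prev
            prev = ch
    ranked, rank_sums = {}, Counter()
    for ctx, counter in groups.items():
        # Descending frequency; ties broken alphabetically (an assumption).
        order = sorted(counter, key=lambda c: (-counter[c], c))
        ranked[ctx] = order
        for rank, ch in enumerate(order):
            rank_sums[rank] += counter[ch]
    return ranked, dict(rank_sums)

# Hypothetical toy lexicon, used only for illustration.
ranked, rank_sums = build_rank_tables(["tea", "tee", "ten", "eat", "net"])
print(ranked)
print(rank_sums)
```

Because every group contributes its most frequent character to serial number 0, the column sums are far more skewed than raw letter counts, which is what lets a Huffman code over serial numbers assign shorter codes on average.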

[0043] The present invention is further described below in conjunction with the following examples. With reference to figure 1, a method for compressing and encoding thes...

Abstract

The invention discloses a method for compression encoding of a word library, which comprises the following steps: A, the words in the word library are counted to generate a first frequency table, which comprises a first-letter frequency data group and a plurality of subsequent-letter frequency data groups; B, each group of frequency data in the first frequency table is sorted, and the frequency data at the same order position in each group are added to obtain a second frequency table comprising a plurality of sum frequencies; C, the sum frequencies are Huffman coded to obtain corresponding binary codes, and each binary code is allocated to the order position of its sum frequency in the second frequency table to generate a coding table; and D, the letters of each word in the word library are replaced by the binary codes that the coding table assigns to the order positions of the word's first letter and of each subsequent letter, generating the binary code corresponding to the word. The invention also provides a system for compression encoding of the word library, and a method and a system for decoding the codes in the word library. The invention improves the compression rate of the word codes in the word library and keeps decoding simple.
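Steps A to D, together with a matching decoder, might look like the sketch below. It is an interpretation under stated assumptions rather than the patented implementation: the `START` marker, all function names, the tie-breaking rules, and the choice to decode one word's bit string at a time (so that word boundaries are known in advance) are hypothetical.

```python
import heapq
from collections import Counter, defaultdict

START = "^"  # hypothetical marker for the first-letter group

def build_tables(words):
    """Steps A and B: per-group sorted letters and summed rank frequencies."""
    groups = defaultdict(Counter)
    for word in words:
        prev = START
        for ch in word:
            groups[prev][ch] += 1
            prev = ch
    ranked, sums = {}, Counter()
    for ctx, counter in groups.items():
        order = sorted(counter, key=lambda c: (-counter[c], c))
        ranked[ctx] = order
        for rank, ch in enumerate(order):
            sums[rank] += counter[ch]
    return ranked, sums

def huffman_codes(freqs):
    """Step C: Huffman bit strings for each order position (rank)."""
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(word, ranked, codes):
    """Step D: replace each letter by the code of its rank after the previous letter."""
    bits, prev = [], START
    for ch in word:
        bits.append(codes[ranked[prev].index(ch)])
        prev = ch
    return "".join(bits)

def decode(bits, ranked, codes):
    """Invert encode() for the bit string of a single word."""
    by_code = {code: rank for rank, code in codes.items()}
    out, prev, buf = [], START, ""
    for b in bits:
        buf += b
        if buf in by_code:  # prefix-free code: the first match is the symbol
            prev = ranked[prev][by_code[buf]]
            out.append(prev)
            buf = ""
    return "".join(out)

# Hypothetical toy lexicon, used only for illustration.
words = ["tea", "tee", "ten", "eat", "net"]
ranked, sums = build_tables(words)
codes = huffman_codes(sums)
encoded = {w: encode(w, ranked, codes) for w in words}
print(encoded)
```

Decoding needs only the sorted groups and the rank codes: each decoded letter selects the group used for the next one, which is consistent with the abstract's claim that decoding is simple.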

Description

Technical field

[0001] The invention relates to compression coding technology, in particular to a method and system for compression encoding and decoding of a lexicon.

Background technique

[0002] Most traditional compression coding for lexicons uses Huffman coding. Huffman coding constructs a Huffman tree according to the number of occurrences of each letter in the words: the more often a letter occurs, the shorter the binary code assigned to it, so that the average code length of all the words in the lexicon is as short as possible. However, the compression rate that Huffman coding achieves for words is not sufficient. According to statistics, the compression rate of Huffman coding is 48.84% for the English lexicon, 48.64% for the Russian lexicon, 51.68% for the Turkish lexicon, 56.50% for the Arabic lexicon, and 46.45% for the Portuguese lexicon. It can be seen that th...

Application Information

IPC(8): G06F17/22
CPC: H03M7/40
Inventors: 高精鍊, 陈炳辉, 刘志玭
Owner: GUANGDONG GUOBI TECH