Method for compressing Chinese text supporting ANSI encoding

A compression method and text technology, applied in code conversion, special data processing applications, instruments, etc., can solve the problems of reduced compression speed, poor compression performance, and inability to fully utilize semantic information, and achieve the effect of improving compression performance.

Inactive Publication Date: 2011-05-04
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Among the above three compression methods for Chinese data streams, the first method directly converts Chinese data into single bytes and compresses ANSI-encoded Chinese data in the same way as ASCII-encoded data. In ANSI-encoded Chinese data, a Chinese character is represented by two bytes, while letters, digits, and similar characters are represented by a single byte. The semantics are carried mostly by the relationship between the two bytes of a character, or between single-byte and double-byte characters. This method therefore physically splits the semantic information contained in the encoding, cannot make full use of the character-level semantic information in the Chinese data stream, and has poor compression performance.
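The mixed single-byte and double-byte layout described above can be illustrated with a minimal sketch. It assumes that "ANSI" refers to the GBK code page used on Chinese-language Windows systems, which the source does not state explicitly; the function name `split_ansi_chars` is hypothetical.

```python
def split_ansi_chars(data: bytes) -> list[bytes]:
    """Split a GBK byte stream into single-byte and double-byte characters.

    Assumption: the stream is valid GBK, where a lead byte in 0x81..0xFE
    starts a two-byte character and any other byte stands alone.
    """
    chars = []
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            chars.append(data[i:i + 2])   # double-byte Chinese character
            i += 2
        else:
            chars.append(data[i:i + 1])   # single-byte ASCII character
            i += 1
    return chars


sample = "压缩ab1".encode("gbk")
units = split_ansi_chars(sample)
```

A byte-oriented compressor sees `sample` as seven unrelated bytes, whereas the character-level view in `units` keeps each Chinese character together as one symbol.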
The second method simply extends a byte-based compression algorithm to a character-based one. In subsequent compression, for example with Huffman coding, the symbol alphabet grows from 256 symbols (8 bits) to 65,536 symbols (16 bits), so the number of nodes in the corresponding Huffman tree increases by a factor of about 256, which greatly reduces compression speed.
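The growth in tree size can be checked with a short calculation. This sketch assumes the worst case in which every symbol of the alphabet occurs, so the Huffman tree is a full binary tree with one leaf per symbol; `huffman_node_count` is a hypothetical helper name.

```python
def huffman_node_count(alphabet_size: int) -> int:
    """Total nodes in a Huffman tree that has one leaf per symbol.

    A full binary tree with n leaves has n - 1 internal nodes,
    so 2n - 1 nodes in total.
    """
    return 2 * alphabet_size - 1


byte_nodes = huffman_node_count(2 ** 8)    # 8-bit symbols
char_nodes = huffman_node_count(2 ** 16)   # 16-bit symbols
ratio = char_nodes / byte_nodes
```

Under this assumption the tree grows from 511 to 131,071 nodes, a ratio of roughly 256.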


Embodiment Construction

[0064] In order to make the purpose, technical solution and advantages of the present invention clearer, the method for compressing Chinese text supporting ANSI encoding according to an embodiment of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0065] According to an embodiment of the present invention, non-continuous variable-length encoding is performed according to the frequency distribution of each character and phrase in the data stream, where the characters include ASCII codes, extended ASCII codes, and Chinese character codes in ANSI mode. Non-continuous encoding refers to allocating codes of different whole-byte lengths according to the number of characters and phrases in the data stream, so as to preserve its semantic characteristics to the greatest extent. When encoding the Chinese text to be compressed, the character code table and the phrase code table generated according to the descen...
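A frequency-ordered character code table with whole-byte code lengths might be sketched as follows. The sketch covers characters only (not the phrase code table), and the choice of 7 payload bits per byte is an assumption for illustration; the truncated paragraph does not give the actual allocation rule. The names `build_code_table` and `code_length_bytes` are hypothetical.

```python
from collections import Counter


def build_code_table(symbols):
    """Map each symbol to its rank in descending order of frequency."""
    counts = Counter(symbols)
    # Ties are broken by symbol value so that the table is deterministic.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {sym: rank for rank, sym in enumerate(ordered)}


def code_length_bytes(rank, payload_bits=7):
    """Whole-byte code length for a rank (assumed: 7 payload bits per byte)."""
    length = 1
    while rank >= 1 << (payload_bits * length):
        length += 1
    return length


table = build_code_table("压缩文本压缩abc压")
lengths = {sym: code_length_bytes(rank) for sym, rank in table.items()}
```

With this allocation the 128 most frequent symbols receive one-byte codes and the next 16,256 receive two-byte codes, so frequent Chinese characters can become shorter than their two-byte ANSI form.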

Abstract

The present invention provides a compression method for Chinese text and a corresponding decompression method, both of which support ANSI encoding. The compression method comprises the following steps: according to the position of each character of the Chinese text to be compressed in the character code table, the character is encoded using one or more bytes; the number of bytes is marked in the generated code word, where the character code table contains the characters of the Chinese text to be compressed arranged in descending order of their frequency of occurrence; and the generated code word is written into a compressed file. The compression method preserves the semantic characteristics of the Chinese data stream to the greatest extent and can be used together with various compression algorithms and compression software.
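The steps in the abstract can be sketched as a round trip. The abstract does not say how the byte count is marked in the code word, so this sketch assumes a continuation bit: the high bit of each byte signals that another byte follows. This is a stand-in technique (similar to a variable-length integer), not necessarily the patented format, and all function names are hypothetical.

```python
from collections import Counter


def build_table(text):
    """Characters in descending order of frequency (ties broken by value)."""
    counts = Counter(text)
    return sorted(counts, key=lambda ch: (-counts[ch], ch))


def encode_rank(rank):
    """Encode a table position in one or more bytes.

    Assumed marking scheme: 7 payload bits per byte, high bit set
    on every byte except the last one of the code word.
    """
    out = bytearray()
    while True:
        low = rank & 0x7F
        rank >>= 7
        if rank:
            out.append(low | 0x80)   # more bytes follow
        else:
            out.append(low)          # final byte of this code word
            return bytes(out)


def compress(text, table):
    index = {ch: i for i, ch in enumerate(table)}
    return b"".join(encode_rank(index[ch]) for ch in text)


def decompress(data, table):
    chars, rank, shift = [], 0, 0
    for byte in data:
        rank |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            chars.append(table[rank])
            rank, shift = 0, 0
    return "".join(chars)


text = "中文压缩 test 中文"
table = build_table(text)
packed = compress(text, table)
restored = decompress(packed, table)
```

Because every code word is a whole number of bytes, the output remains a byte stream in which each code word still corresponds to one character, so it can be passed to a general-purpose compressor afterwards. A real file format would also need to store the table alongside the code words.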

Description

Technical field

[0001] The invention relates to the fields of data encoding and data compression, and in particular to a method for compressing Chinese text that supports ANSI encoding.

Background technique

[0002] Data compression technology is the main technical means of saving network bandwidth and storage resources and of improving data transmission speed. Data compression refers to reorganizing relatively large original data, under a given storage space requirement, into a data set that meets that requirement, so that the information recovered from the data set is consistent with the original data or is as usable as the original data. Data can be compressed because of the redundancy in the data and the correlation between the pieces of information the data represents. Data compression reduces the space required for data storage, thereby indirectly reducing the time and resources required for data processing.

[0003]...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): H03M7/30, G06F17/22
Inventor: 云晓春, 王树鹏, 罗浩, 常为领, 吴广君, 李书豪
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI