Tibetan text compression algorithm

A Tibetan text compression technology, applied in the fields of electrical components and code conversion, that addresses the problem that the existing TiCA algorithm is not conducive to text compression.

Active Publication Date: 2020-11-27
TIBET UNIV

AI Technical Summary

Problems solved by technology

The existing TiCA algorithm is not conducive to text compression.



Embodiment Construction

[0023] The present invention is described in detail through the following examples. The examples serve only to further illustrate the invention and should not be interpreted as limiting its scope of protection. Non-essential improvements and adjustments made to the invention, and all other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort, fall within the scope of protection of the present invention.

[0024] The core idea of the TiCA Tibetan text compression algorithm is to use a mapping dictionary to map each sequence of 1 to 7 UTF-8 codes in a Tibetan text to a single code, thereby compressing the text. The design of the mapping dictionary is therefore crucial. If the mapping dictionary for the Tibetan text compression algorithm is only formulated based on modern Tibetan dictionaries, there will ...
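The dictionary idea in paragraph [0024] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `build_mapping` and `utf8_size`, the frequency-ranked code assignment, and the use of the tsheg (U+0F0B) as the unit separator are assumptions made for illustration only.

```python
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (assumed separator for this sketch)


def build_mapping(corpus: str) -> dict:
    """Assign integer codes to tsheg-delimited units, most frequent first.

    Hypothetical helper: the patent does not state how codes are assigned.
    """
    counts = Counter(unit for unit in corpus.split(TSHEG) if unit)
    return {unit: code for code, (unit, _) in enumerate(counts.most_common())}


def utf8_size(unit: str) -> int:
    """Number of bytes the unit occupies in UTF-8."""
    return len(unit.encode("utf-8"))


if __name__ == "__main__":
    # Two distinct units; the first one occurs twice.
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b\u0f56\u0f7c\u0f51\u0f0b"
    mapping = build_mapping(sample)
    for unit, code in mapping.items():
        print(repr(unit), "->", code, f"({utf8_size(unit)} UTF-8 bytes)")
```

Every code point in the Tibetan Unicode block (U+0F00 to U+0FFF) takes three bytes in UTF-8, so a unit of several code points replaced by one short code is where the saving would come from.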



Abstract

The invention discloses a Tibetan text compression algorithm. Tibetan text from 20 GB of Tibetan web pages is statistically analyzed to build the mapping dictionary of the TiCA algorithm, and Tibetan text is compressed by mapping its characters, each composed of several codes, to single codes according to that dictionary. The text to be compressed is first scanned and the position interval of each Tibetan segment is screened out (step 1). Each Tibetan interval screened in step 1 is then traversed, the Tibetan string in each interval is segmented into Tibetan characters using the Tibetan syllable mark, and finally each Tibetan character, formed from one or more Tibetan components, is mapped to a code, which completes the compression of the Tibetan text. With this dictionary, the Tibetan text compression algorithm TiCA provided by the invention is completed and its robustness is improved. Experiments show good results in both compression ratio and time consumption.
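The steps in the abstract (scan for Tibetan intervals, split each interval at the syllable mark, map each unit to a code) can be sketched in Python as follows. This is a sketch under stated assumptions, not the TiCA implementation: the function names, the token-list output format, the pass-through of unmapped units, and the treatment of the whole Tibetan Unicode block U+0F00 to U+0FFF as "Tibetan" are all hypothetical.

```python
import re

TSHEG = "\u0f0b"                                # Tibetan syllable mark
TIBETAN_RUN = re.compile("[\u0f00-\u0fff]+")    # assumed definition of a Tibetan segment


def find_tibetan_intervals(text):
    """Step 1: (start, end) positions of each run of Tibetan characters."""
    return [(m.start(), m.end()) for m in TIBETAN_RUN.finditer(text)]


def split_units(run):
    """Split a Tibetan run at the tsheg, keeping the tsheg as its own token."""
    return [tok for tok in re.split(f"({TSHEG})", run) if tok]


def compress(text, mapping):
    """Replace mapped Tibetan units with integer codes; pass everything else through."""
    tokens, pos = [], 0
    for start, end in find_tibetan_intervals(text):
        if start > pos:
            tokens.append(text[pos:start])       # non-Tibetan text, unchanged
        for unit in split_units(text[start:end]):
            tokens.append(mapping.get(unit, unit))  # unmapped units stay as text
        pos = end
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens


def decompress(tokens, mapping):
    """Invert compress(): integer codes back to their Tibetan units."""
    inverse = {code: unit for unit, code in mapping.items()}
    return "".join(inverse[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    mapping = {"\u0f56\u0f7c\u0f51": 0, TSHEG: 1}   # toy dictionary for the demo
    text = "Tibet: \u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b ok"
    tokens = compress(text, mapping)
    print(tokens)
    assert decompress(tokens, mapping) == text      # lossless round trip
```

A real encoder would also need a byte-level output format that distinguishes codes from literal text; the token list here only makes the mapping step visible and the round trip checkable.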

Description

Technical field

[0001] The present invention relates to the field of Tibetan text compression, and in particular to a Tibetan text compression algorithm.

Background

[0002] Research on text compression, both international and domestic, has made considerable progress, producing the LZ family of algorithms based on dictionary coding and compression algorithms based on arithmetic coding. Because text data must be reconstructed exactly, only lossless compression algorithms such as Huffman coding, arithmetic coding, run-length encoding, and LZ encoding can be used.

[0003] General-purpose text compression algorithms are mainly the LZ and LZW algorithms and improved variants of them. These algorithms are mature for widely used languages such as English and Chinese, but Tibetan text compression has received little research attention. According to the investigation and research of the present invention, it is foun...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): H03M7/30
CPC: H03M7/3059
Inventor: 索南尖措, 尼玛扎西, 仁青诺布, 格桑多吉, 普布旦增
Owner: TIBET UNIV