Generic character encoding and decoding method and system

An encoding technology in the field of encoding and decoding. It addresses the small customizable space available today, which cannot meet the need for custom characters mixed with binary data, and achieves strong versatility and good space efficiency.

Pending Publication Date: 2020-07-03
薛昌熵

AI Technical Summary

Problems solved by technology

At present, the customizable space in the Unicode Private Use Area is very small and cannot meet the need for custom characters mixed with binary data.

Examples

Embodiment 1

[0023] As shown in Figure 1, an embodiment of the present invention provides a pan-text character encoding method, comprising:

[0024] 100. Decompose each character's code point into an area code, a phrase code, and a word code, as follows:

[0025] 101. If the character to be encoded is an ASCII character, it is assigned to the single-byte area, and the code element is its ASCII value. For example, "A" is 01000001 in binary and is output unchanged.

[0026] 102. If the character to be encoded is a commonly used character, it is assigned to the double-byte area, and its word code corresponds one-to-one to its code point. The word code is inserted into the free bits of the template 1xxxxxxx 0xxxxxxx to obtain the code element. For example, "的" is a commonly used Chinese character with word code 0, and its code element is 10100000 00000000.

[0027] 103. If the character to be encoded is a rare character, it is assigned to the three-byte area, and the word code is the code point, which co...
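Steps 101 and 102 can be sketched as below for the single-byte and double-byte areas. This is a minimal illustration, not the patent's implementation. The function names are hypothetical.

The excerpt gives "的" word code 0 but the code element 10100000 00000000. That corresponds to data bits 01000000000000 (4096). How the word code becomes those data bits is not stated in the excerpt, so `pack_double_byte` takes the 14 data bits directly. The three-byte area of step 103 is truncated above and is not modeled.

```python
def encode_single_byte(ch: str) -> bytes:
    """Single-byte area: the code element is the ASCII value, 0xxxxxxx."""
    value = ord(ch)
    if value > 0x7F:
        raise ValueError("only ASCII characters belong to the single-byte area")
    return bytes([value])


def pack_double_byte(data: int) -> bytes:
    """Double-byte area: fill the 14 x positions of 1xxxxxxx 0xxxxxxx.

    Hypothetical helper: takes data bits, not the word code, because the
    word-code-to-data-bits mapping is not given in the excerpt.
    The high 7 data bits go into the first byte (first bit 1: more follows),
    the low 7 into the second byte (first bit 0: tail byte).
    """
    if not 0 <= data < (1 << 14):
        raise ValueError("the double-byte template holds 14 data bits")
    return bytes([0x80 | (data >> 7), data & 0x7F])


def to_bits(code: bytes) -> str:
    """Render bytes as space-separated 8-bit groups, as the text does."""
    return " ".join(f"{b:08b}" for b in code)


print(to_bits(encode_single_byte("A")))    # 01000001
print(to_bits(pack_double_byte(1 << 12)))  # 10100000 00000000
```

The leading bit of every byte is reserved for framing. The single-byte area therefore has 7 data bits and the double-byte area has 14.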

Embodiment 2

[0033] As shown in Figure 2, an embodiment of the present invention provides a pan-text character decoding method, comprising:

[0034] 200. In the encoded byte sequence, a byte whose first bit is 0 is the tail byte of a code element, and the sequence is divided into code elements at these bytes. Each code element is then processed according to its length. For example, 01000001 10100000 00000000 10000010 11000000 00100111 10000000 10000000 10000010 00000010 is divided at the tail bytes into 01000001; 10100000 00000000; 10000010 11000000 00100111; and 10000000 10000000 10000010 00000010. These code elements are 1, 2, 3, and 4 bytes long respectively.

[0035] 201. If the code element is one byte long, it belongs to the single-byte area, and its value is the ASCII value. For example, 01000001 is "A".

[0036] 202. If the code element is two bytes long, it belongs to the double-byte area, and its value maps one-to-one to a character code point. In the form 1xxxxxxx, the x positions are data bits, which are bijected to the cha...
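The segmentation rule in step 200 can be sketched as below. It splits at every byte whose first bit is 0, then extracts the data bits. This is an illustrative sketch with hypothetical function names.

The handling of three- and four-byte elements is truncated in the excerpt. The sketch therefore stops at segmentation and data-bit extraction and does not map elements to characters.

```python
def split_code_elements(data: bytes) -> list[bytes]:
    """Cut a byte stream into code elements at every tail byte.

    A byte whose first (most significant) bit is 0 closes the current
    code element; a byte whose first bit is 1 means more bytes follow.
    """
    elements: list[bytes] = []
    current = bytearray()
    for b in data:
        current.append(b)
        if (b & 0x80) == 0:  # tail byte reached
            elements.append(bytes(current))
            current.clear()
    if current:
        raise ValueError("stream ends inside a code element")
    return elements


def data_bits(element: bytes) -> int:
    """Concatenate the low 7 bits (the x positions) of each byte."""
    value = 0
    for b in element:
        value = (value << 7) | (b & 0x7F)
    return value


# The example sequence from step 200, one token per byte.
stream = bytes(int(tok, 2) for tok in (
    "01000001 10100000 00000000 10000010 11000000 00100111 "
    "10000000 10000000 10000010 00000010").split())

elements = split_code_elements(stream)
print([len(e) for e in elements])  # [1, 2, 3, 4]
print(chr(elements[0][0]))         # A  (single-byte area: ASCII value)
print(data_bits(elements[1]))      # 4096  (data bits of the double-byte element)
```

The decoder never needs a length table to find boundaries. It can also resynchronize after a corrupted byte by skipping to the next tail byte. This is the self-synchronizing property the abstract describes.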

Embodiment 3

[0045] As shown in Figure 3, an embodiment of the present invention provides a pan-text character encoding system, comprising:

[0046] 301. A decomposition module, comprising a decomposer, which decomposes the character sequence or binary sequence to be encoded, character by character, into area codes, phrase codes, and word codes;

[0047] 302. A synthesis module, which synthesizes code elements from the area code, phrase code, and word code. It comprises:

- a single-byte synthesizer;
- a double-byte synthesizer;
- a three-byte synthesizer;
- a four-byte three-character synthesizer;
- a four-byte two-character synthesizer;
- a four-byte binary synthesizer.

Consecutive characters may share the same area code and phrase code. In that case, if the previous code element has free space, the next character's word code can be packed into it. The tail byte of each code element has first bit 0, and every other byte has first bit 1; this serves as the separator, and the code elements are spliced...
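The separator rule applied by the synthesis module can be sketched as a generic framing step. A code element of n bytes carries 7·n data bits, and only the tail byte has first bit 0. `frame` and `splice` are hypothetical names.

The sketch assumes the data bits have already been assembled. The excerpt does not describe how each synthesizer lays out the area code, phrase code, and word codes within them. Packing a following word code into free space is also not modeled. The data values below were chosen so that the output reproduces the example byte sequence in step 200.

```python
def frame(data_bits: int, n_bytes: int) -> bytes:
    """Spread 7 * n_bytes data bits over n_bytes bytes.

    Every byte except the last carries first bit 1; the last (tail) byte
    carries first bit 0, which is what lets a decoder find the boundary.
    """
    if not 1 <= n_bytes <= 4:
        raise ValueError("code elements are 1 to 4 bytes long")
    if not 0 <= data_bits < (1 << (7 * n_bytes)):
        raise ValueError("too many data bits for this element length")
    out = bytearray()
    for i in range(n_bytes - 1, -1, -1):  # most significant 7-bit group first
        group = (data_bits >> (7 * i)) & 0x7F
        lead = 0x80 if i > 0 else 0x00
        out.append(lead | group)
    return bytes(out)


def splice(elements: list[tuple[int, int]]) -> bytes:
    """Concatenate framed elements; the lead bits already act as separators."""
    return b"".join(frame(bits, n) for bits, n in elements)


# (data bits, element length) pairs; values picked to match step 200's example.
stream = splice([(0x41, 1), (4096, 2), (40999, 3), (258, 4)])
print(" ".join(f"{b:08b}" for b in stream))
```

No explicit delimiter byte is inserted between elements. The first bit of each byte already carries the boundary, so splicing is plain concatenation.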

Abstract

The invention discloses a generic text character encoding and decoding method and system. The method is self-synchronizing: a tail byte whose first bit is 0 serves as the code element separator.

During encoding, each character code point is split into an area code, a phrase code, and a word code. The area code determines the byte length and structure of the code element, the phrase code is used as a prefix, and the word code as an offset, forming a sequence of code elements. During decoding, the sequence is segmented into code elements. The area code, phrase code, and word code are then read from each in turn to reconstruct the characters.

One code element can store one or more characters. Characters in the same code element share one area code and one phrase code, and the element stores one or more word codes, each corresponding to one character. A binary area stores user-defined binary data, so that binary data, instructions, and new languages can be stored, and tokens are kept isolated from characters.

The invention relates to the field of data storage and transmission. The method provides space-saving, efficient, and safe storage of generic text. It suits scenarios such as a single line mixing characters from many of the world's scripts, text mixed with binary data, and custom characters.

Description

Technical field

[0001] The invention relates to the field of encoding and decoding, in particular to a method and system for encoding and decoding pan-text characters.

Background art

[0002] Computers store data in binary, so characters must also be converted into binary for storage. Character encoding defines how the characters in a given character set are represented in a computer. Typical character sets are ASCII, GB2312, and Unicode; typical encodings are ASCII, GB2312, GB18030, UTF-16, and UTF-8. A character set is usually used together with an encoding. Unicode attempts to include all characters and is currently the most widely used cross-language character set.

[0003] The encodings above have several defects. Some cover few characters, some use wide characters that waste space, and some do not support encoding binary data. Text is also often mixed with control escapes such as "\n" and "\0", which are neither semantically explicit nor safely isolated. At present, the...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8) G06F40/126
Inventor: 薛昌熵
Owner: 薛昌熵