A method and apparatus for text compression
By combining and encoding to generate a dictionary, the text to be compressed is encoded and replaced, which solves the problem of low efficiency in Chinese character compression of the LZW compression method and achieves more efficient text compression.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA MOBILE GROUP JIANGSU
- Filing Date
- 2021-04-20
- Publication Date
- 2026-06-12
Figure CN115221843B_ABST
Description
Technical Field
[0001] This invention relates to the field of data compression technology, and in particular to a method and apparatus for text compression.
Background Technology
[0002] LZW compression is a lossless compression method, named after Abraham Lempel, Jacob Ziv, and Terry Welch, that uses a table-lookup algorithm to compress files into smaller ones. When applied to text compression, existing LZW methods rely on a table generator (dictionary) to build the table. The table generator continuously reads new characters from the original string and attempts to encode single characters or strings as tokens. It maintains two variables: P (previous), the string that has been read but not yet encoded, and C (current), the newly read character. Because English consists of only 26 letters, the initial state of the table contains relatively few P values, and the encoding workload is relatively small. Chinese, however, has more than 80,000 characters, so the initial state of the table contains a very large number of P values, and compression efficiency for Chinese characters is very low. LZW is therefore poorly suited to compressing such text: although it is able to compress text data, its compression efficiency is low and it cannot achieve significant text compression.
Summary of the Invention
[0003] This invention provides a method and apparatus for text compression, which solves the technical problem of low text compression efficiency in the prior art and improves text compression speed and effect.
[0004] This invention provides a method for text compression, comprising:
[0005] The individual characters that appear more than twice in the text to be compressed are combined to obtain a character combination; the character combination is a combination of consecutive individual characters that appear more than twice in the text to be compressed.
[0006] Encode the individual characters and the combinations of characters to form corresponding character codes;
[0007] A dictionary is generated based on the individual characters, the combinations of characters, and the character encodings;
[0008] The text to be compressed is compressed according to the dictionary.
[0009] In one embodiment, encoding the individual characters and the character combinations to form corresponding character codes; and generating a dictionary based on the individual characters, the character combinations, and the character codes, includes:
[0010] The individual characters and combinations of characters are arranged in ascending order of character count;
[0011] Individual characters and combinations of characters with the same number of characters are arranged in ascending order of their frequency of occurrence.
[0012] Based on the arrangement order of the individual characters and the character combinations, the individual characters and the character combinations are encoded sequentially to form the corresponding character codes;
[0013] The dictionary is generated based on the individual characters, the combinations of characters, and the character encoding.
[0014] In one embodiment, compressing the text to be compressed according to the dictionary includes:
[0015] According to the dictionary, the corresponding characters in the text to be compressed are replaced with the character codes corresponding to the single characters and / or combinations of characters.
[0016] In one embodiment, the text compression method further includes:
[0017] If consecutive characters appearing in the text to be compressed can be replaced by consecutive combinations of the characters and / or the single characters, the character combination with the larger character code shall be used to replace the corresponding characters in the text to be compressed.
[0018] In one embodiment, the text compression method further includes:
[0019] If the total number of individual characters and character combinations exceeds a preset number, the individual characters and character combinations are deleted according to the specified order until the total number of individual characters and character combinations equals the preset number.
[0020] In one embodiment, the text encoding is a 16-bit binary text encoding, and is incremented starting from 0000 0000 0000 0001 according to the arrangement order.
[0021] This invention provides a text compression device, comprising:
[0022] A combination module, used to combine single characters that appear more than twice in the text to be compressed to obtain character combinations; each character combination is a combination of consecutive single characters that appear more than twice in the text to be compressed.
[0023] An encoding module is used to encode the single character and the combination of characters to form a corresponding character code; and to generate a dictionary based on the single character, the combination of characters and the character code.
[0024] A compression module is used to compress the text to be compressed according to the dictionary.
[0025] In one embodiment, the encoding module is used to sort the individual characters and the combinations of characters in ascending order according to the number of characters.
[0026] Individual characters and combinations of characters with the same number of characters are arranged in ascending order of their frequency of occurrence.
[0027] Based on the arrangement order of the individual characters and the character combinations, the individual characters and the character combinations are encoded sequentially to form the corresponding character codes;
[0028] The dictionary is generated based on the individual characters, the combinations of characters, and the character encoding.
[0029] The present invention provides an electronic device, including a processor and a memory storing a computer program, wherein the processor executes the program to implement the steps of any of the above-described text compression methods.
[0030] The present invention provides a processor-readable storage medium storing a computer program for causing the processor to perform the steps of any of the above-described text compression methods.
[0031] The present invention provides a method and apparatus for text compression, which can improve the text compression speed and effect by generating a dictionary of single characters and character combinations that appear more than twice and their corresponding codes, and then compressing the text to be compressed according to the dictionary.
Attached Figure Description
[0032] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0033] Figure 1 is a flowchart illustrating the text compression method provided by the present invention;
[0034] Figure 2 is a schematic diagram of the text compression device provided by the present invention;
[0035] Figure 3 is a schematic diagram of the physical structure of the electronic device provided by the present invention;
Detailed Implementation
[0036] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0037] Figure 1 is a schematic flowchart of the text compression method provided by this invention. Referring to Figure 1, the text compression method provided by this invention includes:
[0038] S110. Combine the individual characters that appear more than twice in the text to be compressed to obtain a character combination; the character combination is a combination of consecutive individual characters that appear more than twice in the text to be compressed.
[0039] S120. Encode the single character and the combination of characters to form a corresponding character code; generate a dictionary based on the single character, the combination of characters and the character code;
[0040] S130. Compress the text to be compressed according to the dictionary.
[0041] The text compression method provided by this invention can be implemented by an electronic device, a component within an electronic device, an integrated circuit, or a chip. The electronic device can be a mobile electronic device or a non-mobile electronic device. For example, a mobile electronic device can be a mobile phone, tablet computer, laptop computer, handheld computer, in-vehicle electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), etc., while a non-mobile electronic device can be a server, network attached storage (NAS), personal computer (PC), television set (TV), ATM, or self-service machine, etc. This invention does not impose specific limitations.
[0042] The technical solution of this invention will be described in detail below using the method of text compression provided by this invention executed by a computer as an example.
[0043] It should be noted that the text to be compressed can be Chinese characters or text in other languages; this invention does not impose any specific limitations.
[0044] Optionally, in step S110, every character of the text to be compressed is first scanned to find the set of single characters that appear more than twice, $N_1 = \{n_{11}, n_{12}, n_{13}, \ldots, n_{1m}\}$;
[0045] Next, the single characters that appear more than twice are repeatedly recombined to obtain character combinations with different numbers of characters, where each character combination consists of consecutive single characters and itself appears more than twice; for example, the set of two-character combinations that appear more than twice can be written as $N_2 = \{n_{21}, n_{22}, n_{23}, \ldots, n_{2m}\}$;
[0046] The process continues until no further character combination appearing more than twice can be found, that is, until the set $N_x$ has been obtained, where $x$ is the maximum number of characters in a character combination;
[0047] $N_1, N_2, \ldots, N_x$ are merged to form the character set $N = \{n_{11}, n_{12}, \ldots, n_{ij}, \ldots, n_{x1}, \ldots\}$.
[0048] It should be noted that $n_{ij} = [f_{ij}, k]$, where $f_{ij}$ is a single character or a character combination and $k$ is the number of times it appears.
[0049] For example, suppose the text to be compressed is "I am Chinese, I am from Jiangsu, I am from Nanjing, I am also from Nanjing, he is from Nanjing." (the counts below refer to the characters of the original Chinese sentence, for which the English words are glosses). In this text, "I," "am/is," "person," and the two characters of "Nanjing" ("south" and "capital") are single characters that appear more than twice; "I am" and "Nanjing" are two-character combinations that appear more than twice; "Nanjing person" is a three-character combination that appears more than twice; and "is Nanjing person" is a four-character combination that appears more than twice. By contrast, "I" and "person" cannot be recombined into "I person", because the two characters are not consecutive in the text and a character combination must consist of consecutive characters. This gives N1{I (4 times), is (5 times), person (5 times), south (3 times)}, N2{I am (3 times), Nanjing (3 times)}, N3{Nanjing person (3 times)}, N4{is Nanjing person (3 times)}, and therefore N is {I (4 times), is (5 times), person (5 times), south (3 times), I am (3 times), Nanjing (3 times), Nanjing person (3 times), is Nanjing person (3 times)}.
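The search in step S110 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the function name `find_repeated_units`, the `min_count` parameter (set to 3, reading "more than twice" literally), and the rule that punctuation and whitespace break runs of characters are all introduced here for illustration. The sample sentence is an assumed Chinese back-translation of the example above.

```python
from collections import Counter


def find_repeated_units(text, min_count=3):
    """Return {unit: count} for single characters and consecutive
    character combinations that occur at least min_count times."""
    # Assumption: punctuation and whitespace split the text into runs
    # and are never part of a unit.
    segments, run = [], []
    for ch in text:
        if ch.isalnum():
            run.append(ch)
        elif run:
            segments.append("".join(run))
            run = []
    if run:
        segments.append("".join(run))

    units = {}
    n = 1  # current unit length: 1 gives N1, 2 gives N2, and so on
    while True:
        # Count every run of n consecutive characters inside each segment.
        counts = Counter(
            seg[i:i + n]
            for seg in segments
            for i in range(len(seg) - n + 1)
        )
        frequent = {u: k for u, k in counts.items() if k >= min_count}
        if not frequent:
            break  # no repeated unit of this length, so none longer either
        units.update(frequent)
        n += 1
    return units


# Assumed back-translation of the example sentence (hypothetical input).
SAMPLE = "我是中国人，我是江苏人，我是南京人，我也是南京人，他是南京人。"

if __name__ == "__main__":
    for unit, count in find_repeated_units(SAMPLE).items():
        print(unit, count)
```

On this sample the sketch reproduces the counts listed above, for instance four occurrences of 我 and three of 是南京人. Because it enumerates every repeated run, it also reports runs such as 是南 that the example does not list.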
[0050] Optionally, in step S120, the single characters and the character combinations are encoded; that is, the elements of the character set $N = \{n_{11}, n_{12}, \ldots, n_{ij}, \ldots, n_{x1}, \ldots\}$ are encoded sequentially to obtain the character codes corresponding to the single characters and character combinations;
[0051] Then, a dictionary is generated and populated based on individual characters, character combinations, character codes, the correspondence between individual characters and character codes, and the correspondence between character combinations and character codes. The dictionary is initialized to empty. The dictionary can consist of two columns: one for characters and one for codes. This invention does not impose specific limitations on the format of the dictionary.
[0052] Optionally, in step S130, each occurrence in the text to be compressed of a single character or character combination from the dictionary is replaced with its character code, thereby completing the text compression.
[0053] The text compression method provided by this invention improves the text compression speed and effect by generating a dictionary of single characters and character combinations that appear more than twice and their corresponding codes, and then compressing the text to be compressed according to the dictionary.
[0054] In one embodiment, encoding the individual characters and the character combinations to form corresponding character codes; and generating a dictionary based on the individual characters, the character combinations, and the character codes, includes:
[0055] The individual characters and combinations of characters are arranged in ascending order of character count;
[0056] Individual characters and combinations of characters with the same number of characters are arranged in ascending order of their frequency of occurrence.
[0057] Based on the arrangement order of the individual characters and the character combinations, the individual characters and the character combinations are encoded sequentially to form the corresponding character codes;
[0058] The dictionary is generated based on the individual characters, the combinations of characters, and the character encoding.
[0059] Optionally, individual characters and character combinations are arranged by first sorting them by the number of characters in ascending order, then by the frequency of occurrence among individual characters in ascending order, and finally by the frequency of occurrence among character combinations with the same number of characters in ascending order. This forms the order in which individual characters and character combinations appear in the dictionary. Based on this order, numerical encoding is performed on the individual characters and character combinations, and a dictionary is generated.
[0060] The text compression method provided by the present invention arranges the single characters and character combinations according to these rules and encodes them in the arranged order, enabling the rapid generation of a dictionary.
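The ordering and sequential encoding described above can be sketched in a few lines of Python. This is an illustrative sketch only: the name `build_dictionary` is hypothetical, the tie-breaking rule (entries with equal length and equal frequency keep their input order) is an assumption because the text does not specify one, and the Chinese keys are assumed glosses for the units of the spring-rain embodiment in [0080] to [0082].

```python
def build_dictionary(units):
    """Map each unit to an integer code, assigning 1, 2, 3, ... in
    ascending order of character count, then ascending frequency."""
    # sorted() is stable, so ties keep their input order (an assumption;
    # the description does not say how ties are broken).
    ordered = sorted(units.items(), key=lambda item: (len(item[0]), item[1]))
    return {unit: code for code, (unit, _count) in enumerate(ordered, start=1)}


# Assumed glosses: spring, rain, poem, person, flower, "of", many,
# spring rain, poet, with the frequencies given in [0080].
EXAMPLE_UNITS = {
    "春": 4, "雨": 5, "诗": 2, "人": 2, "花": 2, "的": 2, "多": 3,
    "春雨": 4, "诗人": 2,
}

if __name__ == "__main__":
    for unit, code in build_dictionary(EXAMPLE_UNITS).items():
        print(unit, code)
```

With the units supplied in the order shown, the resulting codes agree with the dictionary listed in [0082], from code 1 for "poem" up to code 9 for "spring rain".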
[0061] In one embodiment, compressing the text to be compressed according to the dictionary includes:
[0062] According to the dictionary, the corresponding characters in the text to be compressed are replaced with the character codes corresponding to the single characters and / or the character combinations.
[0063] Optionally, the text to be compressed is matched against the single characters and character combinations in the dictionary, the corresponding character codes are looked up in the dictionary, and the matched characters in the text to be compressed are replaced with those character codes, thus completing the compression.
[0064] The text compression method provided by the present invention, by assigning character codes through a dictionary, significantly improves the compression speed for Chinese text and reduces the resources required for compression.
[0065] In one embodiment, the text compression method further includes:
[0066] When consecutive characters appearing in the text to be compressed can be replaced by consecutive character combinations and / or single characters, the character combination with the larger character code is preferentially used to replace the corresponding characters in the text to be compressed.
[0067] Optionally, the text to be compressed contains single characters that appear more than twice, and the character combinations that appear more than twice, obtained by recombination, can be composed of those single characters. Such a character combination has its own character code, and each single character composing it also has a character code. The combination can therefore be represented either by its own character code or by the sequence of character codes of its constituent single characters. In this situation, the character codes are compared by value, and the code with the larger value is used to replace the corresponding characters in the text to be compressed.
[0068] For example, suppose the single characters that appear more than twice in the text to be compressed are "spring" and "rain", and the character combination that appears more than twice is "spring rain". The character code of "spring" is 1, that of "rain" is 2, and that of "spring rain" is 3. Then "spring rain" in the text to be compressed can be represented directly by the character code 3, or by the two codes 1 and 2 in sequence. Since 3 is greater than both 1 and 2, the character code 3 is preferentially used to replace "spring rain" in the text to be compressed.
[0069] The text compression method provided by this invention determines the unique encoding value of a single character and a combination of characters in the text to be compressed by the magnitude of the character encoding value, which can significantly improve the speed of text compression.
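The preference for the larger character code can be illustrated with a short Python sketch. It assumes a greedy left-to-right scan, which the description does not prescribe; the function name `compress`, the token-list output (integers for codes, strings for characters left unchanged), and the Chinese glosses for "spring" and "rain" are likewise assumptions made for illustration.

```python
def compress(text, dictionary):
    """Replace dictionary units in text with their codes.

    Returns a list of tokens: an int for each replaced unit and the
    original character for anything not covered by the dictionary.
    """
    max_len = max((len(unit) for unit in dictionary), default=0)
    tokens, i = [], 0
    while i < len(text):
        # Collect every dictionary unit that starts at position i.
        candidates = [
            text[i:i + n]
            for n in range(1, max_len + 1)
            if text[i:i + n] in dictionary
        ]
        if candidates:
            # Prefer the candidate whose code is largest.
            best = max(candidates, key=dictionary.__getitem__)
            tokens.append(dictionary[best])
            i += len(best)
        else:
            tokens.append(text[i])  # character stays uncompressed
            i += 1
    return tokens


# Assumed glosses: spring = 春, rain = 雨, spring rain = 春雨.
SPRING_RAIN = {"春": 1, "雨": 2, "春雨": 3}

if __name__ == "__main__":
    print(compress("春雨贵如油", SPRING_RAIN))
```

When codes are assigned in ascending order of character count, a longer unit always has a larger code than any shorter unit, so choosing the largest code at a given position coincides with choosing the longest match there.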
[0070] In one embodiment, the text compression method further includes:
[0071] If the total number of individual characters and character combinations exceeds a preset number, the individual characters and character combinations are deleted according to the specified order until the total number of individual characters and character combinations equals the preset number.
[0072] Optionally, the preset number can be 65535. If the total number of individual characters and character combinations in N is greater than 65535, then delete them sequentially according to the number of combined characters in ascending order and the frequency of the combined characters in ascending order, until the total number of individual characters and character combinations in N equals 65535.
[0073] The text compression method provided by this invention can ensure that text compression does not consume excessive resources by fixing the total number of individual characters and combinations of characters.
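Capping the dictionary size might look like the following Python sketch. The description only says that entries are deleted in ascending order of character count and frequency; the sketch assumes this means dropping entries from the front of that ordering (shortest and rarest first) until the preset number remains. The name `prune_units` and the `preset` parameter are hypothetical.

```python
def prune_units(units, preset=65535):
    """Keep at most `preset` units, dropping entries from the front of
    the (character count, frequency) ascending order first."""
    ordered = sorted(units.items(), key=lambda item: (len(item[0]), item[1]))
    excess = max(0, len(ordered) - preset)
    # Assumption: "deleted in ascending order" means the earliest entries
    # in the ordering are the first to go.
    return dict(ordered[excess:])


if __name__ == "__main__":
    demo = {"a": 5, "b": 3, "ab": 3, "abc": 4}
    print(prune_units(demo, preset=2))
```

A different reading, in which only character combinations are eligible for deletion, would need a filter on `len(unit) > 1` before slicing.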
[0074] In one embodiment, the text encoding is a 16-bit binary text encoding, and is incremented starting from 0000 0000 0000 0001 according to the arrangement order.
[0075] Optionally, character codes are assigned sequentially following the order of the single characters and character combinations in the dictionary, starting from 0000 0000 0000 0001 and incrementing. Consequently, the more characters an entry contains, the larger its code value, and among entries with the same number of characters, the more often an entry appears, the larger its code value.
[0076] The text compression method provided by this invention determines the encoding value by ordering the number of characters and the frequency of occurrence, and determines the text encoding of the corresponding characters in the text to be compressed based on the encoding value, which can improve the compression effect.
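As a small illustration of the 16-bit code format, the Python sketch below renders an integer code in the grouped binary notation used above. The helper name `format_code` is hypothetical. Since codes start at 1 rather than 0, a 16-bit field offers 2^16 - 1 = 65535 distinct codes, which matches the preset number mentioned in [0072].

```python
def format_code(code):
    """Render a code as 16 binary digits in groups of four."""
    if not 1 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits and start from 1")
    bits = format(code, "016b")  # zero-padded 16-digit binary string
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


if __name__ == "__main__":
    print(format_code(1))
    print(format_code(9))
```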
[0077] In one embodiment, the text to be compressed is as follows:
[0078] As the saying goes, spring rain is as precious as oil. Du Fu once praised it, saying, "It sneaks in with the wind at night, nourishing all things silently." Perhaps it is because spring rain can bring so much imagination to poets that so many poets have written about it. Lu You has a famous line, "I listened to the spring rain all night in my small building, and tomorrow morning in the deep alley, apricot blossoms will be sold." Meng Haoran also wrote the classic, "The sound of wind and rain last night, how many flowers have fallen?"
[0079] First, generate a compressed dictionary;
[0080] The single characters that appear more than 2 times are: Spring (4 times), Rain (5 times), Poem (2 times), Person (2 times), Flower (2 times), 'of' (2 times), Many (3 times); The two-character combinations that appear more than 2 times are: Spring Rain (4 times), Poet (2 times); There are no three-character combinations that appear more than 2 times;
[0081] Therefore, the generated dictionary is:
[0082] {Poem, 1}{Person, 2}{Flower, 3}{of, 4}{Many, 5}{Spring, 6}{Rain, 7}{Poet, 8}{Spring Rain, 9} (The numbers here are 16-bit binary, the same below).
[0083] The compressed text is:
[0084] As the saying goes, 9 is as precious as oil. Du Fu once praised, "It sneaks in with the wind at night, moistening things silently." Perhaps it's because 9 can bring too much imagination to 8, so many people describe 9. Lu You has a famous line, "Listening to 9 all night in the small building, selling apricots in the deep alley tomorrow morning"; There is also the classic line of Meng Haoran, "Hearing the sound of the wind at night, wondering how many 3s have fallen."
[0085] The text compression method provided by the present invention searches the text to be compressed for the single characters that can be compressed, then iteratively recombines those characters into character combinations until all compressible single characters and character combinations have been obtained, and generates a dictionary for compression from them, which can improve the text compression speed and text compression effect.
[0086] Next, the text compression device provided by the present invention will be described. The text compression device described below corresponds to the text compression method described above, and the two descriptions may be read with reference to each other.
[0087] Figure 2 is a schematic structural diagram of the text compression device provided by the present invention. As shown in Figure 2, the device includes:
[0088] Combination module 210, used to combine single characters that appear more than 2 times in the text to be compressed to obtain character combinations; The character combinations are combinations of consecutive single characters that appear more than 2 times in the text to be compressed;
[0089] Encoding module 220, used to encode the single characters and the character combinations to form corresponding character encodings; Generate a dictionary according to the single characters, the character combinations and the character encodings;
[0090] Compression module 230, used to compress the text to be compressed according to the dictionary.
[0091] The text compression device provided by the present invention compresses text by generating a dictionary of single characters and character combinations that appear more than twice and their corresponding codes, and then compressing the text to be compressed according to the dictionary, thereby improving the text compression speed and text compression effect.
[0092] In one embodiment, the encoding module 220 is further configured to:
[0093] Arrange the single characters and the character combinations in ascending order of character count;
[0094] Arrange single characters and character combinations that have the same number of characters in ascending order of their frequency of occurrence;
[0095] Encode the single characters and the character combinations sequentially, following that arrangement order, to form the corresponding character codes;
[0096] Generate the dictionary based on the single characters, the character combinations, and the character codes.
[0097] In one embodiment, the compression module 230 is further configured to:
[0098] Replace, according to the dictionary, the corresponding characters in the text to be compressed with the character codes corresponding to the single characters and / or the character combinations.
[0099] In one embodiment, the compression module 230 is further configured to:
[0100] When consecutive characters appearing in the text to be compressed can be replaced by consecutive character combinations and / or single characters, use the character combination with the larger character code to replace the corresponding characters in the text to be compressed.
[0101] In one embodiment, the text compression device further includes:
[0102] If the total number of individual characters and character combinations exceeds a preset number, the individual characters and character combinations are deleted according to the specified order until the total number of individual characters and character combinations equals the preset number.
[0103] In one embodiment, the text encoding is a 16-bit binary text encoding, and is incremented starting from 0000 0000 0000 0001 according to the arrangement order.
[0104] Figure 3 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other via the communication bus 340. The processor 310 can call a computer program stored in the memory 330 to execute the steps of a text compression method, for example including:
[0105] The individual characters that appear more than twice in the text to be compressed are combined to obtain a character combination; the character combination is a combination of consecutive individual characters that appear more than twice in the text to be compressed.
[0106] Encode the individual characters and the combinations of characters to form corresponding character codes;
[0107] A dictionary is generated based on the individual characters, the combinations of characters, and the character encodings;
[0108] The text to be compressed is compressed according to the dictionary.
[0109] Furthermore, the logical instructions in the aforementioned memory 330 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0110] On the other hand, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, wherein when the program instructions are executed by a computer, the computer is able to perform the text compression method provided by the above methods, the method comprising:
[0111] The individual characters that appear more than twice in the text to be compressed are combined to obtain a character combination; the character combination is a combination of consecutive individual characters that appear more than twice in the text to be compressed.
[0112] Encode the individual characters and the combinations of characters to form corresponding character codes;
[0113] A dictionary is generated based on the individual characters, the combinations of characters, and the character encodings;
[0114] The text to be compressed is compressed according to the dictionary.
[0115] On the other hand, embodiments of this application also provide a processor-readable storage medium storing a computer program for causing the processor to execute the methods provided in the above embodiments, such as including:
[0116] The individual characters that appear more than twice in the text to be compressed are combined to obtain a character combination; the character combination is a combination of consecutive individual characters that appear more than twice in the text to be compressed.
[0117] Encode the individual characters and the combinations of characters to form corresponding character codes;
[0118] A dictionary is generated based on the individual characters, the combinations of characters, and the character encodings;
[0119] The text to be compressed is compressed according to the dictionary.
[0120] The processor-readable storage medium can be any available medium or data storage device that the processor can access, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical memory (e.g., CD, DVD, BD, HVD), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).
[0121] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0122] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0123] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A text compression method, characterized by comprising: combining individual characters that appear more than twice in the text to be compressed to obtain character combinations, wherein a character combination is a combination of consecutive individual characters that appear more than twice in the text to be compressed; encoding the individual characters and the character combinations to form corresponding character codes; generating a dictionary based on the individual characters, the character combinations, and the character codes; and compressing the text to be compressed according to the dictionary; wherein encoding the individual characters and the character combinations to form corresponding character codes comprises: arranging the individual characters and the character combinations in ascending order of character count; arranging individual characters and character combinations having the same character count in ascending order of their frequency of occurrence; and, based on the resulting arrangement order, encoding the individual characters and the character combinations sequentially to form the corresponding character codes.
2. The text compression method according to claim 1, characterized in that compressing the text to be compressed according to the dictionary comprises: according to the dictionary, replacing the corresponding characters in the text to be compressed with the character codes corresponding to the individual characters and/or the character combinations.
3. The text compression method according to claim 2, characterized by further comprising: if consecutive characters appearing in the text to be compressed can be replaced by a sequence of the character combinations and/or the individual characters, replacing the corresponding characters in the text to be compressed with the character combination having the larger character code.
4. The text compression method according to claim 1, characterized by further comprising: if the total number of individual characters and character combinations exceeds a preset number, deleting individual characters and character combinations in the specified order until the total number of individual characters and character combinations equals the preset number.
5. The text compression method according to claim 1, characterized in that the character code is a 16-bit binary code, assigned incrementally starting from 0000 0000 0000 0001 according to the arrangement order.
6. A text compression device, characterized by comprising: a combination module, used to combine individual characters that appear more than twice in the text to be compressed to obtain character combinations, wherein a character combination is a combination of consecutive individual characters that appear more than twice in the text to be compressed; an encoding module, used to encode the individual characters and the character combinations to form corresponding character codes, and to generate a dictionary based on the individual characters, the character combinations, and the character codes; and a compression module, used to compress the text to be compressed according to the dictionary; wherein the encoding module is further used to: arrange the individual characters and the character combinations in ascending order of character count; arrange individual characters and character combinations having the same character count in ascending order of their frequency of occurrence; and, based on the resulting arrangement order, encode the individual characters and the character combinations sequentially to form the corresponding character codes.
7. An electronic device comprising a processor and a memory storing a computer program, characterized in that, when the processor executes the computer program, the steps of the text compression method according to any one of claims 1 to 5 are implemented.
8. A processor-readable storage medium, characterized in that the processor-readable storage medium stores a computer program for causing a processor to perform the steps of the text compression method according to any one of claims 1 to 5.
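By way of illustration only, the steps recited in claims 1, 2, 3, and 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention. The function names (build_dictionary, compress, decompress, to_bits) and the max_len parameter are hypothetical. The sketch assumes that a character combination is a run of 2 to max_len consecutive characters that itself occurs more than twice (overlapping occurrences are counted), that every individual character is kept in the dictionary so that any input remains encodable, and that ties in frequency are broken by code point order. The deletion step of claim 4 is not covered, because the claim does not state the deletion order.

```python
from collections import Counter


def build_dictionary(text, max_len=4):
    """Assign an integer code to each individual character and combination."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1  # overlapping occurrences are counted

    # Assumption: keep all individual characters, plus every combination
    # (length >= 2) that occurs more than twice.
    entries = [s for s, c in counts.items() if len(s) == 1 or c > 2]

    # Ascending character count, then ascending frequency.
    # The final key (the string itself) is only a deterministic tie-breaker.
    entries.sort(key=lambda s: (len(s), counts[s], s))

    # Codes are assigned sequentially, starting from 1.
    return {s: code for code, s in enumerate(entries, start=1)}


def compress(text, dictionary):
    """Replace characters with codes, preferring the larger code."""
    max_len = max((len(s) for s in dictionary), default=0)
    codes = []
    i = 0
    while i < len(text):
        # Longer entries always carry larger codes under the ordering above,
        # so trying the longest candidate first selects the larger code.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in dictionary:
                codes.append(dictionary[piece])
                i += n
                break
        else:
            raise KeyError(f"character {text[i]!r} is not in the dictionary")
    return codes


def decompress(codes, dictionary):
    """Invert compress() by looking each code up in the reversed dictionary."""
    reverse = {code: s for s, code in dictionary.items()}
    return "".join(reverse[c] for c in codes)


def to_bits(code):
    """Render a code as a 16-bit binary string."""
    if not 1 <= code <= 0xFFFF:
        raise ValueError("code does not fit in 16 bits")
    return format(code, "016b")


if __name__ == "__main__":
    sample = "abcdabcdabcdxy"
    table = build_dictionary(sample)
    encoded = compress(sample, table)
    print([to_bits(c) for c in encoded])
    print(decompress(encoded, table) == sample)
```

In this sketch the longest-match loop is one way to realize the preference for the larger character code: because entries are ordered first by character count, any longer combination starting at a given position necessarily has a larger code than a shorter one starting at the same position. The same functions apply unchanged to Chinese text, since Python strings are sequences of Unicode characters.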