Coded format detection method and coded format detection device for text files

An encoding format detection technology for text files that addresses problems such as garbled characters in text files, with the effect of avoiding garbled characters and enabling fast encoding conversion.

Active Publication Date: 2012-07-11
HANVON CORP

AI Technical Summary

Problems solved by technology

However, detection based on an encoding byte order identifier is not always feasible. Some encoding formats, such as ASCII, GB2312, and UTF-7, have no encoding byte order identifier. For these formats, a default encoding format is usually used for encoding conversion, but if the default differs from the actual encoding format of the text file, the text file is displayed as garbled characters.
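The limitation can be illustrated with a minimal Python sketch. It assumes the byte order identifier is a leading byte order mark (BOM) and uses the BOM constants from Python's standard `codecs` module. The function name `detect_by_bom` and the `latin-1` default are hypothetical choices made for illustration and are not taken from the patent.

```python
import codecs

# Byte order marks for encodings that define one. The UTF-32 marks are
# checked first because the UTF-32 LE mark starts with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

DEFAULT_ENCODING = "latin-1"  # hypothetical default, for illustration only


def detect_by_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


utf8_with_bom = codecs.BOM_UTF8 + "文本".encode("utf-8")
gb2312_bytes = "文本".encode("gb2312")  # GB2312 defines no BOM

print(detect_by_bom(utf8_with_bom))  # utf-8
print(detect_by_bom(gb2312_bytes))   # None

# Falling back to a default that does not match the file yields garbled text.
garbled = gb2312_bytes.decode(DEFAULT_ENCODING)
print(garbled == "文本")  # False
```

The second call returns `None` because GB2312 data carries no marker, which is the case in which a mismatched default produces garbled output.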

Examples

Embodiment Construction

[0017] The present invention discloses a method and a device for detecting the encoding format of a text file, and in particular includes an encoding conversion method. The encoding byte order identifiers, bytes, and character sets used in the embodiments of the present invention rely on existing technology, so they are not described in full in the following description.

[0018] The invention discloses a method for detecting the encoding format of a text file. As shown in Figure 1, the specific steps are as follows:

[0019] Step 101: First, read text of a predetermined length from a text file (such as a Word or txt file), then divide it into several text segments. The segment size is chosen according to the size of the text file so that the analysis runs as fast as possible. Because the file itself may contain mixed encodings, segmenting the text can also solve the problem of mixed encoding to a c...
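Step 101 can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not give the predetermined length or the rule that maps file size to segment size, so the thresholds in `choose_segment_size` and all function names here are hypothetical.

```python
import io


def read_prefix(stream, max_bytes: int) -> bytes:
    """Read at most max_bytes from a binary stream (the predetermined length)."""
    return stream.read(max_bytes)


def choose_segment_size(total_size: int) -> int:
    """Pick a segment size from the data size (hypothetical thresholds)."""
    if total_size <= 4096:
        return 256
    return 1024


def split_into_segments(data: bytes, segment_size: int) -> list:
    """Split data into consecutive segments of at most segment_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


# An in-memory stream stands in for a text file opened in binary mode.
stream = io.BytesIO(b"hello world " * 100)  # 1200 bytes
data = read_prefix(stream, 1000)
segments = split_into_segments(data, choose_segment_size(len(data)))
print(len(data), len(segments), len(segments[-1]))  # 1000 4 232
```

A fixed-size split like this can cut a multibyte character at a segment boundary, so a fuller implementation would need to handle boundaries. The sketch does not attempt that.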

Abstract

The invention discloses an encoding format detection method and an encoding format detection device for text files, belonging to the field of file processing. The method includes the following steps: dividing a text file into a plurality of text segments; if the byte codes of the first four bytes in the current text segment are larger than 0x00 and smaller than 0x7F, determining the encoding format of the current text segment to be ASCII (American Standard Code for Information Interchange); otherwise, performing detection within the corresponding encoding format group according to the encoding byte size used by the byte codes, and converting the current text segment into the matched encoding format according to the detection result; and reading the bytes of the next text segment for detection until the entire text file has been converted. With the method and the device, text encodings that have no matching encoding byte order identifier are judged by group and the various encoding formats are converted, so that garbled characters are not produced when the encoding format used for display is inconsistent with the byte codes of the text file, and encoding conversion of text files can be performed more quickly.
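The per-segment decision described in the abstract can be sketched in Python. The ASCII test follows the stated condition (each of the first four bytes strictly between 0x00 and 0x7F). The abstract does not detail the grouped detection, so the fallback below substitutes simple trial decoding over a hypothetical candidate list. That is a stand-in technique, not the patent's grouping by encoding byte size, and every name in the sketch is an assumption.

```python
# Hypothetical candidate encodings, tried in order when the ASCII test fails.
CANDIDATES = ("utf-8", "gb18030")


def first_four_are_ascii(segment: bytes) -> bool:
    """True if each of the first four bytes b satisfies 0x00 < b < 0x7F.

    Assumption: a segment shorter than four bytes is judged on the bytes
    it has, and an empty segment is not treated as ASCII.
    """
    head = segment[:4]
    return len(head) > 0 and all(0x00 < b < 0x7F for b in head)


def detect_segment(segment: bytes):
    """Return a guessed encoding name for one segment, or None."""
    if first_four_are_ascii(segment):
        return "ascii"
    # Stand-in for the grouped detection: accept the first strict decode.
    for name in CANDIDATES:
        try:
            segment.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


def detect_all(segments):
    """Detect each segment in turn until every segment has been processed."""
    return [detect_segment(s) for s in segments]


samples = [b"hello", "文本文件".encode("utf-8"), "文本文件".encode("gb2312")]
print(detect_all(samples))  # ['ascii', 'utf-8', 'gb18030']
```

Because only the first four bytes are examined, a segment that begins with ASCII text and continues in another encoding is still labeled ASCII here. Keeping segments small limits the effect of that shortcut.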

Description

Technical field

[0001] The invention belongs to the field of file processing, and in particular relates to a method for detecting the encoding format of a text file.

Background

[0002] The character encoding of a text file specifies how its characters are stored. To obtain the text content and display it, it is necessary to know how the text is stored once the file has been read into memory, that is, the encoding format of the file.

[0003] Commonly used text file encoding formats currently include ASCII, GB2312, GBK, GB18030, BIG5, ISO-8859-1, UCS-2, UTF-16, and UTF-8. These formats differ in encoding method and encoded length. When processing text files, it is necessary to convert between these encodings to prevent garbled characters when the text is displayed.

[0004] Before encoding and converting the text file, it is necessary to detect the encoding format of the text file. The commonly used method of encoding detection is to judge according t...
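The point that these formats differ in encoded length can be checked with a short Python sketch. It uses codec names from Python's standard library. `utf-16-le` stands in for UTF-16, and UCS-2 is omitted because Python ships no codec under that name. The helper `encoded_lengths` is a hypothetical name introduced for illustration.

```python
ENCODINGS = (
    "ascii", "gb2312", "gbk", "gb18030", "big5",
    "iso-8859-1", "utf-16-le", "utf-8",
)


def encoded_lengths(text: str, encodings=ENCODINGS) -> dict:
    """Map each encoding to the byte length of text, or None if unencodable."""
    lengths = {}
    for name in encodings:
        try:
            lengths[name] = len(text.encode(name))
        except UnicodeEncodeError:
            lengths[name] = None  # the character set lacks this character
    return lengths


latin = encoded_lengths("A")
han = encoded_lengths("中")
print(latin["ascii"], latin["utf-16-le"], latin["utf-8"])  # 1 2 1
print(han["gb2312"], han["big5"], han["utf-8"])            # 2 2 3
print(han["ascii"], han["iso-8859-1"])                     # None None
```

The same character takes one, two, or three bytes depending on the encoding, and some encodings cannot represent it at all, which is why the encoding has to be known before the bytes are decoded.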

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22
Inventor: 宋久元, 展永定
Owner: HANVON CORP