UTF-8 (8-bit Unicode Transformation Format) and ANSI (American National Standards Institute) encoding identification method and device

A UTF-8 and ANSI encoding recognition technology, applied in the field of encoding, that addresses problems such as character parsing errors and garbled text display caused by similar encoding methods.

Active Publication Date: 2014-08-06
GUANGZHOU SHIYUAN ELECTRONICS CO LTD
Cites: 3. Cited by: 13.

AI Technical Summary

Problems solved by technology

Although ANSI and UTF-8 are different encodings, their encoding methods are similar and their byte ranges even overlap. As a result, there is a certain probability that a text file encoded in UTF-8 will be mistakenly parsed and displayed as an ANSI-encoded file.
If the text file or text data stream is parsed using the wrong encoding method, it


Examples


Embodiment Construction

[0060] The embodiment of the present invention provides a UTF-8 and ANSI encoding identification method and device, which are used to identify whether a file is encoded in UTF-8 or ANSI, so as to avoid garbled characters in the file display due to the wrong encoding mode used to parse the file.

[0061] In order to make the purpose, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0062] See Figure 1. One embodiment of the...


Abstract

The embodiment of the invention discloses a UTF-8 (8-bit Unicode Transformation Format) and ANSI (American National Standards Institute) encoding identification method for determining whether a file is encoded in UTF-8 or in ANSI, so that the file is not displayed as garbled characters because it was parsed with the wrong encoding. The method comprises the following steps:
S1, acquiring a data stream of the file;
S2, storing the data stream in an array in byte form;
S3, judging whether preorder bytes exist in the array; if so, deleting the preorder bytes and executing step S4, otherwise executing step S4 directly;
S4, judging whether a first byte exists in the array; if so, deleting the first byte and executing step S5, otherwise executing step S5 directly;
S5, judging whether a second byte or a third byte exists in the array; if so, the encoding of the file is ANSI, otherwise the encoding of the file is UTF-8.
The embodiment of the invention also provides a UTF-8 and ANSI encoding identification device.
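The byte classes named in the abstract (preorder bytes, first byte, second byte, third byte) are not defined in this excerpt, so steps S1 to S5 cannot be reproduced faithfully here. As a rough illustration of the same goal, the Python sketch below uses a different, commonly used heuristic rather than the patented procedure: a leading UTF-8 byte order mark is taken as a UTF-8 signal, and otherwise the data is treated as UTF-8 only if it passes strict UTF-8 validation. The function name `guess_encoding` and the labels `"utf-8"` and `"ansi"` are hypothetical.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: label raw file bytes as "utf-8" or "ansi".

    This is a validation-based heuristic, not the S1-S5 method of the patent.
    """
    # A leading UTF-8 byte order mark (EF BB BF) is treated as a UTF-8 signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    try:
        # Strict decoding rejects byte sequences that are not well-formed UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume some ANSI code page (e.g. GBK).
        return "ansi"
    # Valid UTF-8. Pure ASCII data also lands here, since it is valid in both.
    return "utf-8"
```

For example, `guess_encoding("中文".encode("gbk"))` yields `"ansi"` because the GBK bytes D6 D0 CE C4 are not well-formed UTF-8, while the same text encoded as UTF-8 yields `"utf-8"`. Short ANSI inputs can still happen to be valid UTF-8, so a heuristic like this one may misclassify them.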

Description

Technical field

[0001] The invention relates to the field of encoding, in particular to a UTF-8 and ANSI encoding identification method and device.

Background

[0002] ASCII is a computer encoding system based on the Latin alphabet. It is mainly used to display modern English and other Western European languages, and it is the most common single-byte encoding system in use today. However, since ASCII has only 128 characters, it cannot represent the characters of all the world's languages. Different countries and regions have therefore formulated their own encoding standards, such as GB2312, BIG5, and JIS. These extended encodings, which use 2 bytes to represent a character such as a Chinese character, are called ANSI encodings (ANSI: American National Standards Institute).

[0003] UTF-8 (8-bit Unicode Transformation Format, Universal Code) is a variable-length character encoding for Unicode, and it is als...
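The difference between the two encoding families described above can be seen in a short Python sketch. It assumes GBK as one example of an ANSI code page (the text names GB2312, BIG5, and JIS; GBK is a superset of GB2312 available as a Python codec), and the variable names are illustrative only.

```python
text = "A中"

# UTF-8 is variable-length: "A" takes 1 byte and "中" takes 3 bytes.
utf8_bytes = text.encode("utf-8")
# GBK (an example ANSI code page): "A" takes 1 byte and "中" takes 2 bytes.
gbk_bytes = text.encode("gbk")

utf8_len = len(utf8_bytes)
gbk_len = len(gbk_bytes)

# Parsing UTF-8 bytes as GBK does not recover the original text;
# undecodable bytes are replaced instead of raising an error.
misread = utf8_bytes.decode("gbk", errors="replace")

print(utf8_len, gbk_len, misread != text)
```

Both encodings keep ASCII characters as the same single byte, which is why the two only diverge on bytes above 0x7F.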

Application Information

IPC(8): G06F17/30
CPC: G06F40/126
Inventor: 姚方谋
Owner: GUANGZHOU SHIYUAN ELECTRONICS CO LTD