Method and device for judging coding format of JAVA file and byte stream

A method and device for judging encoding formats, applied in the field of data processing. It addresses the heavy workload and error-proneness of existing judgment methods, and offers a small workload, accurate judgment, and a simple program.

Status: Inactive
Publication Date: 2017-05-31
BANK OF CHINA

AI Technical Summary

Problems solved by technology

[0006] The embodiment of the present invention uses Unicode encoding rules to judge the encoding format of files and byte streams, which solves the problem that existing judgment methods involve a heavy workload and are prone to errors.

Examples


Embodiment 1

[0051] According to the Unicode encoding rule and the first four bytes of the file, the encoding format of the file is judged, and the specific steps include:

[0052] 1) Get the default encoding format of the current operating environment:

[0053] String dc = Charset.defaultCharset().name();

[0054] 2) Convert the input stream to a Unicode input stream:

[0055] UnicodeInputStream uin = new UnicodeInputStream(in,dc);

[0056] 3) Read the first four bytes of the file stream:

[0057] byte[] head = new byte[4];

[0058] in.read(head);

[0059] 4) Set the default encoding format to GBK:

[0060] String code = "GBK";

[0061] 5) According to the Unicode encoding rules, if the first byte is -1 and the second byte is -2 (0xFF and 0xFE read as signed Java bytes), the encoding format is UTF-16:

[0062] if (head[0] == -1 && head[1] == -2)

[0063] code = "UTF-16";

[0064] 6) According to the Unicode encoding rules, if the first byte is -2 and the second byte is -1 (0xFE and 0xFF read as signed Java bytes), the encoding format is Unicode:

[0065] if(head[0]==-2&&hea...
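The byte-order-mark check described in the steps above can be sketched as follows. This is a minimal illustration, not the patent's code: the class name `BomSniffer` and the returned labels are hypothetical, the labels follow the standard byte-order-mark meanings (FF FE is little-endian UTF-16, FE FF is big-endian UTF-16) rather than the "UTF-16" and "Unicode" names used in the text, and the UTF-8 branch is an assumption because the remaining steps are truncated. The sketch does not distinguish UTF-32 byte-order marks.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not from the source): guesses an encoding from a byte-order mark. */
final class BomSniffer {

    /**
     * Inspects up to the first four bytes of a file or stream.
     * The "GBK" fallback mirrors step 4 of the text; the other labels are assumptions.
     */
    static String detect(byte[] head, int len) {
        // Mask with 0xff so each byte compares as an unsigned value (0xFF instead of -1).
        int b0 = len > 0 ? head[0] & 0xff : -1;
        int b1 = len > 1 ? head[1] & 0xff : -1;
        int b2 = len > 2 ? head[2] & 0xff : -1;

        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return "UTF-8";     // EF BB BF (assumed branch, not visible in the source)
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return "UTF-16LE";  // FF FE, i.e. bytes -1 and -2
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return "UTF-16BE";  // FE FF, i.e. bytes -2 and -1
        }
        return "GBK";           // no byte-order mark found: keep the default
    }

    /** Reads up to four bytes from the stream, then applies the check above. */
    static String detect(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int len = 0;
        while (len < head.length) {
            int n = in.read(head, len, head.length - len);
            if (n < 0) {
                break; // stream ended before four bytes were available
            }
            len += n;
        }
        return detect(head, len);
    }
}
```

Masking each byte with 0xff avoids the signed comparisons against -1 and -2, and tracking the number of bytes actually read keeps the check safe for streams shorter than four bytes.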

Embodiment 2

[0075] According to the Unicode encoding rules and the first four bytes of the byte stream, the encoding format of the byte stream is judged as follows:

[0076] 1) Determine whether the byte stream is GB2312 encoded: take the first two bytes of the byte[] byte stream and bitwise-AND the first byte (head) and the second byte (tail) with 0xff to obtain iHead and iTail. If iHead >= 0xa1 and iHead <= 0xf7 and iTail >= 0xa1 and iTail <= 0xfe (the GB2312 lead-byte and trail-byte ranges), then according to the Unicode encoding rules this byte array is GB2312.

[0077] 2) Determine whether the byte stream is GBK encoded: take the first two bytes of the byte[] byte stream and bitwise-AND the first byte (head) and the second byte (tail) with 0xff to obtain iHead and iTail. If iHead >= 0x81 and iHead <= 0xfe and (iTail >= 0x40 and iTail <= 0x7e, or iTail >= 0x80 and iTail <= 0xfe), which are the GBK lead-byte and trail-byte ranges, then according to the Unicode encoding rules this byte array is GBK.

[0078] 3) Determine whether the byte stream is BIG5 encoded, get the first two bytes of the byte[] byte stream, and th...
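The two-byte range checks for GB2312 and GBK can be sketched as below. This is an illustrative sketch only: the class and method names are hypothetical, and the bounds are the standard GB2312 and GBK lead-byte and trail-byte ranges, assumed here to match what the embodiment intends. The BIG5 check is omitted because its description is truncated.

```java
/** Hypothetical helper (not from the source): lead/trail byte range checks on the first two bytes. */
final class DoubleByteRanges {

    /** GB2312 two-byte range: lead byte 0xA1-0xF7, trail byte 0xA1-0xFE. */
    static boolean looksLikeGb2312(byte[] bytes) {
        if (bytes == null || bytes.length < 2) {
            return false;
        }
        int iHead = bytes[0] & 0xff; // AND with 0xff to get the unsigned value
        int iTail = bytes[1] & 0xff;
        return iHead >= 0xa1 && iHead <= 0xf7
                && iTail >= 0xa1 && iTail <= 0xfe;
    }

    /** GBK two-byte range: lead byte 0x81-0xFE, trail byte 0x40-0x7E or 0x80-0xFE. */
    static boolean looksLikeGbk(byte[] bytes) {
        if (bytes == null || bytes.length < 2) {
            return false;
        }
        int iHead = bytes[0] & 0xff;
        int iTail = bytes[1] & 0xff;
        return iHead >= 0x81 && iHead <= 0xfe
                && ((iTail >= 0x40 && iTail <= 0x7e) || (iTail >= 0x80 && iTail <= 0xfe));
    }
}
```

Because the GBK range contains the GB2312 range, a caller would presumably test GB2312 first, as the numbered steps do. A check on only the first two bytes is a heuristic: other double-byte encodings overlap these ranges.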

Embodiment 3

[0086] The method of the present invention for judging the encoding format of a JAVA file and a byte stream also includes judging the encoding format of a character string by judging the encoding format of its byte stream. Specifically: assume a candidate encoding format for the original character string of unknown encoding; convert the original string into a byte stream using that candidate encoding; convert the byte stream back into a new character string using the same encoding; and compare the original string with the new string. If the two strings are the same, the encoding format of the original string is the candidate encoding.

[0087] During specific implementation, the encoding rules for judging character strings are performed on the basis of judging the encoding r...
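The round-trip comparison described in this embodiment can be sketched as follows. The class and method names are hypothetical, and the sketch covers only the comparison step, not the byte-stream checks it is said to build on.

```java
import java.nio.charset.Charset;

/** Hypothetical helper (not from the source): round-trip test for a candidate encoding. */
final class RoundTripCheck {

    /**
     * Encodes the string with the candidate charset, decodes the bytes with the
     * same charset, and reports whether the result equals the original.
     */
    static boolean survivesRoundTrip(String original, String charsetName) {
        Charset candidate = Charset.forName(charsetName);
        byte[] bytes = original.getBytes(candidate);   // unmappable characters are replaced
        String rebuilt = new String(bytes, candidate);
        return original.equals(rebuilt);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two CJK characters, written as escapes
        System.out.println(survivesRoundTrip(text, "UTF-8"));      // true
        System.out.println(survivesRoundTrip(text, "ISO-8859-1")); // false
    }
}
```

`String.getBytes(Charset)` substitutes a replacement byte sequence for characters the charset cannot represent, which is why the comparison fails for ISO-8859-1 here. Several charsets can pass for the same string (plain ASCII text passes for almost all of them), so a positive result narrows the candidates rather than identifying a single encoding.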

Abstract

The invention provides a method and device for judging the encoding format of a JAVA file and a byte stream. The method includes reading the first four bytes of the file or byte stream, and judging the encoding format of the file or byte stream according to the Unicode encoding rules and those four bytes. Because the judgment relies on the Unicode encoding rules, the method and device offer a low workload, a simple program, and accurate judgment.

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a method for judging an encoding format, and specifically to a method and a device for judging the encoding format of a JAVA file and a byte stream.

Background

[0002] This section is intended to provide a background or context for implementations of the invention that are recited in the claims. The descriptions herein are not admitted to be prior art by inclusion in this section.

[0003] The strings in memory are not limited to strings loaded directly from class code: some are read from text files, some are read from a database, and some may be constructed from byte arrays. These sources are generally not Unicode encoded, for a simple reason: storage optimization.

[0004] Therefore, it is necessary to deal with various encoding problems. Before processing, the encoding of the "source" must be clarified, and then read into the memory correctly with t...
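A short sketch shows why the encoding of the source must be known before bytes are read into memory: the same bytes decode to different strings under different charsets. The class name `DecodeDemo` is hypothetical, and the byte values are simply the UTF-8 encoding of one CJK character chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo (not from the source): one byte sequence, two decodings. */
final class DecodeDemo {

    /** Decodes raw bytes with an explicitly named charset instead of the platform default. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // U+4E2D encoded as UTF-8

        String right = decode(raw, StandardCharsets.UTF_8);      // the intended character
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // three unrelated characters

        System.out.println(right.length()); // 1
        System.out.println(wrong.length()); // 3
    }
}
```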

Application Information

IPC(8): G06F9/45
CPC: G06F8/44
Inventor: 王同庆
Owner: BANK OF CHINA