Method and device for judging coding format of JAVA file and byte stream
A technology for judging coding formats, applied in the field of data processing. It addresses the error-prone nature and heavy workload of existing approaches, and achieves a small workload, accurate judgment, and a simple program.
Examples
Embodiment 1
[0051] The encoding format of the file is judged from the Unicode encoding rules and the first four bytes of the file. The specific steps are:
[0052] 1) Get the encoding format of the current operating system operating environment:
[0053] String dc = Charset.defaultCharset().name();
[0054] 2) Convert the input stream to a Unicode input stream:
[0055] UnicodeInputStream uin = new UnicodeInputStream(in,dc);
[0056] 3) Read the first four bytes of the file stream:
[0057] byte[] head = new byte[4];
[0058] in.read(head);
[0059] 4) Set GBK as the default encoding format:
[0060] String code = "GBK";
[0061] 5) According to the Unicode encoding rules, if the first byte is -1 (0xFF) and the second byte is -2 (0xFE), which is the little-endian byte order mark, the encoding format is UTF-16:
[0062] if (head[0] == -1 && head[1] == -2)
[0063] code = "UTF-16";
[0064] 6) According to the Unicode encoding rules, if the first byte is -2 (0xFE) and the second byte is -1 (0xFF), which is the big-endian byte order mark, the encoding format is Unicode:
[0065] if(head[0]==-2&&hea...
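The visible steps of this embodiment can be sketched as a small, self-contained Java program. This is only an illustration under stated assumptions: the class name BomSniffer and the helper names detect and detectFromHead are hypothetical, the sketch returns the more specific charset names UTF-16LE and UTF-16BE instead of the labels used in the steps above, and it covers only the two byte order marks that appear in the visible text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BomSniffer {

    // Hypothetical helper: maps a UTF-16 byte order mark in the first two
    // bytes to a charset name, otherwise returns the supplied fallback.
    static String detectFromHead(byte[] head, int filled, String fallback) {
        if (filled >= 2) {
            // Mask to unsigned values: (byte) -1 is 0xFF and (byte) -2 is 0xFE.
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        }
        return fallback;
    }

    // Hypothetical helper: reads up to four leading bytes from the stream.
    static String detect(InputStream in, String fallback) throws IOException {
        byte[] head = new byte[4];
        int filled = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (filled < head.length) {
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) break;
            filled += n;
        }
        return detectFromHead(head, filled, fallback);
    }

    public static void main(String[] args) throws IOException {
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        byte[] noBom = {0x41, 0x42};
        System.out.println(detect(new ByteArrayInputStream(littleEndian), "GBK")); // UTF-16LE
        System.out.println(detect(new ByteArrayInputStream(noBom), "GBK"));        // GBK
    }
}
```

Masking each byte with 0xFF avoids comparing against negative literals such as -1 and -2, which makes the byte order mark values easier to read.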
Embodiment 2
[0075] The encoding format of the byte stream is judged from the Unicode encoding rules and the first four bytes of the byte stream. The specific implementation is:
[0076] 1) Determine whether the byte stream is GB2312 encoded: take the first two bytes of the byte[] byte stream, and AND the first byte (head) and the second byte (tail) with 0xff to obtain iHead and iTail. If iHead >= 0xa1 and iHead <= … and iTail >= 0xa1 and iTail <= 0xfe, then according to the Unicode encoding rules this byte array is GB2312.
[0077] 2) Determine whether the byte stream is GBK encoded: take the first two bytes of the byte[] byte stream, and AND the first byte (head) and the second byte (tail) with 0xff to obtain iHead and iTail. If iHead >= 0x81 and iHead <= … and (iTail >= 0x40 and iTail <= … or iTail >= 0x80 and iTail <= 0xfe), then according to the Unicode encoding rules this byte array is GBK.
[0078] 3) Determine whether the byte stream is BIG5 encoded, get the first two bytes of the byte[] byte stream, and th...
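The two-byte range checks for GB2312 and GBK described in this embodiment can be sketched as follows. The class and method names are hypothetical. The upper bounds that are missing from the conditions above are filled in here from the published GB2312 and GBK code ranges (lead byte up to 0xF7 for GB2312, lead byte up to 0xFE and first trail range up to 0x7E for GBK), which is an assumption of this sketch rather than values confirmed by the text. BIG5 is left out because its step is truncated.

```java
class DoubleByteRanges {

    // GB2312: lead byte 0xA1..0xF7, trail byte 0xA1..0xFE
    // (upper lead bound assumed from the published GB2312 range).
    static boolean looksLikeGb2312(byte head, byte tail) {
        int iHead = head & 0xFF; // mask to get the unsigned byte value
        int iTail = tail & 0xFF;
        return iHead >= 0xA1 && iHead <= 0xF7
            && iTail >= 0xA1 && iTail <= 0xFE;
    }

    // GBK: lead byte 0x81..0xFE, trail byte 0x40..0x7E or 0x80..0xFE
    // (missing bounds assumed from the published GBK range).
    static boolean looksLikeGbk(byte head, byte tail) {
        int iHead = head & 0xFF;
        int iTail = tail & 0xFF;
        boolean headOk = iHead >= 0x81 && iHead <= 0xFE;
        boolean tailOk = (iTail >= 0x40 && iTail <= 0x7E)
                      || (iTail >= 0x80 && iTail <= 0xFE);
        return headOk && tailOk;
    }

    public static void main(String[] args) {
        // 0xD6 0xD0 lies inside the GB2312 range; 0x81 0x40 is valid only in GBK.
        System.out.println(looksLikeGb2312((byte) 0xD6, (byte) 0xD0)); // true
        System.out.println(looksLikeGb2312((byte) 0x81, (byte) 0x40)); // false
        System.out.println(looksLikeGbk((byte) 0x81, (byte) 0x40));    // true
    }
}
```

Every pair accepted by the GB2312 check is also accepted by the GBK check, so the order of the tests matters: running the GB2312 test first, as in the steps above, keeps the narrower answer.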
Embodiment 3
[0086] The method of the present invention for judging the coding format of a JAVA file and byte stream also covers judging the encoding format of a character string by way of judging the encoding format of a byte stream. Specifically: assume a candidate encoding format for the original character string whose encoding format is unknown; convert the original character string into a byte stream using the assumed encoding format; convert that byte stream back into a new character string using the same encoding format; and compare the original character string with the new character string. If the two character strings are the same, the encoding format of the original character string is the assumed encoding format.
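The round-trip comparison described in this paragraph can be sketched in a few lines of Java. The class name RoundTripCheck and the method name roundTrips are hypothetical, and the sketch relies on String.getBytes(Charset) substituting a replacement byte for characters the charset cannot represent, so that the decoded string then differs from the original.

```java
import java.nio.charset.Charset;

class RoundTripCheck {

    // Hypothetical helper: encodes the string with the candidate charset,
    // decodes the bytes with the same charset, and compares the result.
    static boolean roundTrips(String original, String charsetName) {
        Charset candidate = Charset.forName(charsetName);
        byte[] bytes = original.getBytes(candidate);    // string -> byte stream
        String decoded = new String(bytes, candidate);  // byte stream -> new string
        return original.equals(decoded);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two CJK characters, written as escapes
        System.out.println(roundTrips(text, "UTF-8"));      // true
        System.out.println(roundTrips(text, "ISO-8859-1")); // false
        System.out.println(roundTrips("abc", "US-ASCII"));  // true
    }
}
```

Because a Java String always holds UTF-16 code units, this check mainly shows whether every character of the string is representable in the candidate charset, so more than one candidate can pass for the same string.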
[0087] During specific implementation, the encoding rules for judging character strings are performed on the basis of judging the encoding r...