File encoding identification method and computer-readable storage medium

An encoding identification technique in the field of encoding recognition. It addresses problems such as garbled characters and thereby avoids them.

Active Publication Date: 2021-03-23
FUJIAN TQ DIGITAL
Cites: 5 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] In the prior art, the encoding of a file can be judged only from its first 3 bytes, which indicate whether the file is UTF-8 (8-bit Unicode Transformation Format, a variable-length character encoding for Unicode, also known as the universal code). Other file encodings have no obvious characteristics by which they can be judged. Users can only choose the encoding in which to view the file, and if the selected encoding is incorrect, garbled characters appear.
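The prior-art check described above amounts to comparing a file's first three bytes with the UTF-8 byte order mark (BOM, the bytes EF BB BF). The minimal Python sketch below, using a hypothetical helper name, illustrates the limitation: UTF-8 text saved without a BOM and GBK text both fail the test, so the check says nothing about their encoding.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data[:3] == codecs.BOM_UTF8


# Encoding with "utf-8-sig" writes the BOM; plain "utf-8" and "gbk" do not.
print(has_utf8_bom("中文".encode("utf-8-sig")))  # True
print(has_utf8_bom("中文".encode("utf-8")))      # False
print(has_utf8_bom("中文".encode("gbk")))        # False
```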

Examples

Embodiment 1

[0065] Referring to Figures 2 and 3, Embodiment 1 of the present invention is a file encoding identification method that can correctly identify the encoding of a file carrying no encoding marker. The method consists of two parts: first, collecting sample files and generating a forward word library and a reverse word library; second, identifying the source encoding of the file to be identified according to the forward and reverse word libraries.

[0066] As shown in Figure 2, the first part includes the following steps:

[0067] S101: Collect a preset number of sample files containing non-garbled text in various languages, such as Chinese and Japanese articles. Because the sample files are used to generate the forward and reverse word libraries, the more sample files there are, the better the recognition results.

[0068] S102: Convert the encodings of the sample files into c...
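Steps S101 and S102 (restated as [0111] in Embodiment 2) build one forward word library per encoding. The excerpt neither names the encodings in the preset set nor says how text is split into characters and words, so this Python sketch assumes a hypothetical set `ENCODINGS` and a hypothetical tokenizer `tokens` (single characters plus adjacent pairs). It also models "conversion" as an encode/decode round trip, so a sample contributes to an encoding's library only if that encoding can represent it. This is an illustration under those assumptions, not the patent's implementation.

```python
from collections import defaultdict

# Hypothetical preset encoding set; the excerpt does not list its members.
ENCODINGS = ["utf-8", "gbk", "big5", "shift_jis"]


def tokens(text: str) -> set[str]:
    """Hypothetical tokenizer: every non-space character plus each adjacent pair."""
    chars = [c for c in text if not c.isspace()]
    return set(chars) | {a + b for a, b in zip(chars, chars[1:])}


def build_forward_libraries(samples: list[str]) -> dict[str, set[str]]:
    """Map each encoding to the tokens of the samples it can represent.

    "Converting" a sample into an encoding is modelled as an encode/decode round
    trip; samples containing characters the encoding lacks are skipped for it.
    """
    forward: dict[str, set[str]] = defaultdict(set)
    for text in samples:
        for enc in ENCODINGS:
            try:
                converted = text.encode(enc).decode(enc)
            except UnicodeError:
                continue  # this sample cannot be stored in `enc`
            forward[enc] |= tokens(converted)
    return dict(forward)


libraries = build_forward_libraries(["中文", "😀"])
print("中文" in libraries["gbk"])  # True
print("😀" in libraries["gbk"])    # False
```

Under this round-trip assumption, an emoji sample lands in the UTF-8 library but not in the GBK, Big5, or Shift_JIS libraries, because none of those encodings can represent it.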

Embodiment 2

[0109] This embodiment is a computer-readable storage medium corresponding to the above embodiment. A computer program is stored on it, and when the program is executed by a processor, the following steps are implemented:

[0110] Collect sample files, the sample files including non-garbled text in various languages;

[0111] Convert the encoding of the sample files into each encoding in a preset encoding set, and generate a forward word library for each encoding from the converted sample files;

[0112] Decode each sample file with the other encodings in the encoding set that differ from its file encoding to obtain garbled files, and record the encoding conversion direction of each garbled file, the direction consisting of the file encoding and the decoding encoding;

[0113] Generate, from the garbled files, the reverse word library corresponding to each encoding conversion direction;

[0114] Obtain the file to be identified;

[0115...
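Steps [0112] and [0113] can be sketched in the same hedged way. The Python below redefines the same hypothetical `ENCODINGS` and `tokens` so the block stands alone. The hypothetical `garble` encodes a sample in one encoding and decodes the bytes with each of the others, keyed by the conversion direction `(file_encoding, decoding_encoding)`; `build_reverse_libraries` then collects the tokens of that garbled text per direction. Replacing undecodable bytes with U+FFFD instead of failing is an assumption, since the excerpt does not say how decoding errors are handled.

```python
from collections import defaultdict

# Hypothetical encoding set and tokenizer (not specified in the excerpt).
ENCODINGS = ["utf-8", "gbk", "big5", "shift_jis"]


def tokens(text: str) -> set[str]:
    """Hypothetical tokenizer: every non-space character plus each adjacent pair."""
    chars = [c for c in text if not c.isspace()]
    return set(chars) | {a + b for a, b in zip(chars, chars[1:])}


def garble(text: str, file_encoding: str) -> dict[tuple[str, str], str]:
    """Encode `text` with `file_encoding`, then decode the bytes with every other
    encoding in the set. Keys are conversion directions (file, decoding)."""
    raw = text.encode(file_encoding)  # raises UnicodeEncodeError if unrepresentable
    return {
        (file_encoding, dec): raw.decode(dec, errors="replace")  # bad bytes -> U+FFFD
        for dec in ENCODINGS
        if dec != file_encoding
    }


def build_reverse_libraries(samples: list[str]) -> dict[tuple[str, str], set[str]]:
    """Collect the tokens of the garbled text for each conversion direction."""
    reverse: dict[tuple[str, str], set[str]] = defaultdict(set)
    for text in samples:
        for file_encoding in ENCODINGS:
            try:
                garbled = garble(text, file_encoding)
            except UnicodeEncodeError:
                continue  # the sample cannot be stored in this encoding
            for direction, garbled_text in garbled.items():
                reverse[direction] |= tokens(garbled_text)
    return dict(reverse)


reverse = build_reverse_libraries(["中文"])
print(("gbk", "utf-8") in reverse)  # True
```

Keying the libraries by direction rather than by decoding encoding alone keeps the recorded conversion direction that [0112] asks for; a caller can still merge all directions that share a decoding encoding.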

Abstract

The invention discloses a file encoding identification method and a computer-readable storage medium. The method includes: collecting sample files; converting the encoding of the sample files into each preset encoding and generating a forward word library for each encoding; decoding the sample files with encodings other than their own file encoding to obtain garbled files, and recording the encoding conversion direction; generating, from the garbled files, a reverse word library for each encoding conversion direction; obtaining the file to be identified; decoding the file to be identified with one of the encodings; extracting the characters and words from the decoded file and matching them against the corresponding forward and reverse word libraries to obtain a forward match count and a reverse match count; and, if the forward match count is greater than the reverse match count, taking that encoding as the file encoding of the file to be identified. The invention can thus correctly identify the encoding of a file.
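The identification step summarized above can be sketched in Python as follows. Several details are assumptions not stated in the excerpt: candidate encodings are tried in a caller-supplied order and the first one whose forward matches exceed its reverse matches wins; the reverse libraries consulted for a candidate are those whose decoding encoding equals that candidate; and a "match" counts distinct tokens. `identify_encoding` and `tokens` are hypothetical names.

```python
from typing import Optional


def tokens(text: str) -> set[str]:
    """Hypothetical tokenizer: every non-space character plus each adjacent pair."""
    chars = [c for c in text if not c.isspace()]
    return set(chars) | {a + b for a, b in zip(chars, chars[1:])}


def identify_encoding(
    data: bytes,
    encodings: list[str],
    forward: dict[str, set[str]],
    reverse: dict[tuple[str, str], set[str]],
) -> Optional[str]:
    """Return the first candidate whose forward matches exceed its reverse matches."""
    for enc in encodings:
        words = tokens(data.decode(enc, errors="replace"))
        # Genuine text known to be representable in `enc`.
        forward_hits = len(words & forward.get(enc, set()))
        # Garbled text seen when bytes in some other encoding are decoded as `enc`.
        garbled_words: set[str] = set()
        for (_file_enc, dec_enc), library in reverse.items():
            if dec_enc == enc:
                garbled_words |= library
        reverse_hits = len(words & garbled_words)
        if forward_hits > reverse_hits:
            return enc
    return None  # no candidate passed the comparison


# Tiny illustrative libraries built from a single sample.
sample = "中文"
forward = {"utf-8": tokens(sample), "gbk": tokens(sample)}
reverse = {
    ("utf-8", "gbk"): tokens(sample.encode("utf-8").decode("gbk", errors="replace")),
    ("gbk", "utf-8"): tokens(sample.encode("gbk").decode("utf-8", errors="replace")),
}
print(identify_encoding(sample.encode("gbk"), ["utf-8", "gbk"], forward, reverse))  # gbk
```

Here the GBK bytes decoded as UTF-8 yield only replacement characters, which match the reverse library and none of the forward entries, so UTF-8 is rejected and GBK is accepted. Real libraries would come from the large sample collection described in S101.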

Description

Technical Field

[0001] The invention relates to the technical field of encoding identification, and in particular to a file encoding identification method and a computer-readable storage medium.

Background

[0002] Many encodings are currently in use, so opening a text file requires knowing its encoding; if the file is interpreted with the wrong encoding, garbled characters appear.

[0003] In the prior art, the encoding of a file can be judged only from its first 3 bytes, which indicate whether the file is UTF-8 (8-bit Unicode Transformation Format, a variable-length character encoding for Unicode, also known as the universal code). Other file encodings have no obvious characteristics by which they can be judged. Users can only choose the encoding in which to view the file, and if the selected encoding is incorrect, garbled characters appear.

Summary of the Invention

[0004] The technical problem to be solved by the ...

Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/16; H03M7/30
CPC: G06F16/16; H03M7/705
Inventors: 刘德建 (Liu Dejian), 陈广喜 (Chen Guangxi), 陈丛亮 (Chen Congliang), 郭玉湖 (Guo Yuhu)
Owner: FUJIAN TQ DIGITAL