File encoding identification method and computer-readable storage medium
An encoding identification method, applied in the field of encoding recognition, that addresses the problem of garbled characters caused by misidentified file encodings and achieves the effect of avoiding garbled characters.
Examples
Embodiment 1
[0065] Referring to Figures 2 and 3, Embodiment 1 of the present invention is a file encoding identification method that can correctly identify the encoding of a file that has no character marker. The method comprises two main parts: the first collects sample files and generates a forward word library and a reverse word library; the second identifies the source encoding of the file to be identified according to the forward word library and the reverse word library.
[0066] As shown in Figure 2, the first part includes the following steps:
[0067] S101: Collect a preset number of sample files, where the sample files include non-garbled texts in various languages, such as articles in Chinese and Japanese. Because the sample files are used to generate the forward word library and the reverse word library, a larger set of sample files yields better recognition results.
[0068] S102: Convert the file codes of the sample files into c...
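A minimal Python sketch of steps S101 and S102, based on the description of the forward word library given in Embodiment 2. Several details are assumptions for illustration and not stated in the source: the encoding set `ENCODINGS`, the helper names `byte_ngrams` and `build_forward_libraries`, the use of already-decoded strings as sample files, and the use of overlapping byte bigrams as a stand-in for "words".

```python
from collections import Counter

# Hypothetical encoding set; the source only speaks of a "preset encoding set".
ENCODINGS = ["utf-8", "gbk", "shift_jis"]


def byte_ngrams(data: bytes, n: int = 2):
    """Yield overlapping n-byte windows (an assumed stand-in for 'words')."""
    return (data[i:i + n] for i in range(len(data) - n + 1))


def build_forward_libraries(samples, encodings=ENCODINGS, n=2):
    """Map each encoding to a Counter of byte n-grams observed when the
    sample texts are converted into that encoding."""
    libraries = {enc: Counter() for enc in encodings}
    for text in samples:
        for enc in encodings:
            try:
                data = text.encode(enc)
            except UnicodeEncodeError:
                # The sample cannot be represented in this encoding; skip it.
                continue
            libraries[enc].update(byte_ngrams(data, n))
    return libraries


# Sample texts stand in for the collected non-garbled sample files.
samples = ["你好世界", "こんにちは"]
libs = build_forward_libraries(samples)
```

Each entry of `libs` plays the role of the forward word library for one encoding. A real implementation would likely use a proper word segmenter and a much larger sample set.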
Embodiment 2
[0109] This embodiment is a computer-readable storage medium corresponding to the above embodiment, on which a computer program is stored. When the program is executed by a processor, the following steps are implemented:
[0110] Collecting sample files, where the sample files include non-garbled texts in various languages;
[0111] Converting the file encoding of each sample file to each encoding in a preset encoding set, and generating a forward word library corresponding to each encoding from the converted sample files;
[0112] Decoding each sample file with the encodings in the encoding set that differ from its file encoding to obtain garbled files, and recording the encoding conversion direction of each garbled file, where the encoding conversion direction comprises the file encoding and the decoding encoding;
[0113] Generating, from the garbled files, a reverse word library corresponding to each encoding conversion direction;
[0114] Obtaining the file to be identified;
[0115...
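The garbled-file and reverse word library steps in [0112] and [0113] can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the encoding set `ENCODINGS`, the helper names `make_garbled` and `build_reverse_libraries`, the `errors="replace"` decoding policy, and the use of character bigrams as "words" are all hypothetical choices.

```python
from collections import Counter
from itertools import permutations

# Hypothetical encoding set; the source only speaks of a "preset encoding set".
ENCODINGS = ["utf-8", "gbk", "shift_jis"]


def make_garbled(text, encodings=ENCODINGS):
    """Return {(file_encoding, decoding_encoding): garbled_text}.

    The key is the encoding conversion direction: the text is written in
    file_encoding and then wrongly decoded with decoding_encoding.
    """
    garbled = {}
    for file_enc, decode_enc in permutations(encodings, 2):
        try:
            raw = text.encode(file_enc)
        except UnicodeEncodeError:
            continue  # text is not representable in file_enc
        # Assumption: undecodable bytes become U+FFFD instead of raising.
        garbled[(file_enc, decode_enc)] = raw.decode(decode_enc, errors="replace")
    return garbled


def build_reverse_libraries(samples, encodings=ENCODINGS, n=2):
    """Map each conversion direction to a Counter of character n-grams
    found in the garbled text for that direction."""
    libraries = {}
    for text in samples:
        for direction, mojibake in make_garbled(text, encodings).items():
            grams = (mojibake[i:i + n] for i in range(len(mojibake) - n + 1))
            libraries.setdefault(direction, Counter()).update(grams)
    return libraries


reverse_libs = build_reverse_libraries(["世界"])
```

Keying the reverse word library by the (file encoding, decoding encoding) pair mirrors the recorded conversion direction, so a match against one of these libraries points back to the file encoding that produced the garbled text.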