A method and system for detecting garbled characters in a text document
A garbled character detection technology for text documents, applied in fields such as instruments, computing, and electrical digital data processing. It addresses the problem of not being able to determine the real cause, and achieves the effects of improving the speed and reducing the scope.
Examples
Embodiment 1
[0052] The flowchart of the text document garbled character detection method described in this embodiment is shown in Figure 1. The method includes the following steps:
[0053] A step of establishing a first coding range library. The first coding range library includes the coding ranges of all regular characters in the coding format used by the characters of the document under detection.
[0054] A sampling step. The codes corresponding to M characters are selected from the document under detection, where M is an integer greater than or equal to 1.
[0055] A first comparison step. The codes corresponding to the M characters selected in the sampling step are each compared with the codes in the first coding range library. Characters whose codes have a match in the first coding range library are determined to be non-garbled characters; characters whose codes have no match in the first coding range library ...
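The sampling and first comparison steps can be illustrated with a minimal Python sketch. It is not taken from the patent: it assumes that a "code" is a Unicode code point and that the library is a list of inclusive (low, high) ranges. The names FIRST_RANGE_LIBRARY, sample_codes and first_comparison, and the two example ranges, are hypothetical.

```python
import random

# Hypothetical first coding range library: inclusive (low, high) ranges of
# codes treated as regular characters. These two ranges are illustrative
# assumptions, not the ranges used in the patent.
FIRST_RANGE_LIBRARY = [
    (0x0020, 0x007E),  # printable ASCII
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def in_library(code, library):
    """Return True if the code falls inside any range of the library."""
    return any(low <= code <= high for low, high in library)


def sample_codes(text, m, seed=None):
    """Sampling step: pick the codes of M characters from the document.

    Random positions are an assumption; the source does not say how the
    M characters are chosen.
    """
    rng = random.Random(seed)
    m = min(m, len(text))
    positions = sorted(rng.sample(range(len(text)), m))
    return [(pos, ord(text[pos])) for pos in positions]


def first_comparison(samples, library=FIRST_RANGE_LIBRARY):
    """First comparison step: split samples into non-garbled and garbled."""
    normal, garbled = [], []
    for pos, code in samples:
        (normal if in_library(code, library) else garbled).append((pos, code))
    return normal, garbled
```

With M equal to the document length every character is checked; a smaller M checks only a sample, which is presumably where the speed gain mentioned in the summary comes from.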
Embodiment 2
[0062] Building on Embodiment 1, the text document garbled character detection method described in this embodiment, as shown in Figure 2, further includes the following steps. A step of establishing a second coding range library: the second coding range library includes the coding ranges of all characters in all existing coding formats. A second comparison step: the codes of the characters determined to be garbled in the first comparison step are each compared with the codes in the second coding range library. If such a code has a match in the second coding range library, the corresponding character is reclassified as non-garbled. If no match can be found in the second coding range library, the corresponding character is confirmed to be a garbled character.
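The second comparison step can be sketched in the same way. This is a minimal illustration under assumptions, not the patent's implementation: codes are again taken to be Unicode code points, and SECOND_RANGE_LIBRARY and second_comparison are hypothetical names. A real second library would have to cover every existing coding format; the four ranges below are only placeholders.

```python
# Hypothetical second coding range library: a broader set of inclusive
# (low, high) ranges standing in for "all characters in all existing
# coding formats". The ranges are illustrative assumptions.
SECOND_RANGE_LIBRARY = [
    (0x0020, 0x007E),  # printable ASCII
    (0x00A0, 0x00FF),  # Latin-1 Supplement
    (0x0400, 0x04FF),  # Cyrillic
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def in_library(code, library):
    """Return True if the code falls inside any range of the library."""
    return any(low <= code <= high for low, high in library)


def second_comparison(garbled_codes, library=SECOND_RANGE_LIBRARY):
    """Re-check codes that the first comparison flagged as garbled.

    Codes found in the second library are restored to non-garbled;
    the rest stay classified as garbled.
    """
    restored, still_garbled = [], []
    for code in garbled_codes:
        (restored if in_library(code, library) else still_garbled).append(code)
    return restored, still_garbled
```

For example, a Cyrillic letter that the first library rejected would be restored here, while the replacement character U+FFFD would remain flagged.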
[...
Embodiment 3
[0069] The text document garbled character detection system described in this embodiment includes the following components. A sampling module 1, configured to select the codes corresponding to M characters from the document under detection, where M is an integer greater than or equal to 1. A first coding range library 2, configured to store the coding ranges of all regular characters in the coding format used by the characters of the document under detection. A first comparison module 3, configured to compare the codes corresponding to the M characters selected by the sampling module 1 with the codes in the first coding range library 2. Characters whose codes have a match in the first coding range library 2 are determined to be non-garbled characters; characters whose codes have no match in the first coding range library 2 are determined to be garbled characters.
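One way to picture how the three components fit together is the following Python sketch. It is an assumption-laden illustration rather than the system claimed in the patent: the class names SamplingModule, CodeRangeLibrary and FirstComparisonModule are hypothetical, codes are taken to be Unicode code points, and evenly spaced sampling is chosen only to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CodeRangeLibrary:
    """First coding range library 2: inclusive (low, high) code ranges."""
    ranges: List[Tuple[int, int]]

    def contains(self, code: int) -> bool:
        return any(low <= code <= high for low, high in self.ranges)


class SamplingModule:
    """Sampling module 1: selects the codes of M characters, M >= 1."""

    def __init__(self, m: int):
        if m < 1:
            raise ValueError("M must be an integer greater than or equal to 1")
        self.m = m

    def select(self, text: str) -> List[int]:
        # Evenly spaced positions are an assumption; the source does not
        # specify how the M characters are chosen.
        count = min(self.m, len(text))
        if count == 0:
            return []
        step = len(text) / count
        return [ord(text[int(i * step)]) for i in range(count)]


class FirstComparisonModule:
    """First comparison module 3: flags codes missing from the library."""

    def __init__(self, library: CodeRangeLibrary):
        self.library = library

    def classify(self, codes: List[int]) -> List[Tuple[int, bool]]:
        # Each result is (code, is_garbled).
        return [(code, not self.library.contains(code)) for code in codes]
```

A caller would build the library once, pass the document text through SamplingModule.select, and hand the resulting codes to FirstComparisonModule.classify.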
[0070] In this embodiment, through the joint action of the sampling module 1, t...