Chinese sensitive text recognition method and device, storage medium and equipment
A text recognition and storage medium technology, applied in text database indexing, text database query, unstructured text data retrieval, etc. It can solve the problems that existing approaches cannot cover variants such as homophones, similar characters and split characters, and that sensitive characters are recognized inaccurately. It achieves the effects of improving the recall rate and avoiding misjudgment.
Examples
Embodiment 1
[0037] Referring to Figure 1, Embodiment 1 of the present invention provides a Chinese sensitive text recognition method comprising the following steps:
[0038] S1. Obtain a text object to be recognized, preprocess the text object, and obtain the text pinyin list corresponding to the preprocessed text object;
[0039] S2. Convert the sensitive Chinese characters in the sensitive lexicon into sensitive pinyin, and generate a pinyin Trie tree corresponding to the sensitive pinyin;
[0040] S3. Search the pinyin Trie tree using the text pinyin list, mark the entries of the text pinyin list that are found in the Trie tree as sensitive words, perform context backtracking from the marked sensitive words to obtain the sensitive content in the text object, and blank out the sensitive words in that sensitive content.
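The following is a minimal Python sketch of steps S1 to S3, not the patent's implementation. It assumes a toy, tone-less character-to-pinyin table (`PINYIN`), that preprocessing simply drops characters with no pinyin entry (spaces, punctuation, inserted symbols), and that blanking means replacing the matched span with `*`; the names `preprocess`, `build_pinyin_trie` and `find_and_mask` are hypothetical. Dropping tones lets homophones such as 民 and 敏 share one pinyin key, which is how a pinyin Trie can catch homophone variants; similar-shape and split characters are not handled here.

```python
# Hypothetical, tone-less character-to-pinyin table; a real system would use
# a full pinyin dictionary. Without tones, homophones map to the same key.
PINYIN = {
    "敏": "min", "民": "min", "感": "gan", "赶": "gan",
    "词": "ci", "此": "ci", "这": "zhe", "是": "shi", "个": "ge",
}

def preprocess(text):
    """S1: keep only characters that have a pinyin entry and remember the
    original index of each kept character (needed later for backtracking)."""
    pinyin_list, positions = [], []
    for i, ch in enumerate(text):
        if ch in PINYIN:  # assumption: anything else is noise or a separator
            pinyin_list.append(PINYIN[ch])
            positions.append(i)
    return pinyin_list, positions

def build_pinyin_trie(lexicon):
    """S2: convert each sensitive word to pinyin and insert its syllables."""
    root = {}
    for word in lexicon:
        node = root
        for ch in word:
            node = node.setdefault(PINYIN[ch], {})
        node["$end"] = True  # marks the end of a sensitive pinyin sequence
    return root

def find_and_mask(text, trie, mask="*"):
    """S3: walk the trie from every start position of the pinyin list, then
    backtrack to the original character span and blank it out."""
    pinyin_list, positions = preprocess(text)
    chars = list(text)
    for start in range(len(pinyin_list)):
        node, end = trie, None
        for j in range(start, len(pinyin_list)):
            node = node.get(pinyin_list[j])
            if node is None:
                break
            if "$end" in node:
                end = j  # keep extending so the longest match wins
        if end is not None:
            # Backtrack: mask the whole original span, including separators.
            for k in range(positions[start], positions[end] + 1):
                chars[k] = mask
    return "".join(chars)
```

For example, `find_and_mask("这是个民.赶此", build_pinyin_trie(["敏感词"]))` returns `"这是个****"`: the homophones 民, 赶 and 此 match the pinyin min/gan/ci of 敏感词, and backtracking through `positions` masks the original span, including the inserted `.`.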
[0041] The Trie tree used in S2, also known as a word lookup tree, is a tree structure and a variant of a hash tree. Typical applications are counting, sorting and...
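To illustrate the counting and sorting uses mentioned above, here is a generic, textbook-style trie in Python; the class and method names are hypothetical and this is not the patent's code. Children are kept in a dict, so each step down the tree is a hash lookup, and a depth-first walk that visits children in key order emits the stored words in sorted order together with their counts.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.count = 0      # how many inserted words end exactly here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def count(self, word):
        """Return how many times `word` was inserted (0 if never)."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count

    def starts_with(self, prefix):
        """Return True if any inserted word begins with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True

    def sorted_words(self):
        """Pre-order walk with children in key order yields sorted words."""
        out = []

        def walk(node, path):
            if node.count:
                out.append(("".join(path), node.count))
            for ch in sorted(node.children):
                path.append(ch)
                walk(node.children[ch], path)
                path.pop()

        walk(self.root, [])
        return out
```

After inserting `"ban"`, `"apple"`, `"app"` and `"ban"`, `sorted_words()` gives `[("app", 1), ("apple", 1), ("ban", 2)]`; a word is emitted before its extensions because the walk is pre-order.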
Embodiment 2
[0063] Referring to Figure 3, Embodiment 2 of the present invention provides a Chinese sensitive text recognition device that uses the Chinese sensitive text recognition method of Embodiment 1 or any possible implementation thereof, and includes:
[0064] A text recognition preprocessing unit 1, configured to obtain a text object to be recognized, perform preprocessing on the text object, and obtain a text pinyin list corresponding to the text object after preprocessing;
[0065] A sensitive-word pinyin Trie tree generating unit 2, configured to convert the sensitive Chinese characters in the sensitive lexicon into sensitive pinyin and generate the pinyin Trie tree corresponding to the sensitive pinyin;
[0066] A text sensitive content identification processing unit 3, configured to search the pinyin Trie tree using the text pinyin list, mark the text pinyin found in the Trie tree as sensitive words, and perform context backtracking from the marked sensitive words to obtain the sensitive content...
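The three units can be sketched as separate Python classes composed by a device class. As before, the tone-less `PINYIN` table, the rule that preprocessing drops characters without a pinyin entry, `*` masking, and all class names are assumptions for illustration, not the patent's design. The point shown is the division of work: unit 2 builds the pinyin Trie once per lexicon, while units 1 and 3 run for every text.

```python
# Hypothetical tone-less pinyin table; a real unit 1 would use a full dictionary.
PINYIN = {"敏": "min", "民": "min", "感": "gan", "赶": "gan", "词": "ci", "此": "ci"}

class PreprocessingUnit:  # unit 1
    def run(self, text):
        """Return (pinyin list, original index of each kept character)."""
        kept = [(i, PINYIN[ch]) for i, ch in enumerate(text) if ch in PINYIN]
        return [p for _, p in kept], [i for i, _ in kept]

class PinyinTrieUnit:  # unit 2
    def build(self, lexicon):
        """Insert the pinyin syllables of every sensitive word into a dict trie."""
        root = {}
        for word in lexicon:
            node = root
            for ch in word:
                node = node.setdefault(PINYIN[ch], {})
            node["$end"] = True  # end of a sensitive pinyin sequence
        return root

class SensitiveContentUnit:  # unit 3
    def run(self, text, pinyin, positions, trie, mask="*"):
        """Search the trie, backtrack to original spans, and blank them out."""
        chars = list(text)
        for start in range(len(pinyin)):
            node, end = trie, None
            for j in range(start, len(pinyin)):
                node = node.get(pinyin[j])
                if node is None:
                    break
                if "$end" in node:
                    end = j  # longest match found so far
            if end is not None:
                for k in range(positions[start], positions[end] + 1):
                    chars[k] = mask
        return "".join(chars)

class SensitiveTextRecognizer:
    """Device: wires units 1 to 3 together; the trie is built only once."""
    def __init__(self, lexicon):
        self.unit1 = PreprocessingUnit()
        self.trie = PinyinTrieUnit().build(lexicon)
        self.unit3 = SensitiveContentUnit()

    def recognize(self, text):
        pinyin, positions = self.unit1.run(text)
        return self.unit3.run(text, pinyin, positions, self.trie)
```

For example, `SensitiveTextRecognizer(["敏感词"]).recognize("敏-感-此!")` returns `"*****!"`: the inserted hyphens are skipped during matching but fall inside the backtracked span, so they are blanked along with the characters.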
Embodiment 3
[0083] Embodiment 3 of the present invention provides a computer-readable storage medium that stores program code of the Chinese sensitive text recognition method; the program code includes instructions for executing the Chinese sensitive text recognition method of Embodiment 1 or any possible implementation thereof.
[0084] The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)), and the like.