Sensitive word storage and retrieval method, system, electronic device, and storage medium

By using a target mapping table and sharding service in memory, combined with Trie and dictionary data structures, the problem of low efficiency in sensitive word storage and retrieval is solved, achieving efficient and accurate sensitive word retrieval and similar word detection.

CN118916543B · Active · Publication Date: 2026-06-23 · 广域铭岛数字科技有限公司 +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
广域铭岛数字科技有限公司
Filing Date
2024-07-17
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies have low efficiency in storing and retrieving sensitive words, and cannot effectively detect similar sensitive words.

Method used

Sensitive words are stored in the target sharding service through a target mapping table. Sensitive word retrieval is performed based on the mapping relationship of the first character. Trie and dictionary data structures are used for storage. Combined with synonym conversion and cache matching, efficient sensitive word retrieval is achieved.

Benefits of technology

It improves the efficiency of sensitive word storage and retrieval, reduces memory space usage, lowers the load pressure on the sharding service, and improves the accuracy and speed of massive sensitive word retrieval.

✦ Generated by Eureka AI based on patent content.

Figure CN118916543B_ABST

Abstract

The application relates to the field of computer technology, and in particular to a sensitive word storage and retrieval method and system, an electronic device, and a storage medium. The method comprises the following steps: storing a sensitive word to be stored in a target sharding service through a target mapping table, wherein the target mapping table includes a mapping relationship between the target sharding service and the first character of the sensitive word to be stored, the target sharding service is one of a plurality of sharding services in memory, and the target sharding service is the sharding service currently polled; determining, based on the target mapping table and the first character of the word to be retrieved, the sharding service to be retrieved that corresponds to that first character; and performing sensitive word retrieval based on the word to be retrieved and the data stored in the sharding service to be retrieved. The method improves the storage efficiency and retrieval efficiency of sensitive words at low cost.

Claims

1. A method for storing and retrieving sensitive words, characterized in that it comprises:
storing a sensitive word to be stored in a target sharding service through a target mapping table, wherein the target mapping table includes a mapping relationship between the target sharding service and the first character of the sensitive word to be stored, the target sharding service is one of multiple sharding services in memory, and the target sharding service is the sharding service that is currently polled;
determining, based on the target mapping table and the first character of a word to be retrieved, the sharding service to be retrieved that corresponds to the first character of the word to be retrieved;
performing sensitive word retrieval based on the word to be retrieved and the data stored in the sharding service to be retrieved;
wherein the step of performing sensitive word retrieval based on the word to be retrieved and the data stored in the sharding service to be retrieved comprises:
obtaining the data structure of the data stored in the sharding service to be retrieved;
if the data structure of the data stored in the sharding service to be retrieved is a Trie data structure, splitting the word to be retrieved into characters to obtain at least two split characters; performing a similarity judgment based on the split characters and the data stored in the sharding service to be retrieved, to obtain the similarity between the word to be retrieved and the sensitive words stored in the sharding service to be retrieved; and, if the similarity is greater than or equal to a preset similarity threshold, determining that the word to be retrieved is a sensitive word and storing the word to be retrieved in the cache;
if the data structure of the data stored in the sharding service to be retrieved is a dictionary data structure, matching the word to be retrieved against the sensitive words under the dictionary data structure; and, if the match is successful, determining that the word to be retrieved is a sensitive word and storing the word to be retrieved in the cache.
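The two retrieval branches of claim 1 can be illustrated with a minimal Python sketch. The claim does not define how similarity is computed, so the prefix-overlap ratio below is only a stand-in, and the names `TrieNode`, `trie_insert`, `prefix_similarity`, `is_sensitive` and the default threshold of 0.8 are all hypothetical.

```python
from typing import Dict, Set, Union


class TrieNode:
    """One node of a character trie; `is_word` marks the end of a stored word."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def trie_insert(root: TrieNode, word: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def prefix_similarity(root: TrieNode, term: str) -> float:
    """Stand-in measure (an assumption, not the patent's definition):
    the share of the term's leading characters that lie on a trie path."""
    node, matched = root, 0
    for ch in term:  # character splitting of the word to be retrieved
        nxt = node.children.get(ch)
        if nxt is None:
            break
        node, matched = nxt, matched + 1
    return matched / len(term) if term else 0.0


def is_sensitive(store: Union[TrieNode, Dict[str, bool]], term: str,
                 cache: Set[str], threshold: float = 0.8) -> bool:
    """Branch on the shard's data structure, then cache positive results."""
    if isinstance(store, TrieNode):
        hit = prefix_similarity(store, term) >= threshold  # similarity branch
    else:
        hit = term in store                                # exact-match branch
    if hit:
        cache.add(term)
    return hit
```

With this stand-in measure, a short term that is a prefix of a stored word scores 1.0, so a real deployment would likely need a stricter similarity definition.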

2. The sensitive word storage and retrieval method according to claim 1, characterized in that the sensitive word to be stored is either a newly added sensitive word itself or a synonym of the newly added sensitive word; if sensitive words are being stored for the first time, the target sharding service is the first sharding service among the multiple sharding services; the mapping relationship between the target sharding service and the first character of the sensitive word to be stored is established before the sensitive word to be stored is stored in the target sharding service; and, after the mapping relationship between the target sharding service and the first character of the sensitive word to be stored is established, the method further comprises:
updating the preset polling shard number to the shard number of the sharding service that follows the current target sharding service, wherein the target sharding service for the next sensitive word storage is determined based on the polling shard number.
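The polling scheme of claim 2 can be sketched as a small router that binds each new first character to the currently polled shard and then advances the polling number. The class and method names are hypothetical, and wrapping around to shard 0 after the last shard is an assumption; the claim only says the number moves to the next sharding service.

```python
from typing import Dict, Optional


class FirstCharRouter:
    """Hypothetical sketch of the target mapping table plus polling shard number."""

    def __init__(self, shard_count: int) -> None:
        if shard_count < 1:
            raise ValueError("need at least one shard")
        self.shard_count = shard_count
        self.mapping: Dict[str, int] = {}  # first character -> shard number
        self.polling_shard = 0             # first storage goes to the first shard

    def shard_for_store(self, word: str) -> int:
        """Return the shard for a non-empty word, creating the mapping if needed."""
        first = word[0]
        if first not in self.mapping:
            # Establish the mapping before storing, then advance the polling
            # number (the modulo wrap-around is an assumption).
            self.mapping[first] = self.polling_shard
            self.polling_shard = (self.polling_shard + 1) % self.shard_count
        return self.mapping[first]

    def shard_for_lookup(self, word: str) -> Optional[int]:
        """Return the shard to query, or None if the first character is unmapped."""
        return self.mapping.get(word[0])
```

Because every word sharing a first character lands on the same shard, a lookup needs only one table access to find the single shard it has to query.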

3. The sensitive word storage and retrieval method according to claim 1, characterized in that the step of storing the sensitive word to be stored in the target sharding service through the target mapping table comprises:
determining the target sharding service based on the first character of the sensitive word to be stored and the target mapping table;
obtaining the target address of the target sharding service;
sending a storage request to the target address, wherein the storage request includes the sensitive word to be stored and instructs the target sharding service to split the sensitive word to be stored into characters to obtain at least two target characters; when the preset storage requirement is single data structure storage, the target characters are stored according to a Trie data structure or a dictionary data structure; and when the storage requirement is redundant data structure storage, the target characters are stored according to both a Trie data structure and a dictionary data structure.
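The shard-side handling of a storage request in claim 3 might look like the following sketch, which leaves out addressing and the network request entirely. The `Shard` class, the mode strings, the nested-dict trie layout and the `END` marker are all hypothetical choices made for illustration.

```python
from typing import Dict

END = "$end"  # hypothetical end-of-word marker; cannot collide with a single character


class Shard:
    """Stores words in a trie, a dictionary, or both (redundant storage)."""

    def __init__(self, mode: str = "trie") -> None:
        if mode not in ("trie", "dict", "redundant"):
            raise ValueError(f"unknown storage mode: {mode}")
        self.mode = mode
        self.trie: Dict[str, dict] = {}    # nested dicts, one level per character
        self.words: Dict[str, bool] = {}   # flat dictionary for exact matching

    def store(self, word: str) -> None:
        chars = list(word)  # character splitting; the claim expects at least two
        if self.mode in ("trie", "redundant"):
            node = self.trie
            for ch in chars:
                node = node.setdefault(ch, {})
            node[END] = {}
        if self.mode in ("dict", "redundant"):
            self.words[word] = True
```

Redundant storage costs more memory but lets the same shard serve both similarity checks (trie) and exact matches (dictionary).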

4. The sensitive word storage and retrieval method according to claim 3, characterized in that, when the preset storage requirement is single data structure storage, the step of storing the target characters according to a Trie data structure or a dictionary data structure comprises:
if the preset storage requirement is single data structure storage, storing the target characters according to the Trie data structure;
obtaining a preset query condition;
when the query condition requires a full match search during sensitive word retrieval, obtaining the number of sensitive words stored in the current sharding service and the number of characters of those sensitive words, wherein a full match search means directly matching the word to be retrieved against the stored sensitive words;
obtaining, based on the number of sensitive words and the number of characters of the sensitive words, a first memory footprint of all sensitive words stored in the current sharding service under the Trie data structure and a second memory footprint under the dictionary data structure;
if the second memory footprint is smaller than the first memory footprint, converting the data under the Trie data structure into a dictionary data structure.
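The footprint comparison in claim 4 can be approximated as below. The claim does not give the formulas, so the per-node, per-entry and per-character byte costs are hypothetical placeholders, and the trie estimate is a worst case that ignores shared prefixes. The nested-dict trie layout and the `END` marker are likewise assumptions.

```python
from typing import Dict, Iterator, Tuple

END = "$end"  # hypothetical end-of-word marker in a nested-dict trie


def estimate_sizes(word_count: int, total_chars: int, node_bytes: int = 64,
                   entry_bytes: int = 56, char_bytes: int = 2) -> Tuple[int, int]:
    """Return (trie_bytes, dict_bytes) from word and character counts."""
    trie_bytes = total_chars * node_bytes  # one node per character, worst case
    dict_bytes = word_count * entry_bytes + total_chars * char_bytes
    return trie_bytes, dict_bytes


def should_convert_to_dict(word_count: int, total_chars: int,
                           full_match_only: bool, **costs: int) -> bool:
    """Convert only when queries are full-match and the dictionary is smaller."""
    if not full_match_only:
        return False
    trie_bytes, dict_bytes = estimate_sizes(word_count, total_chars, **costs)
    return dict_bytes < trie_bytes


def trie_words(node: dict, prefix: str = "") -> Iterator[str]:
    """Yield every word stored in the nested-dict trie."""
    for key, child in node.items():
        if key == END:
            yield prefix
        else:
            yield from trie_words(child, prefix + key)


def trie_to_dict(trie: dict) -> Dict[str, bool]:
    """Flatten the trie into a dictionary keyed by whole words."""
    return {word: True for word in trie_words(trie)}
```

In practice the byte costs would have to be measured for the actual runtime, since the real ratio depends heavily on how much prefix sharing the stored words exhibit.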

5. The sensitive word storage and retrieval method according to claim 1, characterized in that the step of obtaining the word to be retrieved comprises:
obtaining the text to be retrieved;
filtering the text to be retrieved with a preset filter to obtain the target text;
segmenting the target text with a preset word segmenter to obtain at least one word segment;
converting the word segment into a synonym to obtain a target synonym;
matching the target synonym against the historical sensitive words pre-stored in the cache, wherein the historical sensitive words are sensitive words that were previously retrieved successfully, and a successful retrieval means that a sensitive word identical or similar to a previous target synonym was retrieved;
if the match is successful, determining that the target synonym is a sensitive word; if the match fails, taking the target synonym as the word to be retrieved.
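The preprocessing steps of claim 5 can be sketched as a single function. The patent leaves the filter and the word segmenter as "preset" components, so the punctuation-stripping regex and whitespace split below are stand-ins (Chinese text would need a real segmenter), and `prepare_terms` and its parameters are hypothetical names.

```python
import re
from typing import Dict, List, Set, Tuple


def prepare_terms(text: str, synonyms: Dict[str, str],
                  cache: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (cache_hits, words_to_retrieve) for one input text."""
    # Filter stand-in: replace punctuation with spaces and lowercase the text.
    cleaned = re.sub(r"[^\w\s]", " ", text).lower()
    # Segmenter stand-in: split on whitespace.
    segments = cleaned.split()
    hits: List[str] = []
    pending: List[str] = []
    for segment in segments:
        # Synonym conversion; unknown segments map to themselves.
        target = synonyms.get(segment, segment)
        if target in cache:
            hits.append(target)      # already known to be sensitive
        else:
            pending.append(target)   # still has to be looked up in a shard
    return hits, pending
```

For example, with `synonyms={"w33d": "weed"}` and `cache={"weed"}`, the text `"Buy W33D, now!"` yields one cache hit (`"weed"`) and two words that still need a shard lookup (`"buy"`, `"now"`).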

6. The sensitive word storage and retrieval method according to claim 1 or 5, characterized in that it further comprises: if the target mapping table does not contain the first character of the current word to be retrieved, determining that the current word to be retrieved is a normal word, and either stopping the retrieval or matching the first character of the next word to be retrieved against the first characters in the target mapping table.
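The early exit of claim 6 can be shown by grouping words by shard and setting aside any word whose first character is absent from the mapping table. This is a sketch only; `route_terms` and the plain-dict mapping table are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def route_terms(mapping: Dict[str, int],
                terms: List[str]) -> Tuple[Dict[int, List[str]], List[str]]:
    """Return (words grouped by shard number, words judged normal)."""
    by_shard: Dict[int, List[str]] = defaultdict(list)
    normal: List[str] = []
    for term in terms:
        shard = mapping.get(term[0]) if term else None
        if shard is None:
            # First character is not in the mapping table: no shard can hold
            # this word, so it is treated as normal and no shard is queried.
            normal.append(term)
            continue
        by_shard[shard].append(term)
    return dict(by_shard), normal
```

Skipping unmapped first characters keeps most ordinary words away from the sharding services altogether, which is where the reduced load on the shards comes from.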

7. A sensitive word storage and retrieval system, characterized in that it comprises:
a sensitive word storage module, used to store a sensitive word to be stored in a target sharding service through a target mapping table, wherein the target mapping table includes a mapping relationship between the target sharding service and the first character of the sensitive word to be stored, the target sharding service is one of multiple sharding services in memory, and the target sharding service is the currently polled sharding service;
a sensitive word retrieval module, used to determine, based on the target mapping table and the first character of the word to be retrieved, the sharding service to be retrieved that corresponds to the first character of the word to be retrieved, and to perform sensitive word retrieval based on the word to be retrieved and the data stored in the sharding service to be retrieved;
wherein the sensitive word retrieval module is specifically used to obtain the data structure of the data stored in the sharding service to be retrieved; if the data structure of the data stored in the sharding service to be retrieved is a Trie data structure, to split the word to be retrieved into characters to obtain at least two split characters, to perform a similarity judgment based on the split characters and the data stored in the sharding service to be retrieved so as to obtain the similarity between the word to be retrieved and the sensitive words stored in the sharding service to be retrieved, and, if the similarity is greater than or equal to a preset similarity threshold, to determine that the word to be retrieved is a sensitive word and store the word to be retrieved in the cache; and, if the data structure of the data stored in the sharding service to be retrieved is a dictionary data structure, to match the word to be retrieved against the sensitive words under the dictionary data structure and, if the match is successful, to determine that the word to be retrieved is a sensitive word and store the word to be retrieved in the cache.

8. An electronic device, characterized in that it includes a processor, a memory, and a communication bus; the communication bus is used to connect the processor and the memory; and the processor is used to execute a computer program stored in the memory to implement the sensitive word storage and retrieval method according to any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that it stores a computer program that causes a computer to perform the sensitive word storage and retrieval method according to any one of claims 1 to 6.