Multi-level natural language anti-junk text method and system
A multi-level natural language technology applied in the field of information processing. It addresses problems such as poor recognition of junk text, and achieves efficient, highly robust recognition while avoiding adverse effects.
Example Embodiment
[0064] Example 1
[0065] Referring to Figure 1, a multi-level natural language anti-spam text method includes the following steps:
[0066] S101, receiving the text to be recognized;
[0067] S102, based on the original sensitive word database, matching original sensitive words against the text to be recognized, identifying the original sensitive words in the text to be recognized, and outputting a sensitive word recognition result; wherein the original sensitive word database includes original sensitive words;
[0068] S103, based on the sensitive word variant library, matching sensitive word variants against the text to be recognized, performing semantic analysis on the matched suspected words to verify whether they are sensitive words, and outputting a sensitive word variant recognition result; wherein the sensitive word variant library is established according to the original sensitive word database, and the sensitive word variant...
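Steps S102 and S103 can be sketched as two matching passes. This is only a minimal illustration under stated assumptions: the function names, the plain substring matching, the English sample data, and the `verify` callback are all hypothetical. The source does not specify how the semantic analysis is performed, so it is represented here by a placeholder that the caller supplies.

```python
from typing import Callable, Dict, List, Set, Tuple


def match_original(text: str, sensitive_words: Set[str]) -> List[str]:
    """S102 (sketch): return the original sensitive words found in the text."""
    return sorted(w for w in sensitive_words if w in text)


def match_variants(
    text: str,
    variant_library: Dict[str, str],
    verify: Callable[[str, str, str], bool],
) -> List[Tuple[str, str]]:
    """S103 (sketch): find suspected variants, keep those the verifier confirms.

    variant_library maps a variant spelling to its original sensitive word.
    verify(text, variant, original) stands in for the semantic analysis,
    which the source does not describe in detail.
    """
    hits = []
    for variant, original in sorted(variant_library.items()):
        if variant in text and verify(text, variant, original):
            hits.append((variant, original))
    return hits


def always_confirm(text: str, variant: str, original: str) -> bool:
    # Placeholder verifier (assumption): accepts every suspected word.
    return True


# Hypothetical sample data, for illustration only.
original_words = {"spam"}
variants = {"sp4m": "spam", "spaam": "spam"}
text = "buy cheap sp4m today"

original_hits = match_original(text, original_words)
variant_hits = match_variants(text, variants, always_confirm)
```

In this sample the original word does not occur literally, so only the variant pass reports a match. A real verifier would reject suspected words whose context shows they are harmless.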
[0083] Example 2
[0084] Embodiment 2 is an improvement on Embodiment 1, concerned mainly with how the sensitive word variant library is established. Referring to Figure 2, establishing the sensitive word variant library includes the following steps:
[0085] S201, obtaining the keywords that form the original sensitive words from the original sensitive word database;
[0086] S202, comparing existing Chinese characters with the keyword phonetically, and obtaining the phonetic similarity between the existing Chinese characters and the keyword;
[0087] S203, comparing existing Chinese characters with the keyword by glyph, and obtaining the glyph similarity between the existing Chinese characters and the keyword;
[0088] S204, selecting the similar words of the keyword according to the phonetic similarity and the glyph similarity;
[0089] S205, obtaining the split word of the keyword according to the mapping relationship corresponding to the split word;
[00...
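Steps S204 and S205 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the similarity scores are toy values rather than measured ones, the threshold of 0.8 is arbitrary, and taking the larger of the two scores is just one possible way to combine them, since the source does not state how the phonetic and glyph similarities are computed or combined. The split mapping uses the character 好, which is composed of 女 and 子.

```python
from typing import Dict, List, Optional


def similar_chars(
    keyword: str,
    phonetic_sim: Dict[str, float],
    glyph_sim: Dict[str, float],
    threshold: float = 0.8,
) -> List[str]:
    """S204 (sketch): keep candidates that are close in sound or in shape.

    phonetic_sim and glyph_sim map a candidate character to its similarity
    with the keyword (hypothetical precomputed scores in [0, 1]).
    """
    candidates = set(phonetic_sim) | set(glyph_sim)
    return sorted(
        c
        for c in candidates
        if c != keyword
        and max(phonetic_sim.get(c, 0.0), glyph_sim.get(c, 0.0)) >= threshold
    )


def split_form(keyword: str, split_map: Dict[str, str]) -> Optional[str]:
    """S205 (sketch): look up the split form of the keyword, if one is mapped."""
    return split_map.get(keyword)


# Toy values for the keyword 好 (illustrative, not measured).
phonetic = {"号": 0.9, "天": 0.1}
glyph = {"妤": 0.85, "天": 0.2}
split_map = {"好": "女子"}

similar = similar_chars("好", phonetic, glyph)
split = split_form("好", split_map)
```

With these toy scores, 号 is kept for its sound and 妤 for its shape, while 天 falls below the threshold on both measures.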
[0113] Example 3
[0114] Embodiment 3 is an improvement on Embodiment 1 or 2. It focuses mainly on how to classify the text to be recognized and obtain the pre-judgment probability that the text to be recognized is junk text. Referring to Figure 4, it includes the following steps:
[0115] S301, performing word segmentation and vectorization on the text to be recognized to form vectorized information to be recognized;
[0116] S302, processing the vectorized information to be recognized with a deep neural network classification model that combines a convolutional neural network and a long short-term memory network and is trained on a corpus data set, to obtain the pre-judgment probability that the text to be recognized is junk text.
[0117] Through the above steps, the continuous text is segmented and vectorized, which facilitates subsequent analysis by mathematical models; the deep neural network classification model that combines convolutiona...
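Step S301 can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace splitting stands in for a real Chinese word segmenter, and the vocabulary, the padding and unknown-token ids, and the fixed length are hypothetical choices that do not come from the source. The classification model of S302 is not shown; the integer ids produced here are the kind of input such a model would typically consume through an embedding layer.

```python
from typing import Dict, List

PAD, UNK = 0, 1  # reserved ids (assumption): padding and unknown token


def segment(text: str) -> List[str]:
    # Stand-in segmenter (assumption): splits on whitespace only.
    return text.split()


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Assign an integer id to each distinct token, starting after PAD and UNK."""
    vocab: Dict[str, int] = {}
    for doc in corpus:
        for tok in segment(doc):
            vocab.setdefault(tok, len(vocab) + 2)
    return vocab


def vectorize(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """S301 (sketch): map tokens to ids, then truncate or pad to max_len."""
    ids = [vocab.get(tok, UNK) for tok in segment(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical corpus, for illustration only.
corpus = ["win free prize now", "meeting at noon"]
vocab = build_vocab(corpus)
vector = vectorize("free prize inside", vocab, max_len=5)
```

Fixing the length keeps every input the same shape, which batch processing in a neural network generally requires.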