Multi-level natural language anti-junk text method and system

A multi-level natural language technique, applied in the field of information processing, that addresses the poor recognition of junk text and achieves efficient recognition, high robustness, and avoidance of the adverse effects of junk text.

Pending Publication Date: 2019-07-05

SUN YAT SEN UNIV +1

6 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

In practice, the sensitive words in spam text are disguised through many kinds of variation, so existing schemes recognize spam text poorly and can no longer meet current needs.

Examples

Embodiment 1

[0065] Referring to Figure 1, a multi-level natural language anti-spam text method comprises the following steps:

[0066] S101. Receive the text to be recognized;

[0067] S102. Based on the original sensitive word lexicon, match original sensitive words against the text to be recognized, identify the original sensitive words it contains, and output a sensitive word recognition result; the original sensitive word lexicon contains the original sensitive words;

[0068] S103. Based on the sensitive word variant database, match sensitive word variants against the text to be recognized, perform semantic analysis on the matched suspect words to verify whether they are sensitive words, and output a sensitive word variant recognition result; the sensitive word variant database is built from the original sensitive word lexicon, and the sensitive word variant database comprise...
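Steps S102 and S103 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Hit` record, the function names, the plain substring search, and the `verify` callback that stands in for the semantic analysis are all assumptions made for illustration, and the sample lexicon and variant table are invented toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    word: str       # the string actually found in the text
    start: int      # index of its first character
    canonical: str  # the original sensitive word it stands for


def find_all(text, words):
    """Return a Hit for every occurrence of every non-empty word."""
    hits = []
    for word in words:
        if not word:
            continue
        start = text.find(word)
        while start != -1:
            hits.append(Hit(word, start, word))
            start = text.find(word, start + 1)
    return sorted(hits, key=lambda h: (h.start, h.word))


def match_original(text, lexicon):
    """S102 (sketch): match original sensitive words by substring search."""
    return find_all(text, lexicon)


def match_variants(text, variant_db, verify):
    """S103 (sketch): match variants, keep those the verifier confirms.

    variant_db maps each variant to its original sensitive word.
    verify(text, hit) is a hypothetical stand-in for semantic analysis.
    """
    confirmed = []
    for hit in find_all(text, variant_db):
        if verify(text, hit):
            confirmed.append(Hit(hit.word, hit.start, variant_db[hit.word]))
    return confirmed


# Toy data, invented for this example.
lexicon = {"spam"}
variant_db = {"sp4m": "spam"}
text = "buy sp4m now, real spam"

original_hits = match_original(text, lexicon)
variant_hits = match_variants(text, variant_db, verify=lambda text, hit: True)
```

Substring search is quadratic in the worst case; a production system with a large lexicon would more likely use a multi-pattern matcher, but that choice is outside what the text above states.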

Embodiment 2

[0084] Embodiment 2 builds on Embodiment 1 and mainly concerns how to build the sensitive word variant database. Referring to Figure 2, building the sensitive word variant database comprises the following steps:

[0085] S201. Obtain the keywords that make up the original sensitive words from the original sensitive word lexicon;

[0086] S202. Compare the pronunciation of existing Chinese characters with that of the keyword to obtain the phonetic similarity between each existing Chinese character and the keyword;

[0087] S203. Compare the glyph shape of existing Chinese characters with that of the keyword to obtain the glyph similarity between each existing Chinese character and the keyword;

[0088] S204. Select the keyword's similar characters according to the phonetic similarity and the glyph similarity;

[0089] S205. According to the mapping relationship corresponding to the split characters, acquire the split characters o...
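Steps S202 to S204 can be sketched in Python as follows. The patent text shown here does not say how the two similarities are computed or combined, so everything below is an assumption: pronunciation is compared with `difflib.SequenceMatcher` over toneless pinyin strings, glyph shape is compared with Jaccard overlap of component sets, and a character is kept when either score reaches its threshold. The small `PINYIN` and `COMPONENTS` tables are hand-written toy data, not an authoritative decomposition.

```python
from difflib import SequenceMatcher

# Toy lookup tables (illustrative only).
PINYIN = {"微": "wei", "薇": "wei", "威": "wei", "徽": "hui", "信": "xin"}
COMPONENTS = {
    "微": {"彳", "山", "一", "几", "攵"},
    "薇": {"艹", "彳", "山", "一", "几", "攵"},
    "威": {"戌", "女"},
    "徽": {"彳", "山", "一", "糸", "攵"},
    "信": {"亻", "言"},
}


def phonetic_similarity(a, b):
    """S202 (sketch): string similarity of the two pinyin spellings."""
    return SequenceMatcher(None, PINYIN[a], PINYIN[b]).ratio()


def glyph_similarity(a, b):
    """S203 (sketch): Jaccard overlap of the two component sets."""
    ca, cb = COMPONENTS[a], COMPONENTS[b]
    return len(ca & cb) / len(ca | cb)


def similar_characters(keyword, phonetic_min=0.8, glyph_min=0.6):
    """S204 (sketch): keep characters that sound or look like the keyword."""
    kept = []
    for ch in PINYIN:
        if ch == keyword:
            continue
        if (phonetic_similarity(keyword, ch) >= phonetic_min
                or glyph_similarity(keyword, ch) >= glyph_min):
            kept.append(ch)
    return kept


similar = similar_characters("微")
```

With these toy tables, 薇 and 威 are kept because they share the keyword's pronunciation, 徽 is kept because it shares most of its components, and 信 is dropped. The thresholds are arbitrary and would need tuning on real data.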

Embodiment 3

[0114] Embodiment 3 builds on Embodiment 1 or 2. It mainly concerns how to classify the text to be recognized so as to obtain the predicted probability that it is junk text. Referring to Figure 4, it comprises the following steps:

[0115] S301. Segment and vectorize the text to be recognized to form the vectorized information to be recognized;

[0116] S302. Process the vectorized information to be recognized with a deep neural network classification model that combines a convolutional neural network with a long short-term memory (LSTM) network and has been trained on a corpus data set, to obtain the predicted probability that the text to be recognized is junk text.

[0117] Through the above steps, the continuous text is segmented and vectorized, which makes it easy to analyze with mathematical models; the deep neural network classification model combined with convolutional neural network and long short-term memory network and t...
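Step S301 can be illustrated with a small Python sketch. The text does not specify the segmenter or the vector representation, so this example makes assumptions: it splits text into single characters as a stand-in for a real word segmenter, maps each token to an integer id, and pads or truncates to a fixed length. The names `segment`, `build_vocab`, and `vectorize` are hypothetical. The CNN and LSTM classifier of S302 is not shown.

```python
PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens


def segment(text):
    """Character-level split, skipping whitespace (stand-in segmenter)."""
    return [ch for ch in text if not ch.isspace()]


def build_vocab(corpus):
    """Assign an integer id to every token seen in the training corpus."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for text in corpus:
        for token in segment(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab, max_len):
    """Map text to a fixed-length list of ids, truncating or padding."""
    ids = [vocab.get(token, UNK) for token in segment(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


vocab = build_vocab(["ab c", "bd"])
padded = vectorize("bad e", vocab, max_len=6)
truncated = vectorize("abcd", vocab, max_len=3)
```

The resulting id sequences are the kind of input an embedding layer in front of a convolutional and LSTM stack would typically consume.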

Abstract

The invention relates to a multi-level natural language anti-junk text method and system. The method comprises: obtaining a sensitive word recognition result and a sensitive word variant recognition result for a text to be recognized; classifying the text to be recognized to obtain a predicted probability that it is junk text; and making a comprehensive judgment based on the sensitive word recognition result, the sensitive word variant recognition result and the predicted probability to obtain the final probability that the text to be recognized is junk text. The method recognizes junk text efficiently, avoids the adverse effect of junk text on the healthy communication environment of the Internet, is highly robust, and can be widely applied to social, commenting and other Internet products.
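The comprehensive judgment can be pictured with a short Python sketch. The abstract does not give the combination rule, so the noisy-OR formula below is purely an assumption chosen for illustration, as are the function name `final_probability` and the weights `w_original` and `w_variant`: each matched sensitive word or confirmed variant is treated as independent evidence that raises the classifier's predicted probability.

```python
def final_probability(prior, n_original, n_variant,
                      w_original=0.9, w_variant=0.6):
    """Combine the three signals with a noisy-OR rule (assumed, not sourced).

    prior       predicted probability from the text classifier, in [0, 1]
    n_original  number of original sensitive words matched
    n_variant   number of confirmed sensitive word variants matched
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be in [0, 1]")
    if n_original < 0 or n_variant < 0:
        raise ValueError("hit counts must be non-negative")
    # Probability that the text is clean given every piece of evidence.
    p_clean = ((1.0 - prior)
               * (1.0 - w_original) ** n_original
               * (1.0 - w_variant) ** n_variant)
    return 1.0 - p_clean


no_hits = final_probability(0.3, 0, 0)
one_original = final_probability(0.5, 1, 0)
one_variant = final_probability(0.5, 0, 1)
```

With no hits the result equals the classifier's probability; each additional hit can only raise it, and an original sensitive word counts for more than a variant under the default weights.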

Description

Technical field

[0001] The invention relates to the technical field of information processing, in particular to a multi-level natural language anti-spam text method and system.

Background

[0002] With the rapid development of the Internet, users visit websites and applications more and more frequently, and text content on the Internet is growing explosively. Products such as live-stream bullet comments, forums, comment sections and social platforms generate large volumes of text as their numbers of active users grow. However, much of this text is junk, including advertisements, pornography, insults, violence, drugs, or other harmful information. These spam texts contain sensitive words in various forms and are characterized by rapid change and a high degree of freedom. They spread widely on the Internet and seriously affect its healthy development. In order to create a harmonious and pure Internet ...

Application Information

IPC(8): G06F17/27, G06F16/33, G06F16/35, G06F16/903

CPC: G06F40/284, G06F40/30, Y02D10/00

Inventor: 叶志豪, 刘冶, 桂进军, 李宏浩, 印鉴

Owner: SUN YAT SEN UNIV
