Method for text classification through sensitive word detection and illegal content recognition
A technology of illegal content and text classification, applied in the field of text analysis, to achieve the effects of high efficiency, improved accuracy and strong scalability
Pending Publication Date: 2020-02-28
EISOO SOFTWARE
AI Technical Summary
Problems solved by technology
Text classification has traditional machine learning algorithms, such as SVM, KNN and random forest, as well as the neural network classifiers that have become popular in recent years. These methods build a model from text feature words and use it to classify texts. They can only assign a probability value to a text; they cannot judge a text to be a certain type of article on the basis of a specific word.
Examples
Embodiment 1
[0051] When a string such as "I am a member of the Communist Youth League" is passed in, the dictionary word 'Communist Youth League' can be matched; the matching path is shown in Figure 4. In the original Chinese this word consists of three characters, written here as 'Gong', 'Qing' and 'Tuan'. The specific matching process is as follows. The root node has only the child nodes 'Gong', 'Tuan' and 'Qing'. The incoming string is traversed character by character: the first four characters, including 'I', 'one' and 'name', do not match, and the first match occurs at 'Gong'. The child nodes of 'Gong' are a character meaning "production" and 'Qing', so the next character matches 'Qing'. The child node of 'Qing' is 'Tuan'; once 'Tuan' is matched, the maximum length of this path has been reached. The dictionary contains the word 'Communist Youth League', so that word is matched. Matching then jumps to the position given by the fail pointer of 'Tuan'; in the string "I am a member of the Communist Youth League", however, 'Tuan' is followed by a further character, so the fail pointer of 'Tuan' points to the root node, and finally m...
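The matching walk described above can be sketched as a small character-level Aho-Corasick (AC) automaton. This is a minimal illustration, not the patent's implementation: the class name `ACAutomaton`, the four-word dictionary, and the Chinese form of the example sentence are all assumptions made for the sketch.

```python
from collections import deque


class ACAutomaton:
    """Minimal Aho-Corasick automaton over characters (illustrative sketch)."""

    def __init__(self, words):
        # Node 0 is the root. Each node has a child map, a fail pointer
        # (a node index), and the list of dictionary words that end there.
        self.children = [{}]
        self.fail = [0]
        self.output = [[]]
        for word in words:
            self._insert(word)
        self._build_fail_pointers()

    def _insert(self, word):
        node = 0
        for ch in word:
            if ch not in self.children[node]:
                self.children.append({})
                self.fail.append(0)
                self.output.append([])
                self.children[node][ch] = len(self.children) - 1
            node = self.children[node][ch]
        self.output[node].append(word)

    def _build_fail_pointers(self):
        # Breadth-first traversal; depth-1 nodes keep fail pointer 0 (root).
        queue = deque(self.children[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.children[node].items():
                fallback = self.fail[node]
                while fallback and ch not in self.children[fallback]:
                    fallback = self.fail[fallback]
                self.fail[child] = self.children[fallback].get(ch, 0)
                # Inherit words that end at the fail target (proper suffixes).
                self.output[child] = self.output[child] + self.output[self.fail[child]]
                queue.append(child)

    def search(self, text):
        """Return (start_index, word) for every dictionary word found in text."""
        hits = []
        node = 0
        for i, ch in enumerate(text):
            # Follow fail pointers until a child for ch exists or the root is reached.
            while node and ch not in self.children[node]:
                node = self.fail[node]
            node = self.children[node].get(ch, 0)
            for word in self.output[node]:
                hits.append((i - len(word) + 1, word))
        return hits


# Hypothetical dictionary whose root children are the three characters
# named in [0051]; the sentence is an assumed Chinese form of the example.
automaton = ACAutomaton(["共青团", "共产", "青年", "团队"])
print(automaton.search("我是一名共青团员"))  # → [(4, '共青团')]
```

The fail pointers let the scan continue from the longest dictionary prefix that is also a suffix of what has been read, so the text is traversed once regardless of dictionary size.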
[0057] Training text: "A certain website is an illegal website, which contains a lot of political reactionary content, and it is a website that is forbidden to visit in our country."
[0058] (2) Training preprocessing
[0059] Text label: [0,1,0,0] ([1,0,0,0] means normal text, [0,1,0,0] means political reactionary text, [0,0,1,0] means pornographic text, and [0,0,0,1] means other text)
[0060] Text vector: [1,1,1,1,0] (the first number, 1, means that the dictionary word 'illegal' appears once in the text; the second number, 1, means that the dictionary word 'political' appears once in the text; and so on)
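The preprocessing in [0059] and [0060] amounts to a one-hot label plus a word-count vector over a fixed dictionary. The sketch below reproduces the two values given above; the source names only 'illegal' and 'political' as dictionary entries, so the remaining three words, the category names, and the tokenization on English words are assumptions.

```python
import re

# Label order follows the one-hot scheme in [0059]; the names are illustrative.
CATEGORIES = ["normal", "political reactionary", "pornographic", "other"]

# Hypothetical five-word feature dictionary. Only the first two entries
# are stated in the source; the other three are assumed for this sketch.
DICTIONARY = ["illegal", "political", "reactionary", "forbidden", "pornographic"]


def one_hot_label(category):
    """Return the one-hot label vector for a category name."""
    return [1 if name == category else 0 for name in CATEGORIES]


def text_vector(text):
    """Count how often each dictionary word occurs in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(word) for word in DICTIONARY]


training_text = (
    "A certain website is an illegal website, which contains a lot of "
    "political reactionary content, and it is a website that is forbidden "
    "to visit in our country."
)
print(one_hot_label("political reactionary"))  # → [0, 1, 0, 0]
print(text_vector(training_text))              # → [1, 1, 1, 1, 0]
```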
[0061] (3) Model training
[0062] Input the labeled text vectors into the recurrent neural network for learning, and output a trained model.
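The source does not describe the network architecture, so the following is only a sketch of what a recurrent classifier computes for one labeled vector: a plain Elman-style forward pass followed by the cross-entropy loss that a training loop would minimize. Feeding the vector one element per time step, the hidden size, the random weights, and all function names are assumptions; the weight updates themselves are omitted.

```python
import math
import random


def init_params(hidden_size, num_classes, seed=0):
    """Create small random weights (illustrative; untrained)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(num_classes)]
    b_out = [0.0] * num_classes
    return w_in, w_rec, b_h, w_out, b_out


def rnn_forward(vector, params):
    """Feed the vector one element per time step; return class probabilities."""
    w_in, w_rec, b_h, w_out, b_out = params
    size = len(b_h)
    hidden = [0.0] * size
    for x in vector:
        # New hidden state depends on the current input and the previous state.
        hidden = [
            math.tanh(w_in[j] * x
                      + sum(w_rec[j][k] * hidden[k] for k in range(size))
                      + b_h[j])
            for j in range(size)
        ]
    logits = [sum(w_out[c][j] * hidden[j] for j in range(size)) + b_out[c]
              for c in range(len(b_out))]
    peak = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, one_hot):
    """Loss between predicted probabilities and a one-hot label."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probabilities))


params = init_params(hidden_size=8, num_classes=4)
probs = rnn_forward([1, 1, 1, 1, 0], params)
loss = cross_entropy(probs, [0, 1, 0, 0])
```

With untrained weights the four probabilities carry no meaning; training would adjust the weights to reduce `loss` over the labeled set.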
[0064] After model training is complete, the steps in Figure 5 can be used to detect illeg...
Embodiment 3
[0069] 1. Test of sensitive word detection:
[0070] 1. Test text
[0071] Number of test texts: 3944 articles; Coverage: current affairs, sports, entertainment and other news; Other notes: crawled from Internet news
[0072] 2. Test sensitive word dictionary: ["XX": "Political Sensitive",
[0073] "XXX": "Political Sensitive",
Abstract
The invention relates to a method for classifying texts through sensitive word detection and illegal content recognition, comprising the following steps: 1, acquiring a text to be detected, and executing step 2 and step 3 simultaneously; 2, performing sensitive word detection with an Aho-Corasick (AC) automaton, then executing step 4; 3, performing illegal content recognition with the recurrent neural network model, then executing step 6; 4, judging whether the text contains sensitive words; if yes, executing step 5, otherwise returning to step 3; 5, the text contains sensitive words, and the text category is judged according to the sensitive words; 6, judging whether the text contains illegal content; if yes, executing step 7, otherwise executing step 8; 7, the text contains illegal content, and the text category is judged according to the illegal content; 8, the text does not contain illegal content; and 9, ending the current round of processing logic. Compared with the prior art, the method has the advantages of high accuracy, high efficiency, strong extensibility and the like.
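One possible reading of the nine steps, written as a small dispatcher. The abstract runs steps 2 and 3 simultaneously; this sketch runs them one after the other for simplicity, and the two detectors are passed in as plain callables. The function names, the return format, the "normal" category for step 8, and the stand-in detectors are all assumptions.

```python
def classify_text(text, detect_sensitive_words, recognize_illegal_content):
    """Classify a text following one reading of steps 1 to 9.

    detect_sensitive_words(text) returns a list of (word, category) hits.
    recognize_illegal_content(text) returns a category string or None.
    """
    hits = detect_sensitive_words(text)              # step 2
    if hits:                                         # step 4
        # Step 5: category decided by the matched sensitive words.
        return {"category": hits[0][1], "source": "sensitive_word", "hits": hits}
    category = recognize_illegal_content(text)       # step 3
    if category is not None:                         # step 6
        # Step 7: category decided by the recognized illegal content.
        return {"category": category, "source": "model", "hits": []}
    # Step 8: no illegal content found.
    return {"category": "normal", "source": "model", "hits": []}


# Stand-in detectors for demonstration only.
def demo_word_detector(text):
    lexicon = {"XX": "Political Sensitive"}
    return [(word, label) for word, label in lexicon.items() if word in text]


def demo_model(text):
    return "other" if "forbidden" in text else None


flagged = classify_text("a post mentioning XX", demo_word_detector, demo_model)
by_model = classify_text("a forbidden site", demo_word_detector, demo_model)
clean = classify_text("a sports report", demo_word_detector, demo_model)
```

Checking the dictionary first means a single listed word is enough to fix the category, which is the case the probability-only classifiers described in the background cannot handle.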
Description
Technical field
[0001] The invention relates to the technical field of text analysis, in particular to a method for text classification through sensitive word detection and illegal content recognition.
Background
[0002] In the field of text analysis, text classification has always been a focus of research. Most research addresses the classification of ordinary texts into categories such as finance, entertainment and sports, and there are fewer studies on illegal or politically sensitive articles. Text classification has traditional machine learning algorithms, such as SVM, KNN and random forest, as well as the neural network classifiers that have become popular in recent years. These methods build a model from text feature words and use it to classify texts. They can only assign a probability value to a text; they cannot judge a text to be a certain type of article on the basis of a specific word.
Summary of the invention
[0003] The purpo...