Multi-language harmful information feature intelligent mining method based on deep learning

A harmful-information and deep-learning technology, applied in the Internet field, that addresses problems such as existing methods being inconvenient to use

Pending Publication Date: 2020-09-04
SINOSOFT

AI Technical Summary

Problems solved by technology

The existing method requires images in web pages as auxiliary information for identification. However, most web text has no accompanying pictures, and some pictures are unrelated to the text, so that method is hard to apply to large-scale text data.


Embodiment Construction

[0052] The present invention is further described below through embodiments, with reference to the accompanying drawings, without limiting the scope of the invention in any way.

[0053] The overall flow of the deep-learning-based intelligent mining method for multilingual harmful information features is shown in Figure 1. Taking the intelligent mining of Chinese violent-terrorism harmful information features as an example, it specifically includes:

[0054] 1) Collect harmful and harmless violent-terrorism information texts in various languages, including Chinese, and establish a labeled data set. Label the positive and negative samples of the harmful information text data, where a positive sample is a harmful information text of this category and language, with N_positive samples in total, and a negative sample is a harmless information text of the category in the language...
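The labeling in step 1) can be pictured with a minimal sketch. The record type `LabeledText`, the helper `sample_counts`, the language and category codes, and the placeholder texts are all hypothetical names introduced for illustration; the source does not specify a storage format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledText:
    """One labeled sample (hypothetical record layout)."""
    text: str
    language: str   # e.g. "zh"
    category: str   # e.g. "violent_terrorism"
    harmful: bool   # True = positive sample, False = negative sample


def sample_counts(dataset, language, category):
    """Return (n_positive, n_negative) for one language/category pair."""
    counts = Counter(
        sample.harmful
        for sample in dataset
        if sample.language == language and sample.category == category
    )
    return counts[True], counts[False]


# Placeholder texts only; a real data set would hold collected web text.
dataset = [
    LabeledText("placeholder harmful text A", "zh", "violent_terrorism", True),
    LabeledText("placeholder harmful text B", "zh", "violent_terrorism", True),
    LabeledText("placeholder harmless text C", "zh", "violent_terrorism", False),
    LabeledText("placeholder harmless text D", "en", "violent_terrorism", False),
]

n_positive, n_negative = sample_counts(dataset, "zh", "violent_terrorism")
```

Keeping language and category on every record lets the positive and negative counts be computed per language and per category, which is how the step describes the samples.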

Abstract

The invention discloses a multi-language harmful information feature intelligent mining method based on deep learning. The method comprises: labeling harmful and harmless information texts for every language and category; selecting candidate words from the words of each category in each language using an RNSW method and building a one-hot encoding; feeding the sample data into a CNN model for training to obtain, for each word, a score indicating how strongly it belongs to the harmful category of the language, and using that score as a weight; and screening the harmful information features selected by machine learning with a genetic algorithm to form the final harmful information features and weights. First, the invention provides RNSW, a language-independent dimension-reduced text representation method, which effectively reduces the number of model training parameters, speeds up training, and improves recognition accuracy. Second, harmful information features are mined intelligently with deep learning and then screened by a genetic algorithm, which makes harmful information identification more interpretable.
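The final screening step can be illustrated with a small genetic-algorithm sketch. The abstract does not give the fitness function, the genetic operators, or any parameters, so everything here is an assumption: the fitness is the accuracy of a simple weighted-sum threshold classifier, selection keeps the better half of the population, crossover is single-point, and the words, weights, and documents are made-up placeholders standing in for the CNN-derived scores.

```python
import random

# Hypothetical candidate words with weights standing in for CNN scores.
WORDS = ["w_a", "w_b", "w_c", "w_d"]
WEIGHTS = {"w_a": 0.9, "w_b": 0.8, "w_c": 0.7, "w_d": 0.6}
# Toy tokenized documents and labels (True = harmful).
DOCS = [{"w_a", "x"}, {"w_b", "w_c"}, {"w_a", "w_b"},
        {"w_c", "y"}, {"w_d", "z"}, {"y", "z"}]
LABELS = [True, True, True, False, False, False]


def fitness(mask, words, weights, docs, labels, threshold=0.5):
    """Accuracy of a threshold classifier using only words whose mask bit is 1."""
    selected = {w: weights[w] for w, bit in zip(words, mask) if bit}
    correct = 0
    for tokens, label in zip(docs, labels):
        score = sum(wt for w, wt in selected.items() if w in tokens)
        correct += (score >= threshold) == label
    return correct / len(docs)


def genetic_select(words, weights, docs, labels, pop_size=20,
                   generations=30, mutation_rate=0.1, seed=0):
    """Search for a feature subset (bit mask) with high fitness."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(words)

    def fit(mask):
        return fitness(mask, words, weights, docs, labels)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fit, reverse=True)
        survivors = population[: pop_size // 2]   # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)             # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = survivors + children
    best = max(population, key=fit)
    return [w for w, bit in zip(words, best) if bit], fit(best)


selected, accuracy = genetic_select(WORDS, WEIGHTS, DOCS, LABELS)
```

In this toy data, `w_c` and `w_d` also occur in harmless documents, so a mask that drops them scores higher than one that keeps every word. That is the kind of pruning a screening step is meant to perform, and the surviving word list with its weights stays directly readable.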

Description

Technical field: [0001] The present invention relates to text analysis technology in the Internet field, in particular to a harmful text recognition method: an intelligent mining method for multilingual harmful information features based on deep learning.

Background art: [0002] There are two commonly used approaches to identifying harmful information: one based on keyword and rule matching, the other based on machine learning. The keyword-and-rule approach requires manually editing a lexicon of harmful words, and the rules sometimes have to be quite complex to achieve good results. However, harmful words and new words emerge constantly on the Internet and the update cycle is short, so maintaining the lexicon and designing new rules is costly. The machine-learning approach has been gradually adopted in recent years. Its advantage is that it does not require technicians to have in-depth domain ...
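The keyword-and-rule approach described in the background can be sketched as follows. This is only an illustration of the general idea, not a method taken from the source: the lexicon entries are harmless placeholders, and the single rule (flag a text when at least `min_hits` lexicon words appear) is an assumption.

```python
import re

# Hypothetical, manually maintained lexicon (placeholder entries).
LEXICON = {"badword", "worseword"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_harmful(text, lexicon=LEXICON, min_hits=1):
    """Rule: flag the text if at least min_hits lexicon words occur in it."""
    hits = [token for token in tokenize(text) if token in lexicon]
    return len(hits) >= min_hits, hits


flagged, hits = is_harmful("This sentence contains a BadWord.")
missed, _ = is_harmful("This sentence contains a b4dword.")
```

The second call shows the maintenance problem: a newly coined spelling is not in the lexicon, so it goes undetected until someone adds it by hand.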

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F40/205
CPC: G06F18/2111, G06F18/214
Inventor: 赵全军, 吴敬征, 段旭, 陈宏江, 伊克拉木·伊力哈木, 刘立力
Owner: SINOSOFT