Method for carrying out harmful content recognition on network text and short message service

A technology for network text and mobile phone short messages, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as normal content being blocked by mistake, the author's meaning being misread, and blacklists that cannot be updated in time.

Inactive Publication Date: 2010-11-03
FUDAN UNIV

AI Technical Summary

Problems solved by technology

Most existing network filtering systems filter by URL address. This technique simply blocks network users from the websites listed in a database of bad-content URL addresses (commonly known as a "blacklist") configured in the network operator's firewall. Few commercial systems on the market, at home or abroad, directly filter network text content.
Although URL-based blocking is simple and efficient, it has serious limitations. Because network operators cannot update the blacklist in time, many new pornographic websites slip through the net. At the same time, not all web pages under a given domain name contain unhealthy content, so some normal content is blocked by mistake.
For example, "Falun Gong" is classed as a reactionary keyword, but rashly blocking an article that argues against Falun Gong misreads the author's meaning.



Embodiment Construction

[0013] a: Determine the text encoding format. Two encoding formats are currently mainstream on the Internet: GBK and UTF-8. The two are incompatible and cannot be mixed. GBK has no distinctive format requirements, so it is difficult to identify directly. UTF-8, however, has distinctive encoding characteristics, so every text can first be treated as UTF-8: as soon as one character in the text violates the UTF-8 encoding format, the text is regarded as GBK. If the entire text satisfies the UTF-8 encoding format, it is regarded as UTF-8. If scanning the entire text is too time-consuming, a threshold K can be set: once K consecutive characters are found to be valid UTF-8, the text is determined to be UTF-8.
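
The detection rule in step a can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names `utf8_char_length` and `detect_encoding` are hypothetical, the structural check is simplified (it does not reject every ill-formed sequence, such as overlong encodings), and counting only multi-byte characters toward the threshold K is an assumption, made because ASCII bytes are valid in both GBK and UTF-8 and so carry no evidence either way.

```python
from typing import Optional


def utf8_char_length(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1          # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0              # stray continuation byte or invalid lead byte


def detect_encoding(data: bytes, k: Optional[int] = None) -> str:
    """Treat the text as UTF-8 first; fall back to GBK on the first violation.

    k is the optional early-exit threshold: after k consecutive well-formed
    multi-byte characters (assumption: ASCII is not counted), stop and
    report UTF-8 without scanning the rest.
    """
    i, n = 0, len(data)
    multibyte_seen = 0
    while i < n:
        length = utf8_char_length(data[i])
        if length == 0 or i + length > n:
            return "gbk"
        # Every byte after the lead byte must look like 10xxxxxx.
        if any((b & 0xC0) != 0x80 for b in data[i + 1:i + length]):
            return "gbk"
        if length > 1:
            multibyte_seen += 1
            if k is not None and multibyte_seen >= k:
                return "utf-8"
        i += length
    return "utf-8"
```

With `k=None` the whole text is scanned, which matches the full-text rule; a small `k` trades certainty for speed, so a text that is valid UTF-8 only at the start would be misclassified.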

[0014] b: Convert the format of the text. First, remove the spaces and tabs that appear in Internet web pages, for example, is equal to detecting...
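
Only the first part of step b survives in this excerpt, so the sketch below covers just the removal of spaces and tabs. The function name `strip_spaces_and_tabs` and the `extra` parameter are hypothetical; `extra` is an assumed convenience for also dropping characters such as the full-width space, which the text does not mention.

```python
def strip_spaces_and_tabs(text: str, extra: str = "") -> str:
    """Remove ASCII spaces and tabs, plus any characters listed in `extra`."""
    drop = set(" \t") | set(extra)
    return "".join(ch for ch in text if ch not in drop)


# Usage: padding inserted between characters no longer hides a string.
print(strip_spaces_and_tabs("b a d\tword"))
print(strip_spaces_and_tabs("有 害\u3000内容", extra="\u3000"))
```

Removing the padding before comparison presumably lets a later word-bank lookup match strings that were deliberately split up, though the excerpt is cut off before it says so.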



Abstract

The invention belongs to the technical field of text processing, and in particular relates to a method for recognizing harmful content in network text and short messages. The method comprises the following steps: inputting the text to be detected, determining the text encoding format, converting the format of the text, comparing the text with a short-string word bank, comparing the text with a long-string word bank, performing copy detection on the result, and displaying the final result. The method can be used to detect and filter harmful, violent and reactionary text on the Internet, inhibiting the spread of harmful content and protecting the physical and psychological health of young people.

Description

Technical field

[0001] The invention belongs to the technical field of text processing, and in particular relates to a method for decoding, analyzing and filtering (copy detection) text content.

Background

[0002] As Internet use grows, bad information of all kinds (pornographic, reactionary, violent, etc.) increasingly disrupts the normal order of the Internet. Because the Internet lacks the effective oversight of publication that traditional media have, a large amount of information that should have been strictly regulated has spread unchecked. How to effectively control the dissemination of this information and ensure the security of network content has become one of the main topics of research on the retrieval and monitoring of bad text. Most existing network filtering systems filter by URL address. This technique simply blocks network users from the websites listed in a database of bad-content URL addresses...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/27
Inventor: 邱锡鹏, 刘力, 金城, 张玥杰, 薛向阳
Owner: FUDAN UNIV