A fine-grained semantic detection method for bad text content on the Internet

A fine-grained detection method, applied in special data processing applications, instruments, and electrical digital data processing, addresses problems such as text sets that are difficult to construct and low performance in text content detection. Its effects are improved practicability, a higher detection rate, and a lower false alarm rate.

Inactive Publication Date: 2014-10-29
FUDAN UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, these two types of text sets are not easy to construct in practical applications, resulting in poor detection performance.
[0006] 3. Although detection methods that use simple semantic analysis techniques such as LSA (Latent Semantic Analysis) [1] take semantic recognition into consideration, establishing the semantic space and extracting semantics raise difficult problems, such as how to set the dimension of the space. As a result, performance is low when the text content to be detected is flexible and changeable.
Existing methods have deficiencies in vocabulary setting, training text setting, and semantic space construction, so it is still difficult for them to meet the requirements for detecting and filtering text content with bad semantics.

Examples

Embodiment Construction

[0035] 1. The establishment of the semantic topic model of the scene.

[0036] (1) Set the bad information scene to be detected, select sentences related to the scene, and construct a text set describing the scene.

[0037] The text information related to the scene can be sourced from the Internet and extracted through manual reading to construct a text set. The text set consists of a text file where each line is a separate sentence. The sentences chosen should describe as many aspects of the scene as possible.

[0038] (2) Preprocessing of text sets

[0039] Segment each sentence in the text set into words and remove common stop words to obtain a vocabulary T corresponding to the text set. Each row of the vocabulary is one word, and the vocabulary contains no repeated words.
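
The preprocessing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the whitespace tokenizer stands in for a real word segmenter, and the names `tokenize`, `build_vocabulary`, and `STOP_WORDS`, as well as the sample sentences, are hypothetical.

```python
def tokenize(sentence):
    """Whitespace tokenizer; an assumed stand-in for a real word segmenter."""
    return sentence.lower().split()


def build_vocabulary(sentences, stop_words):
    """Return the vocabulary T: unique non-stop words in first-seen order."""
    seen = set()
    vocabulary = []
    for sentence in sentences:
        for word in tokenize(sentence):
            if word in stop_words or word in seen:
                continue
            seen.add(word)
            vocabulary.append(word)
    return vocabulary


# Hypothetical stop-word list and text set (one sentence per entry).
STOP_WORDS = {"the", "a", "is", "of"}
sentences = ["the cat is a pet", "a pet of the family"]
vocab = build_vocabulary(sentences, STOP_WORDS)
print(vocab)
```

Keeping first-seen order gives each word a stable position, which is convenient when the vocabulary is later used to index the columns of a matrix.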

[0040] (3) Construct word frequency matrix

[0041] For each sentence in the text set S, construct a row vector $v_i = \{c_{i1}, c_{i2}, c_{i3}, \ldots, c_{iX}\}$, $i = 1, 2, \ldots, Y$, here X Indic...
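
A sketch of how such row vectors could be assembled into a matrix is given below. Because the source text is cut off at this point, the reading used here is an assumption: X is taken to be the vocabulary size, Y the number of sentences, and each entry the number of times a vocabulary word occurs in a sentence. The function name `word_frequency_matrix` and the sample data are hypothetical.

```python
from collections import Counter


def word_frequency_matrix(tokenized_sentences, vocabulary):
    """Build one row of counts per sentence, one column per vocabulary word.

    Assumption: entry (i, j) is the count of vocabulary word j in sentence i.
    """
    matrix = []
    for tokens in tokenized_sentences:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return matrix


# Hypothetical vocabulary (X = 3) and tokenized sentences (Y = 2).
vocabulary = ["cat", "pet", "family"]
tokenized = [["cat", "pet", "cat"], ["pet", "family"]]
matrix = word_frequency_matrix(tokenized, vocabulary)
for row in matrix:
    print(row)
```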

Abstract

The invention belongs to the technical field of text content filtering, and particularly relates to a fine-grained semantic detection method for harmful text content on the network. For a given harmful information scene, the method comprises the steps of: constructing a training text set in which independent sentences are the basic units, and from it establishing a mathematical description of the scene using a probabilistic topic model; extracting the information content of a Web page to be detected; identifying the sentences in the extracted text; calculating the conditional probability of each sentence under the established probabilistic topic model; and completing fine-grained semantic detection at the set content detection sensitivity. According to the invention, model construction is hardly affected by the number of topics, and probabilities are calculated effectively at the sentence and word level, so the method is applicable to various scenarios that require harmful text content detection. Furthermore, fine-grained detection of harmful words and sentences in the text content is supported, so the method effectively improves the detection rate, reduces the false alarm rate, and helps improve the practicability of text content filtering.
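
The sentence-level scoring and thresholding described in the abstract might look like the sketch below. The abstract does not specify the topic model or the scoring formula, so everything here is an assumption: each word is scored as a mixture over topics, a sentence is scored by its average per-word log probability, and a sentence is flagged when that score reaches a threshold standing in for the detection sensitivity. The names `sentence_log_prob` and `detect`, the `floor` smoothing value, and all numbers are hypothetical.

```python
import math


def sentence_log_prob(tokens, topic_weights, topic_word_probs, floor=1e-6):
    """Average per-word log probability of a sentence under a topic mixture.

    Assumed scoring: p(w) = sum_k topic_weights[k] * topic_word_probs[k][w],
    with `floor` used for words a topic has never seen.
    """
    if not tokens:
        return float("-inf")
    total = 0.0
    for word in tokens:
        p = sum(
            weight * probs.get(word, floor)
            for weight, probs in zip(topic_weights, topic_word_probs)
        )
        total += math.log(p)
    return total / len(tokens)


def detect(sentences, topic_weights, topic_word_probs, threshold):
    """Return the sentences whose score reaches the sensitivity threshold."""
    flagged = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        score = sentence_log_prob(tokens, topic_weights, topic_word_probs)
        if score >= threshold:
            flagged.append(sentence)
    return flagged


# Hypothetical two-topic model of a scene and two sentences from a page.
topic_weights = [0.6, 0.4]
topic_word_probs = [
    {"fight": 0.5, "attack": 0.5},
    {"weapon": 0.7, "attack": 0.3},
]
page_sentences = ["attack weapon", "nice weather"]
flagged = detect(page_sentences, topic_weights, topic_word_probs, threshold=-3.0)
print(flagged)
```

Averaging over the sentence length keeps long and short sentences comparable, and raising or lowering `threshold` trades detection rate against false alarms.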

Description

Technical field

[0001] The invention belongs to the technical field of text content filtering, and in particular relates to a detection method for bad text content on the network.

Background

[0002] At present, the Internet has become a main channel and space for creating and sharing information. With the continuous emergence of online forums and interactive social media, a large amount of text is generated every day, such as news reports, product introductions, and online comments. A large amount of bad text content is spread across these network spaces. Pornographic information, violent information, online abuse, and other harmful information do great harm to the healthy growth of young people, and for office workers, continually browsing such information also lowers work efficiency. Therefore, the detection of bad text information content has become an import...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/27
Inventor: 曾剑平
Owner: FUDAN UNIV