Text content auditing method and system based on sensitive word

A sensitive-word-based text auditing technology, applied in the Internet field. It addresses keyword lexicons that are difficult to maintain, the inability to respond quickly to new words on the Internet, and low audit efficiency, with the effect of improving audit accuracy, reducing the probability of misjudgment, and responding quickly.

Active Publication Date: 2017-02-22
DATAGRAND TECH INC
2 Cites, 89 Cited by

AI Technical Summary

Problems solved by technology

[0004] Existing text review technology cannot respond quickly to the deformed words and new words that keep appearing on the Internet, so review falls back on manual inspection, which is inefficient, and the keyword lexicon becomes difficult to maintain. To address these defects, the reviewed text is processed for sensitive words, and the weights of sensitive words, or of sensitive words together with their co-occurring keywords, are obtained separately for normal text and for illegal text. These weights are used to maintain the sensitive word database, improving the efficiency of text review and reducing the false review rate.
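The summary does not state how the weights are computed from normal and illegal text. As a minimal sketch only, assuming a smoothed log-odds ratio of document frequencies (an illustrative choice, not the patent's formula), a weight per keyword could be derived as follows. The function names `doc_frequency` and `log_odds_weights` and the `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def doc_frequency(docs):
    """Count, for each word, how many documents contain it at least once."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))
    return counts


def log_odds_weights(violating_docs, normal_docs, smoothing=1.0):
    """Assumed weighting: log ratio of smoothed document frequencies.

    Positive weight: the word is more typical of violating text.
    Negative weight: the word is more typical of normal text.
    """
    bad = doc_frequency(violating_docs)
    good = doc_frequency(normal_docs)
    n_bad, n_good = len(violating_docs), len(normal_docs)
    weights = {}
    for word in set(bad) | set(good):
        p_bad = (bad[word] + smoothing) / (n_bad + 2 * smoothing)
        p_good = (good[word] + smoothing) / (n_good + 2 * smoothing)
        weights[word] = math.log(p_bad / p_good)
    return weights


# Toy, already-segmented corpora (made up for illustration).
violating = [["buy", "drugs"], ["cheap", "drugs"]]
normal = [["buy", "bread"], ["cheap", "bread"]]
weights = log_odds_weights(violating, normal)
```

In this toy data, "drugs" receives a positive weight, "bread" a negative one, and "buy", which appears equally often on both sides, a weight of zero. Recomputing such weights as newly reviewed text arrives is one way a database could be kept current without hand-editing a lexicon.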

Embodiment Construction

[0041] The present invention will be described in further detail below through specific embodiments and in conjunction with the accompanying drawings.

[0042] Traditional text review generally judges whether a webpage violates regulations by the number of occurrences of a single keyword, and filters the text directly on that keyword. This review method has at least the following two technical defects: 1. Some keywords that appear frequently in illegal text also appear in normal text; for example, the keyword "breast", which is found in illegal text, also appears frequently in content related to breast cancer. 2. If negative attributives or predicates appear in the context of a sensitive keyword, the text is likely to be normal; for example, a text that contains the keyword "terrorist organization" but has words such as "opposition" and "criticism" in front of it is still a normal text.

[0043] In order ...

Abstract

The invention discloses a text content auditing method based on sensitive words. The method comprises the following steps: receiving a text to be audited, parsing and segmenting it into words, and obtaining all keywords in the text; querying a preset sensitive word database with these keywords to obtain the sensitive words in the text, wherein the sensitive word database comprises sensitive words and their synonyms or near-synonyms; obtaining the keywords that co-occur with each sensitive word within a preset text length, calculating the violation weight of the sensitive word and its co-occurring keywords, and judging whether the violation weight is greater than a preset violation threshold; if the violation weight is greater than the threshold, the text to be audited is determined to be a violating text, and otherwise it is determined to be a normal text. The method effectively lowers the probability of misjudgment, improves auditing accuracy, and responds quickly to deformed words and new Internet words.
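The steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the regular-expression tokenizer stands in for real word segmentation, the violation weight is assumed to be a simple sum of word and word-pair weights (the abstract gives no formula), and every name, weight, window size, and threshold below is hypothetical.

```python
import re

# Hypothetical sensitive word database: each variant, synonym, or deformed
# spelling maps to one canonical sensitive word.
SENSITIVE_DB = {
    "terrorist": "terrorist",
    "terr0rist": "terrorist",
}

# Hypothetical weights. Positive values push toward "violation",
# negative values (for example, negating context) push toward "normal".
WORD_WEIGHT = {"terrorist": 0.6}
PAIR_WEIGHT = {
    ("terrorist", "join"): 0.5,
    ("terrorist", "oppose"): -0.7,
    ("terrorist", "criticize"): -0.7,
}

WINDOW = 5       # assumed "preset text length", measured in tokens
THRESHOLD = 0.8  # assumed "preset violation threshold"


def segment(text):
    """Stand-in for word segmentation: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def violation_weight(tokens):
    """Sum the weight of each sensitive word and of its co-occurring keywords."""
    total = 0.0
    for i, token in enumerate(tokens):
        canonical = SENSITIVE_DB.get(token)
        if canonical is None:
            continue
        total += WORD_WEIGHT.get(canonical, 0.0)
        start, end = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(start, end):
            if j != i:
                total += PAIR_WEIGHT.get((canonical, tokens[j]), 0.0)
    return total


def audit(text):
    """Return "violation" if the weight exceeds the threshold, else "normal"."""
    return "violation" if violation_weight(segment(text)) > THRESHOLD else "normal"
```

With these made-up weights, `audit("Join the terr0rist group today")` is classified as a violation even though the sensitive word is misspelled, while `audit("We oppose and criticize every terrorist organization")` is classified as normal because the co-occurring keywords carry negative weight.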

Description

Technical field

[0001] The invention belongs to the technical field of the Internet, and in particular relates to a text content review method and system based on sensitive words.

Background

[0002] With the rapid development of the Internet industry, the information on the Internet has been greatly enriched. Along with it comes a large amount of content that does not conform to the Internet usage environment or even violates national laws and regulations, such as politically sensitive or pornographic terms. As a result, the websites concerned violate the mandatory provisions of national laws and regulations, which puts their safe operation at risk. At the same time, this negative content greatly damages the brand value of the website and degrades the user experience.

[0003] Current text review faces the following three technical difficulties: (1) the single keyword rule is likely to lead to misjudgment; (2) deformed w...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/951, G06F40/284
Inventor: 张健
Owner: DATAGRAND TECH INC