Error intercepted word screening method and system based on n-gram model

A method for screening wrongly intercepted words, applied in the field of network security, addresses problems such as low interception accuracy and the difficulty of mining contextual semantics, and thereby improves interception accuracy.

Active Publication Date: 2022-01-18
北京数美时代科技有限公司

AI Technical Summary

Problems solved by technology

Existing interception methods match only the words themselves and struggle to capture the semantics of the surrounding context, so interception accuracy is low. The problem is worse when intercepting speech-to-text data, where homophones, similar-sounding words, and dialects reduce accuracy further.



Embodiment Construction

[0020] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

[0021] Figure 1 is a schematic flow diagram of an embodiment of the method for screening wrongly intercepted words. The method is implemented based on an n-gram model and includes:

[0022] S1, obtaining speech-to-text (audio transcription) text data that was intercepted based on the intercepted words under a specific label;

[0023] Note that the label types can be set according to actual business needs. For example, the labels can be divided into three categories: field-A sensitive labels, field-B sensitive labels, and normal labels. The intercepted words for each label category can likewise be set according to actual needs. For example, ...
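Step S1 and the label scheme in [0023] can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the label keys, the example words, the whitespace tokenization, and the `intercepted_texts` helper are hypothetical and do not come from the patent, which leaves labels and word lists to business needs.

```python
from typing import Dict, List, Set

# Hypothetical label-to-word lists (assumption): three categories as in [0023].
INTERCEPT_WORDS: Dict[str, Set[str]] = {
    "field_a_sensitive": {"alpha", "bravo"},
    "field_b_sensitive": {"charlie"},
    "normal": set(),
}


def intercepted_texts(texts: List[str], label: str) -> List[dict]:
    """Return the transcripts that hit at least one intercepted word of `label`."""
    words = INTERCEPT_WORDS[label]
    hits = []
    for text in texts:
        # Naive whitespace tokenization (assumption); real transcripts,
        # especially Chinese ones, would need a proper word segmenter.
        tokens = text.lower().split()
        matched = sorted(w for w in words if w in tokens)
        if matched:
            hits.append({"text": text, "label": label, "matched": matched})
    return hits
```

The records returned for one label are the kind of input that the later n-gram screening would operate on; keeping the matched words alongside each transcript makes it easy to trace a suspicious sentence back to the word that triggered it.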


Abstract

The invention discloses a method and system, based on an n-gram model, for screening wrongly intercepted words, and relates to the technical field of network security. The method comprises: obtaining speech-to-text (audio transcription) text data that was intercepted based on an intercepted word under a specific label; processing the text data with the n-gram model and selecting, as backoff information, the data in the text that is not stored under the specific label; and determining, from the backoff information, the sentences that contain a wrongly intercepted word. The method is suited to intercepting prohibited and sensitive words, especially in speech-to-text data. It can quickly find wrongly intercepted sentences and words, and the prohibited-word lexicon can then be improved and optimized using those words, which raises both the interception accuracy of the affected words and the overall interception accuracy.
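The screening idea in the abstract can be sketched in Python under stated assumptions. This sketch reads the abstract's screening step as n-gram backoff, that is, an n-gram that the label's model has not stored. The bigram order, the set-based "model", the whitespace tokenization, and the function names are all hypothetical; the patent text available here does not specify them.

```python
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(tokens: List[str]) -> List[Bigram]:
    """Adjacent token pairs of a tokenized sentence."""
    return list(zip(tokens, tokens[1:]))


def train_label_bigrams(corpus: Iterable[str]) -> Set[Bigram]:
    """Collect the bigrams seen in text known to belong to the label (assumption)."""
    seen: Set[Bigram] = set()
    for sentence in corpus:
        seen.update(bigrams(sentence.lower().split()))
    return seen


def backoff_bigrams(sentence: str, word: str, model: Set[Bigram]) -> List[Bigram]:
    """Bigrams containing `word` that are absent from the label model."""
    tokens = sentence.lower().split()
    return [bg for bg in bigrams(tokens) if word in bg and bg not in model]


def suspected_false_hits(sentences: Iterable[str], word: str,
                         model: Set[Bigram]) -> List[str]:
    """Flag sentences where every bigram around `word` is unseen under the label."""
    flagged = []
    for s in sentences:
        around = [bg for bg in bigrams(s.lower().split()) if word in bg]
        missing = [bg for bg in around if bg not in model]
        # All context bigrams unseen: the word likely appears in a context
        # that does not belong to the label, so the hit may be a false one.
        if around and len(missing) == len(around):
            flagged.append(s)
    return flagged
```

With a label model trained on "buy cheap shot now" and "cheap shot offer", the sentence "he took a shot at goal" would be flagged for the word "shot", while "get a cheap shot now" would not. A production system would more plausibly use a smoothed n-gram language model and a threshold on how often it backs off, rather than a hard all-or-nothing rule.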

Description

Technical field

[0001] The invention relates to the technical field of network security, and in particular to a method and system, based on an n-gram model, for screening wrongly intercepted words.

Background

[0002] The volume of content on the Internet grows daily, and it often includes illegal and non-compliant information. This content therefore needs to be reviewed and filtered to keep the Internet environment safe and to meet business needs.

[0003] At present, review usually relies on prohibited-word lists and user-defined blacklist and whitelist lexicons to block prohibited and sensitive words. However, existing interception methods match only the words themselves and struggle to capture the semantics of the surrounding context, so interception accuracy is low. The problem is worse when intercepting speech-to-text data, where homophones, similar-sounding words, and dialects lead to a further reduction in the accuracy of i...


Application Information

IPC(8): G10L15/06, G10L15/22, G10L15/26, G06F40/211, G06F40/216
CPC: G10L15/22, G10L15/26, G10L15/063, G06F40/211, G06F40/216
Inventor: 冉小龙, 唐会军, 刘拴林, 梁堃, 陈建
Owner 北京数美时代科技有限公司